nips nips2007 nips2007-112 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrew Howard, Tony Jebara
Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. [sent-6, score-0.61]
2 In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. [sent-7, score-0.836]
3 However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. [sent-8, score-0.218]
4 The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. [sent-9, score-0.61]
5 A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. [sent-10, score-0.942]
6 The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. [sent-12, score-0.265]
7 An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. [sent-13, score-0.379]
8 The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. [sent-14, score-0.368]
9 This often takes the form of applying a monotonic transformation prior to using a classification algorithm. [sent-16, score-0.473]
10 For example, when the bag of words representation is used in document classification, it is common to take the square root of the term frequency [6, 5]. [sent-17, score-0.172]
11 Monotonic transforms are also used when classifying image histograms. [sent-18, score-0.191]
12 In [3], transformations of the form x^a where 0 ≤ a ≤ 1 are demonstrated to improve performance. [sent-19, score-0.29]
13 When classifying genes from various microarray experiments it is common to take the logarithm of the gene expression ratio [2]. [sent-20, score-0.171]
14 Monotonic transformations can also capture crucial properties of the data such as threshold and saturation effects. [sent-21, score-0.243]
15 In this paper, we propose to simultaneously learn a hyperplane classifier and a monotonic transformation. [sent-22, score-0.5]
16 The solution produced by our algorithm is a piecewise linear monotonic function and a maximum margin hyperplane classifier similar to a support vector machine (SVM) [4]. [sent-23, score-0.774]
17 By allowing for a richer class of transforms learned at training time (as opposed to a rule of thumb applied during preprocessing), we improve classification accuracy. [sent-24, score-0.226]
18 The learned transform is specifically tuned to the classification task. [sent-25, score-0.138]
19 locally optimal solution to the problem, and a convex relaxation to find a globally optimal approximate solution. [sent-27, score-0.386]
20 In section 2, we present our formulation for learning a piecewise linear monotonic function and a hyperplane. [sent-29, score-0.585]
21 We show how to learn this combined model through an iterative coordinate ascent optimization using interleaved quadratic and linear programs to find a local minimum. [sent-30, score-0.198]
22 In section 3, we derive a convex relaxation based on Lasserre’s method [8]. [sent-31, score-0.337]
23 In section 4 synthetic experiments as well as document and image classification problems demonstrate the diverse utility of our method. [sent-32, score-0.148]
24 The transformation acts elementwise as can be seen in Figure 1. [sent-42, score-0.145]
25 We propose to learn both a maximum margin hyperplane and the unknown transform Φ(x) simultaneously. [sent-43, score-0.212]
26 In our formulation, Φ(x) is a piecewise linear function that we parameterize with a set of K knots {z1 , . . . , zK }. [sent-44, score-0.364]
27 A more common method is to parameterize the function value Φ(z) at each knot z and apply order constraints between subsequent knots to enforce monotonicity. [sent-53, score-0.35]
28 Values in between knots are found through linear interpolation. [sent-54, score-0.208]
29 Using truncated ramp functions is preferable for numerous reasons. [sent-56, score-0.283]
30 The positivity constraints on the weights m will also yield a simpler formulation than order constraints and interpolation which becomes important in subsequent relaxation steps. [sent-59, score-0.409]
31 Figure 2a shows the truncated ramp function associated with knot z1 . [sent-60, score-0.37]
32 Figure 2b shows a conic combination of truncated ramps that builds a piecewise linear monotonic function. [sent-61, score-0.654]
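To make the construction concrete, the following is a minimal numpy sketch of truncated ramp features and their conic combination; the exact ramp form (rising linearly from 0 to 1 between consecutive knots) and the uniform knot placement used here are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def truncated_ramps(x, knots):
    """Evaluate K truncated ramp basis functions elementwise on x.

    phi_j rises linearly from 0 to 1 between knot z_j and knot z_{j+1}
    and is clipped outside that interval (assumed form).
    x: array of any shape; knots: increasing array of K+1 positions.
    Returns an array of shape x.shape + (K,).
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(knots, dtype=float)
    raw = (x[..., None] - z[:-1]) / np.diff(z)   # slope inside each interval
    return np.clip(raw, 0.0, 1.0)

def monotone_transform(x, m, knots):
    """Phi(x) = sum_j m_j * phi_j(x): a conic combination of ramps.

    Every phi_j is nondecreasing and every m_j >= 0, so Phi is a piecewise
    linear monotonic function applied elementwise to the data.
    """
    m = np.asarray(m, dtype=float)
    assert np.all(m >= 0), "positivity of the weights enforces monotonicity"
    return truncated_ramps(x, knots) @ m

# Hypothetical knots and weights, purely for illustration.
knots = np.linspace(0.0, 1.0, 6)            # 6 knot positions -> 5 ramps
m = np.array([0.5, 0.1, 0.1, 0.1, 0.2])     # nonnegative mixing weights
x = np.array([[0.05, 0.4], [0.7, 0.95]])
print(monotone_transform(x, m, knots))      # transformed 2x2 array
```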
33 Combining this with the support vector machine formulation leads us to the following learning problem. [sent-62, score-0.119]
34 Figure 2 panels: a) truncated ramp function φ1 (x); b) the conic combination m1 φ1 (x) + m2 φ2 (x) + m3 φ3 (x) + m4 φ4 (x) + m5 φ5 (x). [sent-66, score-0.174]
35 Figure 2: Building blocks for piecewise linear functions. [sent-68, score-0.166]
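The learning problem referred to above does not survive in the extracted text; a plausible reconstruction, assuming the standard soft-margin SVM objective combined with the conic ramp parameterization and positivity constraints on m, is:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,m}\quad & \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & y_i\big(w^\top \Phi(x_i) + b\big) \ge 1 - \xi_i,\qquad
\Phi(x_i) = \sum_{j=1}^{K} m_j\,\phi_j(x_i),\\
& \xi_i \ge 0,\quad m_j \ge 0 \quad \text{for all } i, j.
\end{aligned}
```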
36 This problem is nonconvex due to the quadratic term involving w and m in the classification constraints. [sent-71, score-0.262]
37 This amounts to alternating between solving a support vector machine for w and b with Φ(x) fixed, and solving for Φ(x) as a linear program with the SVM solution fixed. [sent-74, score-0.142]
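A minimal sketch of this alternating scheme, using scikit-learn for the SVM (QP) step and scipy's linprog for the LP step; the LP objective (minimizing total slack with w and b fixed), the bound on m used to fix its scale, the stopping rule, and the ramp form are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

def ramp_features(X, knots):
    """phi_j evaluated elementwise on X: returns shape (n, D, K) (assumed ramp form)."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(knots, dtype=float)
    return np.clip((X[..., None] - z[:-1]) / np.diff(z), 0.0, 1.0)

def alternating_fit(X, y, knots, C=1.0, n_iters=10):
    """Alternate an SVM QP (w, b with m fixed) with an LP (m with w, b fixed).

    y is assumed to contain labels in {-1, +1}.
    """
    F = ramp_features(X, knots)                   # (n, D, K)
    y = np.asarray(y, dtype=float)
    n, K = F.shape[0], F.shape[-1]
    m = np.ones(K) / K                            # initial mixing weights
    for _ in range(n_iters):
        # QP step: with m fixed, train a linear SVM on the transformed data.
        T = F @ m                                 # (n, D) transformed inputs
        svm = SVC(kernel="linear", C=C).fit(T, y)
        w, b = svm.coef_[0], float(svm.intercept_[0])
        # LP step: with w, b fixed, the margin is linear in m:
        #   y_i (sum_j m_j a_ij + b) >= 1 - xi_i,  a_ij = sum_d w_d phi_j(x_i)_d.
        A = np.einsum("d,ndk->nk", w, F)          # (n, K)
        c = np.concatenate([np.zeros(K), np.ones(n)])       # minimize total slack
        A_ub = np.hstack([-(y[:, None] * A), -np.eye(n)])
        b_ub = y * b - 1.0
        # m in [0, 1] is an assumed bound that fixes the scale; xi >= 0.
        bounds = [(0.0, 1.0)] * K + [(0.0, None)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        if res.success:
            m = res.x[:K]
    return m, svm
```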
38 3 Convex Relaxation When faced with a nonconvex quadratic problem, an increasingly popular technique is to relax it into a convex one. [sent-81, score-0.391]
39 Lasserre [8] proposed a sequence of convex relaxations for these types of nonconvex quadratic programs. [sent-82, score-0.411]
40 The relaxation comes from dropping the rank one constraint on the outer product matrix. [sent-85, score-0.303]
41 However, we mainly use the first moment relaxation along with a few of the second order moment constraints that do not require any additional variables beyond the outer product matrix. [sent-87, score-0.309]
42 A convex relaxation could be derived directly from the primal formulation of our problem. [sent-88, score-0.377]
43 Both w and m would be relaxed as they interact in the nonconvex quadratic terms. [sent-89, score-0.3]
44 Un- fortunately, this yields a semidefinite constraint that scales with both the number of knots and the dimensionality of the data. [sent-90, score-0.173]
45 However, if we first find the dual formulation for w, b, and ξ, we only have to relax m which yields both a tighter relaxation and a less computationally intensive problem. [sent-92, score-0.388]
46 We introduce the relaxation via the substitution M = m̄m̄ᵀ and the constraint M ⪰ 0, where m̄ is constructed by concatenating 1 with m. [sent-94, score-0.264]
47 This relaxation is exact if M is a rank one matrix. [sent-97, score-0.236]
48 When using this relaxation, we can recover the monotonic transform by using the first column (row) as the mixing weights, m, of the truncated ramp functions. [sent-104, score-0.774]
49 In practice, however, we use the learned kernel in our predictions, k(x, x′) = Σi,j Mi,j φi (x)ᵀ φj (x′). [sent-105, score-0.11]
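A sketch of how this learned kernel could be evaluated, assuming M is indexed over the K ramp functions and that each φi acts elementwise over the input dimensions (the ramp form below is the same assumed one as in the earlier sketches):

```python
import numpy as np

def relaxed_kernel(X1, X2, M, knots):
    """k(x, x') = sum_{i,j} M_ij phi_i(x)^T phi_j(x').

    Because each phi_i acts elementwise, this equals
    sum_d F(x_d)^T M F(x'_d), where F(u) stacks the K ramp values at u.
    """
    def ramps(X):
        X = np.asarray(X, dtype=float)
        z = np.asarray(knots, dtype=float)
        return np.clip((X[..., None] - z[:-1]) / np.diff(z), 0.0, 1.0)  # (n, D, K)
    F1, F2 = ramps(X1), ramps(X2)
    # Contract the ramp indices against M, then sum over input dimensions.
    return np.einsum("adk,kl,bdl->ab", F1, np.asarray(M, dtype=float), F2)
```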
50 1 Experiments Synthetic Experiment In this experiment we will demonstrate our method’s ability to recover a monotonic transformation from data. [sent-107, score-0.552]
51 We then applied a strictly monotonic function to this sampled data. [sent-109, score-0.379]
52 The training set is made up of the transformed points and the original labels. [sent-110, score-0.17]
53 d-f) The transformation functions learned using the nonconvex algorithm. [sent-262, score-0.355]
54 g-i) The transformation functions learned using the convex algorithm. [sent-263, score-0.271]
55 If we could recover the inverse monotonic function, then a linear decision boundary would perform well. [sent-264, score-0.492]
56 Figure 3b shows the data and hyperplane transformed with a normalized logarithm. [sent-266, score-0.247]
57 200 were used for training, 200 for cross validation and 200 for testing. [sent-269, score-0.104]
58 We compared our locally optimal method (L mono), our convex relaxation (C mono) and a linear SVM (linear). [sent-270, score-0.449]
59 The linear SVM struggled on all of the transformed data while the other methods performed well as reported in Figure 4. [sent-271, score-0.189]
60 The learned transforms for L mono are plotted in Figure 3(d-f). [sent-272, score-0.647]
61 The learned functions for C mono are in Figure 3(g-i). [sent-275, score-0.541]
62 Both algorithms performed quite well on the classification task and recovered nearly the exact monotonic transform. [sent-276, score-0.429]
63 The local method outperformed the relaxation slightly because this was an easy problem with few local minima. [sent-277, score-0.292]
64 2 Document Classification In this experiment we used the four universities WebKB dataset. [sent-279, score-0.105]
65 The data is made up of web pages from four universities plus an additional larger set from miscellaneous universities. [sent-280, score-0.172]
66 Results table columns: Linear, TFIDF, Sqrt, Poly, RBF, L Mono, C Mono. [sent-294, score-0.173]
67 In [6, 5] preprocessing the data with a square root was demonstrated to yield good results. [sent-347, score-0.15]
68 We will compare our nonconvex method (L mono) and our convex relaxation (C mono) to a linear SVM with and without the square root, with TFIDF features, and also a kernelized SVM with both the polynomial kernel and the RBF kernel. [sent-348, score-0.665]
69 We will follow the setup of [6] by training on three universities and the miscellaneous university set and testing on web pages from the fourth university. [sent-349, score-0.274]
70 The convex relaxation chooses a step function nearly every time. [sent-353, score-0.337]
71 The nonconvex greedy algorithm does not end up recovering this solution as reliably and seems to get stuck in local minima. [sent-355, score-0.292]
72 In [3], it was shown that monotonic transforms of the form x^a for 0 ≤ a ≤ 1 worked well. [sent-359, score-0.588]
73 Images were transformed into RGB histograms following the binning strategy of [3, 5]. [sent-362, score-0.126]
74 We ran a series of six pairwise experiments where the data was randomly split into 80 percent training, 10 percent cross validation, and 10 percent testing. [sent-363, score-0.399]
75 We compared our two methods to a linear support vector machine, as well as an SVM with RBF and polynomial kernels. [sent-365, score-0.142]
76 We also compared to the set of transforms x^a for 0 ≤ a ≤ 1, where we cross validated over a = {0, . . . , 1}. [sent-366, score-0.312]
77 This set includes linear a = 1 at one end, a binary threshold a = 0 at the other (choosing 0^0 = 0), and the square root transform in the middle. [sent-373, score-0.22]
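For comparison, the fixed x^a family can be searched with straightforward cross-validation; the grid values, the handling of the 0^0 = 0 convention, and the base classifier below are illustrative assumptions rather than the setup used in the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def best_power_transform(X, y, grid=np.linspace(0.0, 1.0, 11)):
    """Pick a in [0, 1] for the elementwise transform x -> x**a by CV accuracy.

    a = 1 leaves the data untouched, a = 0 acts as a binary threshold
    (with the convention 0**0 = 0), and a = 0.5 is the square root.
    """
    X = np.asarray(X, dtype=float)
    scores = []
    for a in grid:
        Xa = np.where(X > 0, X ** a, 0.0)   # enforce 0**0 = 0 on zero entries
        scores.append(cross_val_score(LinearSVC(), Xa, y, cv=3).mean())
    return grid[int(np.argmax(scores))]
```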
78 The convex relaxation performed best or tied for best on 4 out of 6 of the experiments and was the best overall as reported in Figure 6. [sent-374, score-0.337]
79 The nonconvex version also performed well but ended up with a lower accuracy than the cross validated family of x^a transforms. [sent-375, score-0.391]
80 Cross validation over x^a most often chose low nonzero a values. [sent-377, score-0.14]
81 Our method had many knots in these extremely low values because that was where the data support was. [sent-378, score-0.195]
82 Solid blue is the mean for the nonconvex algorithm and dashed blue is the standard deviation. [sent-380, score-0.263]
83 Results table columns: Linear, Sqrt, Poly, RBF, x^a, L Mono, C Mono. [sent-382, score-0.276]
84 Figure 7: The learned transformation functions for 6 Corel problems. [sent-474, score-0.17]
85 Each processed image is a 21 by 12 pixel, 256-level grayscale image that is rasterized to form training vectors. [sent-478, score-0.19]
86 We randomly split the data into 80 percent training, 10 percent cross validation, and 10 percent testing. [sent-480, score-0.355]
87 The learned monotonic functions from L Mono and C Mono are similar to a sigmoid function, which indicates that useful saturation and threshold effects were uncovered by our methods. [sent-482, score-0.511]
88 Figure 8a shows examples of training images before and after they have been transformed by our learned function. [sent-483, score-0.279]
89 Our learned transformation outperforms the linear SVM, with the convex relaxation performing best. [sent-485, score-0.57]
90 5 Discussion A data driven framework was presented for jointly learning monotonic transformations of input data and a discriminative linear classifier. [sent-486, score-0.629]
91 The joint optimization improves classification accuracy and produces interesting transformations that otherwise would require a priori domain knowledge. [sent-487, score-0.187]
92 Subsequently, a semidefinite relaxation of the original problem was presented which does not suffer from local minima. [sent-490, score-0.264]
93 The greedy algorithm has scaling properties similar to a support vector machine yet has local minima to contend with. [sent-491, score-0.149]
94 The semidefinite relaxation is more computationally intensive yet ensures a reliable global solution. [sent-492, score-0.279]
95 Nevertheless, both implementations were helpful in synthetic and real experiments including text and image classification and improved over standard support vector machine tools. [sent-493, score-0.234]
96 Figure 8: a) Original and transformed gender images. [sent-497, score-0.191]
97 These faster algorithms will help us explore extensions such as learning transformations across multiple tasks. [sent-500, score-0.218]
98 We also hope to explore applications to other domains such as gene expression data to refine the current logarithmic transforms necessary to compensate for well-known saturation effects in expression level measurements. [sent-501, score-0.311]
99 We are also interested in looking at fMRI and audio data where monotonic transformations are useful. [sent-502, score-0.566]
100 Support vector machine classification of microarray gene expression data, 1999. [sent-517, score-0.172]
wordName wordTfidf (topN-words)
[('mono', 0.465), ('monotonic', 0.379), ('relaxation', 0.236), ('transformations', 0.187), ('nonconvex', 0.185), ('ramp', 0.174), ('vs', 0.173), ('mj', 0.155), ('knots', 0.145), ('transformed', 0.126), ('hyperplane', 0.121), ('semide', 0.121), ('classi', 0.113), ('zj', 0.11), ('truncated', 0.109), ('transforms', 0.106), ('piecewise', 0.103), ('xa', 0.103), ('convex', 0.101), ('svm', 0.098), ('percent', 0.096), ('transformation', 0.094), ('knot', 0.087), ('corel', 0.086), ('quadratic', 0.077), ('learned', 0.076), ('lasserre', 0.076), ('universities', 0.076), ('cross', 0.067), ('jebara', 0.065), ('gender', 0.065), ('linear', 0.063), ('gene', 0.062), ('transform', 0.062), ('rbf', 0.059), ('miscellaneous', 0.058), ('image', 0.057), ('saturation', 0.056), ('preprocessing', 0.055), ('microarray', 0.053), ('parameterize', 0.053), ('elementwise', 0.051), ('sqrt', 0.051), ('tfidf', 0.051), ('support', 0.05), ('recover', 0.05), ('implementations', 0.05), ('locally', 0.049), ('root', 0.049), ('relaxations', 0.048), ('synthetic', 0.048), ('square', 0.046), ('convergent', 0.046), ('di', 0.045), ('six', 0.044), ('training', 0.044), ('document', 0.043), ('intensive', 0.043), ('greedy', 0.042), ('cation', 0.041), ('precomputed', 0.041), ('er', 0.041), ('dual', 0.041), ('formulation', 0.04), ('outer', 0.039), ('blue', 0.039), ('poly', 0.039), ('relaxed', 0.038), ('web', 0.038), ('zeros', 0.037), ('stuck', 0.037), ('validation', 0.037), ('validated', 0.036), ('categories', 0.036), ('award', 0.034), ('bag', 0.034), ('positivity', 0.034), ('constraints', 0.034), ('kernel', 0.034), ('images', 0.033), ('su', 0.032), ('processed', 0.032), ('columbia', 0.032), ('fold', 0.032), ('subsequent', 0.031), ('explore', 0.031), ('nite', 0.031), ('programs', 0.03), ('testing', 0.029), ('margin', 0.029), ('fourth', 0.029), ('vector', 0.029), ('experiment', 0.029), ('relax', 0.028), ('classifying', 0.028), ('labels', 0.028), ('kernels', 0.028), ('constraint', 0.028), ('expression', 0.028), ('local', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 112 nips-2007-Learning Monotonic Transformations for Classification
Author: Andrew Howard, Tony Jebara
Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1
2 0.13235219 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
Author: Ronny Luss, Alexandre D'aspremont
Abstract: In this paper, we propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our method simultaneously finds the support vectors and a proxy kernel matrix used in computing the loss. This can be interpreted as a robust classification problem where the indefinite kernel matrix is treated as a noisy observation of the true positive semidefinite kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the analytic center cutting plane method. We compare the performance of our technique with other methods on several data sets.
3 0.13018067 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu
Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from the high computation complexity or from the solutions of local optimum. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite programming. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient with the number of free parameters reduced from O(n2 ) to O(n) where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM. 1
4 0.12334572 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr
Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.
5 0.1184589 63 nips-2007-Convex Relaxations of Latent Variable Training
Author: Yuhong Guo, Dale Schuurmans
Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1
6 0.09992066 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
7 0.089592673 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
8 0.08441902 62 nips-2007-Convex Learning with Invariances
9 0.075878844 118 nips-2007-Learning with Transformation Invariant Kernels
10 0.069009893 134 nips-2007-Multi-Task Learning via Conic Programming
11 0.068429336 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
12 0.068093516 160 nips-2007-Random Features for Large-Scale Kernel Machines
13 0.05978306 24 nips-2007-An Analysis of Inference with the Universum
14 0.057185635 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
15 0.054428622 174 nips-2007-Selecting Observations against Adversarial Objectives
16 0.054011125 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
17 0.053457454 51 nips-2007-Comparing Bayesian models for multisensory cue combination without mandatory integration
18 0.052602015 167 nips-2007-Regulator Discovery from Gene Expression Time Series of Malaria Parasites: a Hierachical Approach
19 0.052568145 115 nips-2007-Learning the 2-D Topology of Images
20 0.051913999 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems
topicId topicWeight
[(0, -0.199), (1, 0.048), (2, -0.118), (3, 0.068), (4, -0.012), (5, -0.017), (6, 0.085), (7, 0.098), (8, -0.045), (9, 0.092), (10, 0.078), (11, -0.132), (12, -0.006), (13, 0.014), (14, 0.002), (15, 0.003), (16, -0.048), (17, -0.015), (18, 0.086), (19, -0.009), (20, -0.123), (21, 0.031), (22, -0.016), (23, 0.063), (24, -0.082), (25, -0.049), (26, -0.043), (27, 0.116), (28, 0.094), (29, 0.071), (30, -0.026), (31, -0.039), (32, 0.085), (33, 0.046), (34, -0.065), (35, -0.043), (36, -0.009), (37, -0.006), (38, -0.056), (39, -0.036), (40, -0.115), (41, -0.058), (42, -0.096), (43, -0.043), (44, -0.02), (45, 0.084), (46, 0.052), (47, 0.146), (48, 0.205), (49, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.94364512 112 nips-2007-Learning Monotonic Transformations for Classification
Author: Andrew Howard, Tony Jebara
Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1
2 0.70905679 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
Author: Zenglin Xu, Rong Jin, Jianke Zhu, Irwin King, Michael Lyu
Abstract: We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from the high computation complexity or from the solutions of local optimum. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite programming. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient with the number of free parameters reduced from O(n2 ) to O(n) where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM. 1
3 0.64884973 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr
Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.
4 0.63687676 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
Author: Ronny Luss, Alexandre D'aspremont
Abstract: In this paper, we propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our method simultaneously finds the support vectors and a proxy kernel matrix used in computing the loss. This can be interpreted as a robust classification problem where the indefinite kernel matrix is treated as a noisy observation of the true positive semidefinite kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the analytic center cutting plane method. We compare the performance of our technique with other methods on several data sets.
5 0.63593411 24 nips-2007-An Analysis of Inference with the Universum
Author: Olivier Chapelle, Alekh Agarwal, Fabian H. Sinz, Bernhard Schölkopf
Abstract: We study a pattern classification algorithm which has recently been proposed by Vapnik and coworkers. It builds on a new inductive principle which assumes that in addition to positive and negative data, a third class of data is available, termed the Universum. We assay the behavior of the algorithm by establishing links with Fisher discriminant analysis and oriented PCA, as well as with an SVM in a projected subspace (or, equivalently, with a data-dependent reduced kernel). We also provide experimental results. 1
6 0.60019356 63 nips-2007-Convex Relaxations of Latent Variable Training
7 0.51129669 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
8 0.50988156 62 nips-2007-Convex Learning with Invariances
9 0.48931128 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
10 0.47149253 174 nips-2007-Selecting Observations against Adversarial Objectives
11 0.46549144 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers
12 0.43353856 118 nips-2007-Learning with Transformation Invariant Kernels
13 0.4183937 134 nips-2007-Multi-Task Learning via Conic Programming
14 0.41651264 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems
15 0.40602437 178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains
16 0.39426547 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
17 0.3828361 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning
18 0.37497419 160 nips-2007-Random Features for Large-Scale Kernel Machines
19 0.37400594 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin
20 0.34180629 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
topicId topicWeight
[(5, 0.04), (13, 0.041), (16, 0.017), (19, 0.014), (21, 0.064), (31, 0.015), (34, 0.014), (35, 0.034), (47, 0.101), (49, 0.013), (83, 0.127), (85, 0.347), (87, 0.031), (90, 0.049)]
simIndex simValue paperId paperTitle
1 0.94838065 197 nips-2007-The Infinite Markov Model
Author: Daichi Mochihashi, Eiichiro Sumita
Abstract: We present a nonparametric Bayesian method of estimating variable order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model has a comparative performance with an exponentially large full-order model, while computationally much efficient in both time and space. We expect that this basic model will also extend to the variable order hierarchical clustering of general data. 1
2 0.89818972 77 nips-2007-Efficient Inference for Distributions on Permutations
Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas
Abstract: Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent such distributions compactly. We present Kronecker conditioning, a general and efficient approach for maintaining these distributions directly in the Fourier domain. Low order Fourier-based approximations can lead to functions that do not correspond to valid distributions. To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. We demonstrate the effectiveness of our approach on a real camera-based multi-people tracking setting. 1
3 0.88437259 135 nips-2007-Multi-task Gaussian Process Prediction
Author: Edwin V. Bonilla, Kian M. Chai, Christopher Williams
Abstract: In this paper we investigate multi-task learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on input-dependent features and a “free-form” covariance matrix over tasks. This allows for good flexibility when modelling inter-task dependencies while avoiding the need for large amounts of data for training. We show that under the assumption of noise-free observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of inter-task transfer occurs. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets. 1
same-paper 4 0.84382832 112 nips-2007-Learning Monotonic Transformations for Classification
Author: Andrew Howard, Tony Jebara
Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1
5 0.63608801 141 nips-2007-New Outer Bounds on the Marginal Polytope
Author: David Sontag, Tommi S. Jaakkola
Abstract: We give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints on the marginal polytope are derived through a series of projections onto the cut polytope. As a result, we obtain tighter upper bounds on the log-partition function. We also show empirically that the approximations of the marginals are significantly more accurate when using the tighter outer bounds. Finally, we demonstrate the advantage of the new constraints for finding the MAP assignment in protein structure prediction. 1
6 0.59838146 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
7 0.58587772 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
8 0.5798564 63 nips-2007-Convex Relaxations of Latent Variable Training
9 0.57656151 134 nips-2007-Multi-Task Learning via Conic Programming
10 0.57608098 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
11 0.56965345 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems
12 0.56850117 97 nips-2007-Hidden Common Cause Relations in Relational Learning
13 0.56607133 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
14 0.56598592 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
15 0.5634588 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
16 0.56131154 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
17 0.55945605 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations
18 0.55802786 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
19 0.55693364 24 nips-2007-An Analysis of Inference with the Universum
20 0.55328768 128 nips-2007-Message Passing for Max-weight Independent Set