nips nips2011 nips2011-169 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Christoph H. Lampert
Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. [sent-7, score-0.964]
2 Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. [sent-8, score-0.577]
3 Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. [sent-9, score-0.847]
4 In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. [sent-10, score-0.862]
5 1 Introduction The recent development of conditional random fields (CRFs) [1], max-margin Markov networks (M3Ns) [2], and structured support vector machines (SSVMs) [3] has triggered a wave of interest in the prediction of complex outputs. [sent-12, score-0.512]
6 In this paper, we study multi-label structured prediction, defining the task and introducing the necessary notation in Section 2. [sent-15, score-0.265]
7 Once trained it allows the prediction of multiple structured outputs from a single input, as well as abstaining from a decision. [sent-17, score-0.561]
8 In Section 4 we discuss MLSP’s relation to existing methods for multi-label prediction with simple label sets, and to single-label structured prediction. [sent-22, score-0.581]
9 We furthermore compare MLSP to a multi-label structured prediction method within the SSVM framework in Section 4. [sent-23, score-0.491]
10 2 Multi-label structured prediction We first recall some background and establish the notation necessary to discuss multi-label classification and structured prediction in a maximum margin framework. [sent-26, score-1.042]
11 Our overall task is predicting outputs y ∈ Y for inputs x ∈ X in a supervised learning setting. [sent-27, score-0.181]
12 In ordinary (single-label) multi-class prediction we use a prediction function, g : X → Y, for this, which we learn from i.i.d. training examples. [sent-28, score-0.545]
13 We measure the quality of predictions by a task-dependent loss function ∆ : Y × Y → R+, where ∆(y, ȳ) specifies what cost occurs if we predict an output ȳ while the correct prediction is y. [sent-38, score-0.414]
14 Structured output prediction can be seen as a generalization of the above setting, where one wants to make not only one, but several dependent decisions at the same time, for example, deciding for each pixel of an image to which out of several semantic classes it belongs. [sent-39, score-0.401]
15 Consequently, structured output prediction requires specialized techniques that avoid enumerating all possible outputs, and that can generalize between labels in the output set. [sent-44, score-0.778]
16 A popular technique for this task is the structured (output) support vector machine (SSVM) [3]. [sent-45, score-0.307]
17 Combined with a technique for identifying the currently most violated linear constraints, working set training, in particular cutting plane [4] or bundle methods [5], allows SSVM training to arbitrary precision in polynomial time. [sent-49, score-0.276]
18 Multi-label prediction is a generalization of single-label prediction that gives up the condition of a functional relation between inputs and outputs. [sent-50, score-0.579]
19 We say that v ∈ {±1}^Y represents the subset Y ∈ P(Y) if v_y = +1 for y ∈ Y and v_y = −1 otherwise. [sent-57, score-0.244]
20 We write either Y^i or v^i for a label set in the training data. [sent-60, score-0.188]
21 Note that multi-label prediction can also be interpreted as ordinary single-output prediction with P(Y) taking the place of the original output set Y. [sent-62, score-0.627]
22 Multi-label structured prediction combines the aspects of multi-label prediction and structured output sets: we are given a training set {(x^i, Y^i)}_{i=1,…,n} ⊂ X × P(Y), [sent-65, score-1.162]
23 where Y is a structured output set of potentially very large size, and we would like to learn a prediction function G : X → P(Y) with the ability to generalize also in the output set. [sent-68, score-0.675]
24 In the following, we will take the structured prediction point of view, deriving expressions for predicting multiple structured outputs instead of single ones. [sent-69, score-0.852]
25 Alternatively, the same conclusions could be reached by interpreting the task as performing multi-label prediction with binary output vectors that are too large to store or enumerate explicitly, but that have an internal structure allowing generalization between the elements. [sent-70, score-0.202]
26 3 Maximum margin multi-label structured prediction In this section we propose a learning technique designed for multi-label structured prediction that we call MLSP. [sent-71, score-0.824]
27 It makes set-valued predictions by G(x) := {y ∈ Y : f(x, y) > 0} for f(x, y) := ⟨w, ψ(x, y)⟩.  (2) [sent-72, score-0.252]
28 More complex prediction rules exist in the multi-label literature. [sent-73, score-0.252]
29 We restrict ourselves to per-label thresholding, because more advanced rules complicate the learning and prediction problem even further. [sent-76, score-0.252]
30 Note that the compatibility function, f(x, y), acts on individual inputs and outputs, as in single-label prediction (1), but the prediction step consists of collecting all outputs with positive score instead of finding the output of maximal score. [sent-77, score-0.761]
31 By including a constant entry into the joint feature map ψ(x, y) we can model a bias term, thereby avoiding the need for a threshold during prediction (2). [sent-78, score-0.305]
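As a minimal sketch of prediction rule (2), assuming a small enumerable candidate set and a hypothetical joint feature map `joint_feature` (neither is part of the paper), the set-valued decision could look as follows; a constant feature entry supplies the bias:

    import numpy as np

    def predict_set(w, x, candidate_outputs, joint_feature):
        """Set-valued prediction rule (2): collect every output whose
        compatibility score f(x, y) = <w, psi(x, y)> is positive."""
        selected = []
        for y in candidate_outputs:
            score = np.dot(w, joint_feature(x, y))  # constant feature entry acts as bias
            if score > 0:
                selected.append(y)
        return selected  # may be empty, i.e. the predictor abstains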
32 Note that our setup differs from SSVM training in this regard. [sent-80, score-0.145]
33 There, a bias term, or a constant entry of the feature map, would have no influence, because during training only pairwise differences of function values are considered, and during prediction a bias does not affect the argmax-decision in Equation (1). [sent-81, score-0.35]
34 As the risk depends on the loss function chosen, we first study the possibilities we have for the set loss ∆ML : P(Y) × P(Y) → R+ . [sent-83, score-0.187]
35 There are no established functions for this in the structured prediction setting, but it turns out that two canonical set losses are consistent with the following first principles. [sent-84, score-0.519]
36 In the special case of λ ≡ 1 the sum loss is known as symmetric difference loss, and it coincides with the Hamming loss of the binary indicator vector representation. [sent-89, score-0.196]
37 The max loss becomes the 0/1-loss between sets in this case. [sent-90, score-0.147]
38 In the general case, λ typically expresses partial correctness, generalizing the single-label structured loss ∆(y, ȳ). [sent-91, score-0.319]
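One plausible reading of the two set losses, consistent with the special cases above (λ ≡ 1 giving the Hamming/symmetric-difference loss and the 0/1 loss between sets, respectively), is sketched below; the per-label cost `lam` is a hypothetical stand-in for λ(Y, y):

    def sum_loss(Y_true, Y_pred, lam=lambda Y, y: 1.0):
        """Sum loss over mislabeled outputs; lam == 1 recovers the
        Hamming / symmetric-difference loss of the indicator representation."""
        mislabeled = set(Y_true) ^ set(Y_pred)        # symmetric difference
        return sum(lam(Y_true, y) for y in mislabeled)

    def max_loss(Y_true, Y_pred, lam=lambda Y, y: 1.0):
        """Max loss over mislabeled outputs; lam == 1 recovers the 0/1 loss between sets."""
        mislabeled = set(Y_true) ^ set(Y_pred)
        return max((lam(Y_true, y) for y in mislabeled), default=0.0)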
39 While in the small-scale multi-label situation the sum loss is more common, we argue in this work that the max loss has advantages in the structured prediction situation. [sent-96, score-0.729]
40 3.1 Maximum margin multi-label structured prediction (MLSP) To learn the parameters w of the compatibility function f(x, y) we follow a regularized risk minimization framework, given i.i.d. training examples. [sent-105, score-0.636]
41 Using the definition of ∆max this is equivalent to minimizing (1/2)||w||² + (C/n) Σ_{i=1}^n ξ^i, subject to ξ^i ≥ λ(Y^i, y) for all y ∈ Y with v^i_y f(x^i, y) ≤ 0. [sent-112, score-0.145]
42 Besides this slack-rescaled variant, one can also form margin-rescaled training using the constraints ξ^i ≥ λ(Y^i, y) − v^i_y f(x^i, y), for all y ∈ Y. [sent-123, score-0.395]
43 The main difference between slack and margin rescaled training is how they treat the case of λ(Y i , y) = 0 for some y ∈ Y. [sent-125, score-0.215]
44 In slack rescaling, the corresponding outputs have no effect on the training at all, whereas for margin rescaling, no margin is enforced for such examples, but a penalization still occurs whenever f(x^i, y) > 0 for y ∉ Y^i, or if f(x^i, y) < 0 for y ∈ Y^i. [sent-126, score-0.359]
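A small sketch of how the two constraint families could be checked for a single candidate output, given a score f(x^i, y) and sign v^i_y ∈ {±1}; the slack handling is simplified and the function is illustrative only, not the paper's implementation:

    def constraint_violation(kind, xi, lam_y, v_y, score):
        """Amount by which one training constraint is violated (0 if satisfied).

        slack rescaling : xi >= lam_y, but only when v_y * score <= 0
        margin rescaling: xi >= lam_y - v_y * score, always active
        """
        if kind == "slack":
            if v_y * score <= 0:              # output on the wrong side of zero
                return max(0.0, lam_y - xi)
            return 0.0                        # correctly scored (or lam_y = 0): no effect
        if kind == "margin":
            return max(0.0, lam_y - v_y * score - xi)
        raise ValueError(kind)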
45 3.2 Generalization Properties Maximum margin structured learning has become successful not only because it provides a powerful framework for solving practical prediction problems, but also because it comes with certain theoretical guarantees, in particular generalization bounds. [sent-128, score-0.593]
46 L(Q_w, D) ≤ (1/n) Σ_{i=1}^n ℓ(x^i, Y^i, f) + [ ( (||w||²/s²) ln(rn s²/||w||²) + ln(n/σ) ) / (2(n−1)) ]^{1/2}  (9), for ℓ(x^i, Y^i, f) := max{ λ(Y^i, y) : v^i_y f(x^i, y) < 1 }, where v^i is the binary indicator vector of Y^i. [sent-137, score-0.164]
47 This is the same complexity as for single-label prediction using SSVMs, despite the fact that multi-label prediction formally maps into P(Y). [sent-145, score-0.504]
48 For the margin-rescaled variant we obtain the dual max_{α^i_y ∈ R_+} −(1/2) Σ_{(i,y),(ī,ȳ)} v^i_y v^ī_ȳ α^i_y α^ī_ȳ k((x^i, y), (x^ī, ȳ)) + Σ_{(i,y)} λ^i_y α^i_y  (10), subject to Σ_y α^i_y ≤ C/n for i = 1, …, n  (11). [sent-155, score-0.187]
49 For slack-rescaled MLSP, the dual is computed analogously as max_{α^i_y ∈ R_+} −(1/2) Σ_{(i,y),(ī,ȳ)} v^i_y v^ī_ȳ α^i_y α^ī_ȳ k((x^i, y), (x^ī, ȳ)) + Σ_{(i,y)} α^i_y  (12), subject to Σ_y α^i_y / λ^i_y ≤ C/n for i = 1, …, n. [sent-159, score-0.209]
50 The crucial step in working set training is the identification of violated constraints. [sent-177, score-0.211]
51 This allows us to reuse existing methods for loss-augmented single-label inference. [sent-179, score-0.17]
52 Identifying violated “negative” constraints requires loss-augmented prediction over Y \ Y^i. [sent-182, score-0.321]
53 Note that K-best versions of most standard MAP prediction methods have been developed, including max-flow [9], loopy BP [10], LP-relaxations [11], and Sampling [12]. [sent-184, score-0.273]
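A schematic working-set step under these assumptions: the small set Y^i is enumerated directly for the "positive" constraints, while a hypothetical K-best MAP oracle `kbest_map` searches Y \ Y^i for the most violated "negative" constraint (for a constant λ the highest-scoring candidate outside Y^i suffices; a general λ would need a loss-augmented oracle):

    def find_violated(w, x_i, Y_i, score, lam, kbest_map, K, eps=1e-3):
        """Search for violated margin-rescaled constraints of one example
        (slack variables omitted for brevity).

        'Positive' constraints (y in Y_i, score should be positive): Y_i is
        small, so it is simply enumerated.  'Negative' constraints (y outside
        Y_i, score should be negative): take the K best-scoring outputs and
        skip those belonging to Y_i.
        """
        violated = []
        for y in Y_i:                                   # positive part: enumerate Y_i
            if lam(Y_i, y) - score(w, x_i, y) > eps:
                violated.append((y, +1))
        for y in kbest_map(w, x_i, K):                  # negative part: K-best MAP
            if y in Y_i:
                continue
            if lam(Y_i, y) + score(w, x_i, y) > eps:    # v_y = -1 for y outside Y_i
                violated.append((y, -1))
            break          # first candidate outside Y_i is the highest-scoring one
        return violated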
54 In contrast to single-label SSVM prediction, this requires not only a maximization over all elements of Y, but the collection of all elements y ∈ Y with positive score. [sent-187, score-0.252]
55 Also, it is often possible to establish an upper bound on the number of desired outputs, and then, K-best prediction techniques can again be applied. [sent-192, score-0.278]
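Under such a bound K, the test-time rule (2) can reuse the same kind of oracle, as in this sketch (again with `kbest_map` as an assumed K-best MAP routine returning outputs in decreasing score order):

    def predict_with_kbest(w, x, score, kbest_map, K):
        """Evaluate rule (2) with a K-best MAP oracle: keep collecting outputs
        while their score stays positive (decreasing score order assumed)."""
        outputs = []
        for y in kbest_map(w, x, K):
            if score(w, x, y) <= 0:
                break
            outputs.append(y)
        return outputs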
56 This makes MLSP of potential use for several classical tasks, such as parsing and chunking in natural language processing, secondary structure prediction in computational biology, or human pose estimation in computer vision. [sent-193, score-0.526]
57 In general situations, evaluating (2) might require approximate structured prediction techniques. [sent-194, score-0.252]
58 Note that the use of approximation algorithms is less problematic here, because, in contrast to training, the prediction step is not performed in an iterative manner, so errors do not accumulate. [sent-197, score-0.252]
59 2) Per-label decomposition [16] trains one classifier for each output label and makes an independent decision for each of them. [sent-199, score-0.172]
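A minimal sketch of this per-label decomposition baseline, assuming a small enumerable label set and scikit-learn-style binary classifiers; it is only feasible when Y can be listed explicitly, which is exactly what fails for structured output sets:

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_per_label(X, label_sets, all_labels):
        """One independent binary classifier per output label (baseline 2);
        assumes each label occurs both positively and negatively in the data."""
        classifiers = {}
        for y in all_labels:
            targets = np.array([1 if y in Y_i else 0 for Y_i in label_sets])
            classifiers[y] = LinearSVC().fit(X, targets)
        return classifiers

    def predict_per_label(classifiers, x):
        """Independent decision per label; no generalization across labels."""
        x = np.atleast_2d(x)
        return [y for y, clf in classifiers.items() if clf.predict(x)[0] == 1]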
60 Given the size of Y, 1) is not a promising direction for multi-label structured prediction. [sent-201, score-0.239]
61 However, MLSP resembles both approaches by sharing their prediction rule (2). [sent-203, score-0.274]
62 MLSP can be seen as a way to make a combination of approaches applicable to the situation of structured prediction by incorporating the ability to generalize in the label set. [sent-204, score-0.661]
63 They use a structured prediction framework to encode dependencies between the individual output labels, of which there are relatively few. [sent-209, score-0.573]
64 MLSP, on the other hand, aims at predicting multiple structured objects, i.e., [sent-210, score-0.291]
65 the structured prediction framework is not just a tool to improve multi-class classification with multiple output labels, but it is required as a core component for predicting even a single output. [sent-212, score-0.625]
66 Some previous methods target multi-label prediction with large output sets, in particular using label compression [25] or a label hierarchy [26]. [sent-213, score-0.514]
67 This allows handling thousands of potential output classes, but a direct application to the structured prediction situation is not possible, because the methods still require explicit handling of the output label vectors, or cannot predict labels that were not part of the training set. [sent-214, score-0.933]
68 The actual task of predicting multiple structured outputs has so far not appeared explicitly in the literature. [sent-215, score-0.387]
69 The situation of multiple inputs during training has, however, received some attention: [27] introduces a one-class SVM based training technique for learning with ambiguous ground truth data. [sent-216, score-0.287]
70 The compatibility functions learned by the maximum-margin techniques [13, 27] have the same functional form as f (x, y) in MLSP, so they can, in principle, be used to predict multiple outputs using Equation (2). [sent-220, score-0.154]
71 However, our experiments in Section 5 show that this leads to low multi-label prediction accuracy, because the training setup is not designed for this evaluation procedure. [sent-221, score-0.505]
72 4.1 Structured Multilabel Prediction in the SSVM Framework At first sight, it appears unnecessary to go beyond the standard structured prediction framework at all in trying to predict subsets of Y. [sent-223, score-0.491]
73 As mentioned in Section 3, multi-label prediction into Y can be interpreted as single-label prediction into P(Y), so a straight-forward approach to multi-label structured prediction would be to use an ordinary SSVM with output set P(Y). [sent-224, score-1.118]
74 Unfortunately, as we will show in this section, the P-SSVM setup is not well suited to the structured prediction situation. [sent-227, score-0.538]
75 A P-SSVM learns a prediction function, G(x) := argmax_{Y ∈ P(Y)} F(x, Y), with linearly parameterized compatibility function, F(x, Y) := ⟨w, ψ(x, Y)⟩, by solving the optimization problem min_{w ∈ H, ξ^1,…,ξ^n} (1/2)||w||² + (C/n) Σ_i ξ^i, subject to F(x^i, Y^i) − F(x^i, Y) ≥ ∆_ML(Y^i, Y) − ξ^i for all Y ∈ P(Y)  (15). [sent-228, score-0.31]
76 This turns the argmax-evaluation for G(x) exactly into the prediction rule (2), and the constraint set in (15) simplifies to ξ^i ≥ ∆_ML(Y^i, Y) − Σ_{y ∈ Y △ Y^i} v^i_y f(x^i, y), for i = 1, …, n. [sent-243, score-0.374]
77 Because w, and thereby f, are learned iteratively, they typically go through phases of low prediction quality. [sent-250, score-0.252]
78 Consequently, we presume that P-SSVM training is intractable for structured prediction problems, except for the case of a small label set. [sent-257, score-0.679]
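To make the size argument concrete: interpreting multi-label prediction as single-label prediction over P(Y) means ranking over all 2^|Y| label sets, as the toy enumeration below illustrates; for structured output sets, where |Y| is itself exponential, such enumeration is hopeless:

    from itertools import chain, combinations

    def power_set(Y):
        """All 2^|Y| candidate label sets a P-SSVM would have to rank over."""
        Y = list(Y)
        return chain.from_iterable(combinations(Y, r) for r in range(len(Y) + 1))

    # |Y| = 10 already gives 1024 candidate sets, |Y| = 30 about 10^9; for
    # structured output sets, where |Y| itself grows exponentially, explicit
    # enumeration of P(Y) is out of the question.
    print(sum(1 for _ in power_set(range(10))))   # 1024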
79 On the one hand, it is straightforward to model as a structured prediction task. [sent-265, score-0.252]
80 On the other hand, its output set is small enough such that we can compare MLSP also against other approaches that cannot handle very large output sets, in particular P-SSVM and independent per-label training. [sent-268, score-0.164]
81 As the label set is small, we use exhaustive search over Y to identify violated constraints during training and to perform the final predictions. [sent-285, score-0.287]
82 As there is no single established multi-label error measure, and because it illustrates the effect of training with different loss functions, we report several common measures. [sent-287, score-0.206]
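For reference, the usual set-based precision, recall and F1, together with the two set losses of Section 3 for λ ≡ 1, can be computed as below; these are standard definitions, not code from the paper:

    def set_prf(Y_true, Y_pred):
        """Set-based precision, recall and F1 for one example."""
        t, p = set(Y_true), set(Y_pred)
        tp = len(t & p)
        prec = tp / len(p) if p else 1.0      # convention for empty predictions
        rec = tp / len(t) if t else 1.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
        return prec, rec, f1

    def set_losses(Y_true, Y_pred):
        """Sum loss and max loss for lambda == 1 (Hamming and 0/1 between sets)."""
        diff = set(Y_true) ^ set(Y_pred)
        return len(diff), (1 if diff else 0)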
83 The results show nicely how the assumptions made during training influence the prediction characteristics. [sent-288, score-0.35]
84 Qualitatively, MLSP achieves the best prediction accuracy in terms of the max loss, while P-SSVM is better if we judge by the sum loss. [sent-289, score-0.33]
85 This is also plausible, as SSVM training uses a ranking-like loss: all potential labels for each input are enforced to be in the right order (correct labels have higher score than incorrect ones), but nothing in the objective encourages a cut-off point at 0. [sent-292, score-0.225]
86 We take this as an indication that both training with the sum loss and training with the max loss make sense conceptually. [sent-298, score-0.354]
87 The problem in multi-label structured prediction is solely that |Y| is too large, and training data too scarce, to use either of these setups. [sent-301, score-0.589]
88 Figure 1: Multi-label structured prediction results. [sent-302, score-0.491]
89 Methods printed in italics are infeasible for general structured output sets. [sent-304, score-0.391]
90 Of the five methods, only MLSP, JKSE and SSVM generalize to the more general structured prediction setting, as they do not require exhaustive enumeration of the label set. [sent-354, score-0.164]
91 5.2 Object class detection in natural images Object detection can be solved as a structured prediction problem where natural images are the inputs and coordinate tuples of bounding boxes are the outputs. [sent-357, score-0.664]
92 Following the experimental setup of [27] we use the multiscale part of the dataset for training and the singlescale part for testing. [sent-362, score-0.145]
93 For each method we train models on the training set and choose the C or ν value that maximizes the F1 score over the validation set of precropped object and background images. [sent-369, score-0.157]
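The model selection step described here amounts to a grid search over the regularization parameter scored by validation F1; a schematic version, with `train` and `f1_on` as hypothetical stand-ins for the training routine and the validation F-score:

    def select_regularization(train, f1_on, train_data, val_data, grid):
        """Return the C (or nu) value, and model, with the best validation F1."""
        best_param, best_f1, best_model = None, -1.0, None
        for param in grid:                    # e.g. [0.01, 0.1, 1.0, 10.0, 100.0]
            model = train(train_data, param)
            score = f1_on(model, val_data)
            if score > best_f1:
                best_param, best_f1, best_model = param, score, model
        return best_param, best_model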
94 SSVM as well as JKSE suffer particularly from low recall, and their predictions also have higher sum loss as well as max loss. [sent-373, score-0.158]
95 6 Summary and Discussion We have studied multi-label classification for structured output sets. [sent-374, score-0.321]
96 Existing multi-label techniques cannot directly be applied to this task because of the large size of the output set, and our analysis showed that formulating multi-label structured prediction in a set-valued structured support vector machine framework also leads to infeasible training problems. [sent-375, score-1.053]
97 Our experiments showed that MLSP has higher prediction accuracy than baseline methods that remain applicable in structured output settings. [sent-377, score-0.573]
98 Besides these promising initial results, we believe that there are still several aspects of multi-label structured prediction that need to be better understood, in particular the prediction problem at test time. [sent-379, score-0.743]
99 Large margin methods for structured and interdependent output variables. [sent-402, score-0.381]
100 On structured output training: Hard cases and an efficient alternative. [sent-504, score-0.321]
wordName wordTfidf (topN-words)
[('mlsp', 0.64), ('ssvm', 0.445), ('prediction', 0.252), ('structured', 0.239), ('jkse', 0.142), ('vy', 0.122), ('multilabel', 0.108), ('training', 0.098), ('qw', 0.094), ('label', 0.09), ('yviol', 0.089), ('ml', 0.087), ('output', 0.082), ('loss', 0.08), ('mauc', 0.071), ('infeasible', 0.07), ('outputs', 0.07), ('violated', 0.069), ('margin', 0.06), ('object', 0.059), ('compatibility', 0.058), ('ssvms', 0.053), ('labels', 0.053), ('predicting', 0.052), ('setup', 0.047), ('classi', 0.047), ('detection', 0.045), ('working', 0.044), ('xi', 0.043), ('gw', 0.043), ('austria', 0.043), ('max', 0.042), ('generalization', 0.042), ('ordinary', 0.041), ('numeric', 0.038), ('situation', 0.037), ('sum', 0.036), ('chl', 0.036), ('secondary', 0.035), ('inputs', 0.033), ('lampert', 0.033), ('prec', 0.031), ('val', 0.031), ('feasibly', 0.031), ('tsoumakas', 0.031), ('constraints', 0.03), ('slack', 0.029), ('dembczynski', 0.029), ('fw', 0.029), ('hierarchical', 0.028), ('ecml', 0.028), ('map', 0.028), ('established', 0.028), ('besides', 0.028), ('rescaled', 0.028), ('risk', 0.027), ('argmaxy', 0.027), ('rec', 0.027), ('pkdd', 0.027), ('yue', 0.027), ('ew', 0.027), ('misclassi', 0.027), ('techniques', 0.026), ('collecting', 0.026), ('task', 0.026), ('blaschko', 0.026), ('vishwanathan', 0.026), ('miny', 0.026), ('images', 0.025), ('cation', 0.025), ('joint', 0.025), ('image', 0.025), ('sets', 0.025), ('enumerating', 0.024), ('finley', 0.024), ('subject', 0.023), ('applicable', 0.023), ('bundle', 0.023), ('pami', 0.023), ('equation', 0.022), ('sharing', 0.022), ('ranking', 0.022), ('monotonicity', 0.022), ('adopting', 0.022), ('analogously', 0.022), ('enumeration', 0.021), ('cheng', 0.021), ('loopy', 0.021), ('support', 0.021), ('precision', 0.021), ('technique', 0.021), ('discussing', 0.021), ('ected', 0.021), ('enforced', 0.021), ('penalization', 0.021), ('formulation', 0.021), ('consequently', 0.02), ('generalize', 0.02), ('biology', 0.02), ('exponentially', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
Author: Christoph H. Lampert
Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds. 1
2 0.13452199 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation
Author: Nico Goernitz, Christian Widmer, Georg Zeller, Andre Kahles, Gunnar Rätsch, Sören Sonnenburg
Abstract: We present a novel regularization-based Multitask Learning (MTL) formulation for Structured Output (SO) prediction for the case of hierarchical task relations. Structured output prediction often leads to difficult inference problems and hence requires large amounts of training data to obtain accurate models. We propose to use MTL to exploit additional information from related learning tasks by means of hierarchical regularization. Training SO models on the combined set of examples from multiple tasks can easily become infeasible for real world applications. To be able to solve the optimization problems underlying multitask structured output learning, we propose an efficient algorithm based on bundle-methods. We demonstrate the performance of our approach in applications from the domain of computational biology addressing the key problem of gene finding. We show that 1) our proposed solver achieves much faster convergence than previous methods and 2) that the Hierarchical SO-MTL approach outperforms considered non-MTL methods. 1
3 0.095254697 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
4 0.092671275 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen
Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1
5 0.089468382 277 nips-2011-Submodular Multi-Label Learning
Author: James Petterson, Tibério S. Caetano
Abstract: In this paper we present an algorithm to learn a multi-label classifier which attempts at directly optimising the F -score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-ocurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, what can be accomplished exactly and efficiently via graph-cuts. Learning however is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a nontrivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest the value of the proposed technique. We also make available source code that enables the reproduction of our experiments. 1
6 0.079276606 275 nips-2011-Structured Learning for Cell Tracking
7 0.075209133 193 nips-2011-Object Detection with Grammar Models
8 0.072170861 180 nips-2011-Multiple Instance Filtering
9 0.0697392 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
10 0.068013526 258 nips-2011-Sparse Bayesian Multi-Task Learning
11 0.067850783 33 nips-2011-An Exact Algorithm for F-Measure Maximization
12 0.061292984 168 nips-2011-Maximum Margin Multi-Instance Learning
13 0.061060928 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
14 0.058242697 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models
15 0.057282519 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
16 0.057268292 165 nips-2011-Matrix Completion for Multi-label Image Classification
17 0.05613932 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
18 0.053686619 181 nips-2011-Multiple Instance Learning on Structured Data
19 0.053518724 28 nips-2011-Agnostic Selective Classification
20 0.053494215 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
topicId topicWeight
[(0, 0.179), (1, 0.05), (2, -0.087), (3, 0.044), (4, 0.033), (5, 0.018), (6, 0.0), (7, -0.056), (8, -0.068), (9, 0.03), (10, -0.015), (11, -0.017), (12, 0.01), (13, 0.041), (14, -0.058), (15, -0.01), (16, -0.014), (17, 0.016), (18, 0.062), (19, -0.026), (20, 0.032), (21, -0.021), (22, 0.044), (23, 0.039), (24, 0.022), (25, 0.028), (26, -0.1), (27, 0.006), (28, -0.042), (29, -0.057), (30, 0.072), (31, -0.042), (32, 0.082), (33, 0.063), (34, 0.014), (35, -0.083), (36, -0.054), (37, -0.064), (38, -0.002), (39, -0.06), (40, 0.027), (41, 0.046), (42, 0.07), (43, -0.02), (44, -0.093), (45, -0.006), (46, -0.018), (47, 0.024), (48, 0.077), (49, -0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.95102227 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
Author: Christoph H. Lampert
Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds. 1
2 0.79372966 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation
Author: Nico Goernitz, Christian Widmer, Georg Zeller, Andre Kahles, Gunnar Rätsch, Sören Sonnenburg
Abstract: We present a novel regularization-based Multitask Learning (MTL) formulation for Structured Output (SO) prediction for the case of hierarchical task relations. Structured output prediction often leads to difficult inference problems and hence requires large amounts of training data to obtain accurate models. We propose to use MTL to exploit additional information from related learning tasks by means of hierarchical regularization. Training SO models on the combined set of examples from multiple tasks can easily become infeasible for real world applications. To be able to solve the optimization problems underlying multitask structured output learning, we propose an efficient algorithm based on bundle-methods. We demonstrate the performance of our approach in applications from the domain of computational biology addressing the key problem of gene finding. We show that 1) our proposed solver achieves much faster convergence than previous methods and 2) that the Hierarchical SO-MTL approach outperforms considered non-MTL methods. 1
3 0.74205911 181 nips-2011-Multiple Instance Learning on Structured Data
Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence
Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm has been proposed to solve the original non-convex optimization problem by using a combination of ConcaveConvex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has the nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods. 1
4 0.72330385 277 nips-2011-Submodular Multi-Label Learning
Author: James Petterson, Tibério S. Caetano
Abstract: In this paper we present an algorithm to learn a multi-label classifier which attempts at directly optimising the F -score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-ocurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, what can be accomplished exactly and efficiently via graph-cuts. Learning however is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a nontrivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest the value of the proposed technique. We also make available source code that enables the reproduction of our experiments. 1
5 0.68357199 28 nips-2011-Agnostic Selective Classification
Author: Yair Wiener, Ran El-Yaniv
Abstract: For a learning problem whose associated excess loss class is (β, B)-Bernstein, we show that it is theoretically possible to track the same classification performance of the best (unknown) hypothesis in our class, provided that we are free to abstain from prediction in some region of our choice. The (probabilistic) volume of this √ rejected region of the domain is shown to be diminishing at rate O(Bθ( 1/m)β ), where θ is Hanneke’s disagreement coefficient. The strategy achieving this performance has computational barriers because it requires empirical error minimization in an agnostic setting. Nevertheless, we heuristically approximate this strategy and develop a novel selective classification algorithm using constrained SVMs. We show empirically that the resulting algorithm consistently outperforms the traditional rejection mechanism based on distance from decision boundary. 1
6 0.63025397 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
7 0.62718064 275 nips-2011-Structured Learning for Cell Tracking
8 0.62018782 180 nips-2011-Multiple Instance Filtering
9 0.61723173 19 nips-2011-Active Classification based on Value of Classifier
10 0.6067096 33 nips-2011-An Exact Algorithm for F-Measure Maximization
11 0.6024797 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
12 0.58977336 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing
13 0.57948238 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models
14 0.57189536 4 nips-2011-A Convergence Analysis of Log-Linear Training
15 0.56905031 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
16 0.56179971 193 nips-2011-Object Detection with Grammar Models
17 0.55527639 303 nips-2011-Video Annotation and Tracking with Active Learning
18 0.52544034 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
19 0.52025557 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
20 0.51779634 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
topicId topicWeight
[(0, 0.025), (4, 0.067), (13, 0.015), (20, 0.058), (26, 0.023), (31, 0.077), (33, 0.05), (43, 0.065), (45, 0.138), (57, 0.042), (65, 0.011), (71, 0.195), (74, 0.066), (83, 0.045), (99, 0.037)]
simIndex simValue paperId paperTitle
1 0.84633666 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search
Author: Joel Veness, Marc Lanctot, Michael Bowling
Abstract: Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates. We demonstrate how these techniques can be applied to MCTS and explore their efficacy on three different stochastic, single-agent settings: Pig, Can’t Stop and Dominion. 1
same-paper 2 0.79878342 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
Author: Christoph H. Lampert
Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds. 1
3 0.78737241 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage on the experience collected on a set of source tasks to speed-up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
4 0.76468581 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energycorrelations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new twostage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
5 0.7291953 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen
Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1
6 0.72215885 227 nips-2011-Pylon Model for Semantic Segmentation
7 0.71955645 154 nips-2011-Learning person-object interactions for action recognition in still images
8 0.71876097 231 nips-2011-Randomized Algorithms for Comparison-based Search
9 0.71498924 168 nips-2011-Maximum Margin Multi-Instance Learning
10 0.71444505 303 nips-2011-Video Annotation and Tracking with Active Learning
11 0.71418613 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
12 0.71321726 180 nips-2011-Multiple Instance Filtering
13 0.71147895 186 nips-2011-Noise Thresholds for Spectral Clustering
14 0.71029294 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
15 0.71028262 263 nips-2011-Sparse Manifold Clustering and Embedding
16 0.70932835 127 nips-2011-Image Parsing with Stochastic Scene Grammar
17 0.70860553 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
18 0.70830578 276 nips-2011-Structured sparse coding via lateral inhibition
19 0.70827568 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
20 0.70808625 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling