nips nips2010 nips2010-239 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Weiss, Benjamin Sapp, Ben Taskar
Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1
Reference: text
sentIndex sentText sentNum sentScore
1 For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. [sent-3, score-0.247]
2 In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. [sent-4, score-0.465]
3 Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. [sent-6, score-0.294]
4 Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. [sent-7, score-0.972]
5 We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. [sent-8, score-0.779]
6 We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. [sent-9, score-0.423]
7 A primary example where intractable, large state-space models typically arise is in dynamic state estimation problems, including tracking articulated objects or multiple targets [1, 2]. [sent-11, score-0.261]
8 In this work, we propose a novel, principled framework called Structured Ensemble Cascades for handling state complexity while learning complex models, extending our previous work on structured cascades for low-treewidth models [4]. [sent-14, score-0.389]
9 The basic idea of structured cascades is to learn a sequence of coarse-to-fine models that are optimized to safely filter and refine the structured output state space, speeding up both learning and inference. [sent-15, score-0.584]
10 While we previously assumed (sparse) exact inference is possible throughout the cascade [4], in this work, we apply and extend the structured cascade framework to intractable high-treewidth models. [sent-16, score-1.037]
11 To avoid intractable inference, we decompose the desired model into an ensemble of tractable sub-models for each level of the cascade. [sent-17, score-0.591]
12 For example, in the problem of tracking articulated human pose, each sub-model includes temporal dependency for a single body joint only. [sent-18, score-0.282]
13 +")#/' Sum Sub-models θm (x, yj ) ≤ tm (x, α) Full Model + Ym Thresholding Y m+1 Refinement Inference *%+%,&! [sent-23, score-0.326]
14 "#$%&(&& 0'1*2',)34'' Sum θm+1 (x, yj ) ≤ tm+1 (x, α) Y m+1 Thresholding Y m+2 Refinement *%+%,&! [sent-25, score-0.297]
15 "#$%&)&& 0'+")#"' Figure 1: (a) Schematic overview of structured ensemble cascades. [sent-30, score-0.524]
16 The m’th level of the cascade takes as input a sparse set of states Y m for each variable yj . [sent-31, score-0.763]
17 The full model is decomposed into constituent sub-models (above, the three tree models used in the pose tracking experiment) and sparse inference is run. [sent-32, score-0.483]
18 Next, the max marginals of the sub-models are summed to produce a single max marginal for each variable assignment: θ*(x, yj) = Σ_p θ*_p(x, yj). [sent-33, score-0.81]
19 Note that each level and each constituent model will have different parameters as a result of the learning process. [sent-34, score-0.165]
20 In (b), we illustrate two consecutive levels of the ensemble cascade on real data, showing the filtered hypotheses left for a single video example. [sent-40, score-0.809]
21 To maintain efficiency, inference in the sub-models of the ensemble is uncoupled (unlike in dual decomposition [5]), but the decision to filter states depends on the sum of the max-marginals of the constituent models (see Figure 1). [sent-41, score-0.732]
22 We derive a convex loss function for joint estimation of submodels in each ensemble, which provably balances accuracy and efficiency, and we propose a simple stochastic subgradient algorithm for training. [sent-42, score-0.172]
23 First, we provide a principled and practical generalization of structured cascades to intractable models. [sent-44, score-0.438]
24 Third, we introduce a challenging VideoPose dataset, culled from TV videos, for evaluating pose estimation and tracking. [sent-46, score-0.201]
25 We find that our jointly trained ensemble method outperforms several competing baselines on this difficult tracking problem. [sent-48, score-0.535]
26 We further assume that f decomposes over a set of cliques C over inputs and outputs, so that θ(x, y) = Σ_{c∈C} θ · fc(x, yc). [sent-59, score-0.242]
27 Above, yc is an assignment to the subset of Y variables in the clique c and we will use Yc to refer to the set of all assignments to the clique. [sent-60, score-0.335]
28 In tree-structured models we have used for human pose estimation [6], typical K for each part includes image location and orientation and is on the order of 250,000, so even K² in pairwise potentials is prohibitive. [sent-65, score-0.285]
29 Rather than learning a single monolithic model, a structured cascade is a coarse-to-fine sequence of increasingly complex models, where model complexity scales with Markov order in sequence models or spatial/angular resolution in pose models, for example. [sent-66, score-0.705]
30 The parameters of each model in the cascade are learned using a loss function which balances accuracy (not eliminating correct assignment) and efficiency (eliminating as many other assignments as possible). [sent-69, score-0.536]
31 More precisely, for each clique assignment yc, there is a max marginal θ*(x, yc), defined as the maximum score of any output y that contains the clique assignment yc: θ*(x, yc) = max_{y′∈Y} {θ(x, y′) : y′c = yc}. (1) [sent-70, score-1.459]
32 For simplicity, we will examine the case where the cliques that we filter are defined only over single variables: yc = yj (although the model may also contain larger cliques). [sent-71, score-0.539]
33 Clique assignments are filtered by discarding any yj for which θ*(x, yj) ≤ t(x) for a threshold t(x). [sent-72, score-0.684]
34 The threshold proposed in [4] is a “max mean-max” function, t(x, α) = α θ*(x) + (1 − α) (1 / Σ_{j=1}^ℓ |Yj|) Σ_{j=1}^ℓ Σ_{yj∈Yj} θ*(x, yj). (2)
35 Filtering max marginals in this fashion can be learned because of the “safe filtering” property: ensuring that θ(xi, yi) > t(xi, α) is sufficient (although not necessary) to guarantee that no marginal consistent with the true answer yi will be filtered. [sent-75, score-0.448]
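To make the single-model filtering step concrete, here is a minimal NumPy sketch (not from the paper; function names are hypothetical). It computes max-marginals θ*(x, yj) for a simple chain model by max-product forward/backward passes, forms the “max mean-max” threshold of Eq. (2), and keeps only the states whose max-marginal clears it; it assumes every variable shares the same state set, so the weighted mean in Eq. (2) reduces to a plain average.

```python
import numpy as np

def chain_max_marginals(unary, pairwise):
    """Max-marginals of a chain model via max-product forward/backward passes.
    unary: (T, K) array of per-variable state scores.
    pairwise: (T-1, K, K) array of transition scores.
    Returns mm with mm[t, k] = max over full assignments y with y_t = k of the chain score."""
    T, K = unary.shape
    fwd = np.zeros((T, K))   # best score of any prefix ending in state k at position t
    bwd = np.zeros((T, K))   # best score of any suffix after position t, given state k at t
    fwd[0] = unary[0]
    for t in range(1, T):
        fwd[t] = unary[t] + np.max(fwd[t - 1][:, None] + pairwise[t - 1], axis=0)
    for t in range(T - 2, -1, -1):
        bwd[t] = np.max(pairwise[t] + (unary[t + 1] + bwd[t + 1])[None, :], axis=1)
    return fwd + bwd

def mean_max_threshold(mm, alpha):
    """'Max mean-max' threshold t(x, alpha) of Eq. (2): interpolate between the max
    score theta*(x) and the mean of all max-marginals (every variable is assumed to
    have the same number of states, so the weighted mean is a plain average)."""
    return alpha * mm.max() + (1.0 - alpha) * mm.mean()

def filter_states(mm, alpha):
    """Keep state k for variable t iff its max-marginal clears the threshold."""
    return mm > mean_max_threshold(mm, alpha)
```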
36 Structured Ensemble Cascades. In this work, we tackle the problem of learning a structured cascade for problems in which inference is intractable, but in which the large node state-space has a natural hierarchy that can be exploited. [sent-82, score-0.58]
37 For example, such hierarchies arise in pose estimation by discretizing the articulation of joints at multiple resolutions, or in image segmentation due to the semantic relationship between class labels (e. [sent-83, score-0.301]
38 For each variable yj in the model, each level of the cascade filters a current set of possible states Yj, and any surviving states are passed forward to the next level of the cascade by substituting each state with its set of descendants in the hierarchy. [sent-87, score-1.32]
39 Thus, in the pose estimation problem, surviving states are subdivided into multiple finer-resolution states; in the image segmentation problem, broader object classes are split into their constituent classes for the next level. [sent-88, score-0.442]
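A minimal sketch of the refinement step just described, assuming a hypothetical `children` map from each coarse state to its descendants in the hierarchy:

```python
def refine_states(surviving, children):
    """Coarse-to-fine refinement: every surviving coarse state of a variable is
    replaced by its descendants in the state hierarchy before the next level.
    surviving: dict variable -> set of surviving coarse states.
    children: dict coarse state -> iterable of finer states."""
    return {j: {fine for coarse in states for fine in children[coarse]}
            for j, states in surviving.items()}
```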
40 We propose a novel method for learning structured cascades when inference is intractable due to loops in the graphical structure. [sent-89, score-0.522]
41 The key idea of our approach is to decompose the loopy model into an equivalent collection of sub-models for which inference is tractable. [sent-90, score-0.346]
42 Decomposition without agreement constraints. Given a loopy (intractable) graphical model, it is always possible to express the score of a given output θ(x, y) as the sum of P scores θp(x, y) under sub-models that collectively cover every edge in the loopy model: θ(x, y) = Σ_p θp(x, y). [sent-96, score-0.664]
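As an illustration of such a decomposition (a sketch under the assumption that each edge is assigned to exactly one sub-model and unary potentials are split evenly, as in the grid-to-“comb”-trees example of Figure 2a; names are hypothetical):

```python
def decompose_into_submodels(unary, pairwise, edge_groups):
    """Split a loopy model into P sub-models whose scores sum to the full score.
    unary: dict node -> score vector; pairwise: dict edge -> score matrix.
    edge_groups: P edge sets that partition the loopy model's edges (e.g. the
    horizontal vs. vertical 'comb' trees of a grid, as in Figure 2a).
    Each sub-model keeps 1/P of every unary potential and all potentials of its
    own edges, so that sum_p theta_p(x, y) = theta(x, y) for every output y."""
    P = len(edge_groups)
    return [({v: u / P for v, u in unary.items()},
             {e: pairwise[e] for e in edges})
            for edges in edge_groups]
```

By construction the sub-model scores sum back to the full model's score for every output, which is all the ensemble cascade needs.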
43 For example, in the method of dual decomposition [5], it is possible to solve a relaxed MAP problem in the (intractable) full model by running inference in the (tractable) sub-models under the constraint that all sub-models agree on the argmax solution. [sent-98, score-0.27]
44 Enforcing this constraint requires iteratively re-weighting unary potentials of the sub-models and repeatedly re-running inference until each sub-model converges to the same argmax solution. [sent-99, score-0.238]
45 However, for the purposes of a structured cascade, we are only interested in computing the max marginals θ (x, yj ). [sent-100, score-0.54]
46 In other words, we are only interested in knowing whether or not a configuration y consistent with yj that scores highly in each sub-model θp (x, y) exists. [sent-101, score-0.342]
47 We show in the remainder of this section that the requirement that a single y consistent with yj optimizes the score of each submodel (i. [sent-102, score-0.396]
48 Thus, because we do not have to enforce agreement between sub-models, we can learn a structured cascade for intractable models, but pay only a linear (factor of P ) increase in inference time over the tractable sub-models. [sent-104, score-0.892]
49 Formally, we define a single level of the ensemble cascade as a set of P models such that θ(x, y) = Σ_p θp(x, y). [sent-105, score-0.797]
50 We let θp (x, ·), θp (x, ·), θp (x) and tp (x, α) be the score, max marginal, max score, and threshold of the p’th model, respectively. [sent-106, score-0.204]
51 We define the argmax marginal or witness yp (x, yj ) to be the maximizing complete assignment of the corresponding max marginal θp (x, yj ). [sent-107, score-0.832]
52 Then, if y = y*_p(x, yj) is the same for each of the P sub-models, we have that θ*(x, yj) = Σ_p θ*_p(x, yj). (4) Note that if we do not require the sub-models to agree, then θ*(x, yj) is strictly less than Σ_p θ*_p(x, yj). [sent-108, score-1.513]
53 Nonetheless, as we show next, the approximation θ*(x, yj) ≈ Σ_p θ*_p(x, yj) is still useful and sufficient for filtering in a structured cascade. [sent-109, score-0.728]
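Putting these pieces together, a hypothetical sketch of one ensemble filtering level: each tractable sub-model runs its own max-marginal inference independently (no agreement constraint), the per-state max-marginals are summed, and the sum is compared against the summed per-sub-model thresholds. It reuses `chain_max_marginals` and `mean_max_threshold` from the earlier sketch and assumes all sub-models are chains over the same variables and state sets.

```python
def ensemble_filter_step(sub_potentials, alpha):
    """One filtering level of the ensemble cascade.
    sub_potentials: list of (unary, pairwise) arrays, one per tractable sub-model.
    Each sub-model runs uncoupled max-marginal inference; a state survives iff the
    sum of its sub-model max-marginals exceeds the sum of thresholds t_p(x, alpha)."""
    mms = [chain_max_marginals(u, w) for (u, w) in sub_potentials]   # uncoupled inference
    summed = sum(mms)                                                # ~ theta*(x, y_j)
    total_threshold = sum(mean_max_threshold(mm, alpha) for mm in mms)
    return summed > total_threshold
```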
54 2 Safe filtering and generalization error We first show that if a given label y has a high score in the full model, it must also have a large ensemble max marginal score, even if the sub-models do not agree on the argmax. [sent-111, score-0.587]
55 If Σ_p θp(x, y) > t, then Σ_p θ*_p(x, yj) > t for all yj ⊆ y. [sent-113, score-0.594]
56 In English, this lemma states that if the global score is above a given threshold, then the sum of sub-model max-marginals is also above threshold (with no agreement constraint). [sent-115, score-0.293]
57 For any yj consistent with y, we have θ*_p(x, yj) ≥ θp(x, y). [sent-117, score-0.594]
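Written out in full, the argument behind Lemma 1 is just these two steps (a restatement of the preceding sentences, not new material):

```latex
% Lemma 1: if \sum_p \theta_p(x, y) > t, then \sum_p \theta^*_p(x, y_j) > t for every y_j \subseteq y.
\begin{align*}
\theta^*_p(x, y_j) &= \max_{y' : y'_j = y_j} \theta_p(x, y') \;\ge\; \theta_p(x, y)
  && \text{($y$ itself is one of the maximized-over outputs),}\\
\sum_{p} \theta^*_p(x, y_j) &\ge \sum_{p} \theta_p(x, y) \;>\; t
  && \text{(sum the bound over the $P$ sub-models).}
\end{align*}
```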
58 If the sub-models do not agree, and the truth is not above threshold, then the threshold may filter all of the states for a given variable 4 yj and therefore “break” the cascade. [sent-121, score-0.403]
59 However, we note that in our experiments, we never experienced such breakdown of the cascades due to overly aggressive filtering. [sent-123, score-0.209]
60 In order to learn parameters that are useful for filtering, Lemma 1 suggests a natural ensemble filtering loss, which we define for any fixed α as follows: Ljoint(θ, x, y) = 1[Σ_p θp(x, y) ≤ Σ_p tp(x, α)], (5) where θ = {θ1, . . . , θP}. [sent-124, score-0.491]
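A small sketch of evaluating this 0/1 joint filtering loss (hypothetical helper; in training one minimizes a convex upper bound of it, per the convex loss discussed earlier):

```python
def joint_filter_loss(sub_true_scores, sub_thresholds):
    """0/1 ensemble filtering loss of Eq. (5): the ensemble errs on (x, y) when the
    summed sub-model scores of the true output fail to exceed the summed thresholds,
    i.e. when the true output would be filtered.
    sub_true_scores[p] = theta_p(x, y); sub_thresholds[p] = t_p(x, alpha)."""
    return float(sum(sub_true_scores) <= sum(sub_thresholds))
```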
61 ) To conclude this section, we provide a generalization bound on the ensemble filtering loss, equivalent to the bounds in [4] for the single-model cascades. [sent-129, score-0.39]
62 To do so, we first eliminate the dependence on x and θ by rewriting Ljoint in terms of the scores of every possible state assignment, θ · f (x, yj ), according to each sub-model. [sent-130, score-0.388]
63 Let ||θp ||2 ≤ F for all p, and ||f (x, yj )||2 ≤ 1 for all x and yj . [sent-134, score-0.594]
64 We rephrase the SC optimization problem (3) using the ensemble max-marginals to form the ensemble cascade learning problem: minimize Σ_p λ||θp||² + (1/n) Σ_i ξi subject to ensemble margin constraints with slack variables ξi ≥ 0.
65 The resulting stochastic subgradient update (8) compares the summed score of the true output against the summed thresholds Σ_p tp(xi, α); this update is identical to the original SC update with the exception that we update each model individually only when the ensemble has made a mistake jointly. [sent-149, score-0.519]
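A simplified, hypothetical sketch of the kind of stochastic subgradient step described here, assuming a hinge surrogate of the joint loss; the exact form of the paper's update (8), including how the threshold subgradient is obtained, is not recoverable from this extraction, so the quantities below are assumptions.

```python
def ensemble_subgradient_step(thetas, f_true, f_thresh_grad, violated, eta, lam):
    """One stochastic subgradient step for the P sub-models of one cascade level.
    thetas: list of P weight vectors (numpy arrays).
    f_true[p]: features of the true output under sub-model p.
    f_thresh_grad[p]: an assumed, precomputed subgradient of t_p(x, alpha) w.r.t.
      theta_p (for the mean-max threshold this mixes argmax features with the
      average witness features of the max-marginals).
    violated: True iff the ensemble made a joint mistake, i.e. the summed true
      scores did not exceed the summed thresholds."""
    for p in range(len(thetas)):
        thetas[p] = (1.0 - eta * lam) * thetas[p]            # l2 regularization shrinkage
        if violated:                                          # update only on a joint mistake
            thetas[p] = thetas[p] + eta * (f_true[p] - f_thresh_grad[p])
    return thetas
```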
66 Thus, learning to filter with the ensemble requires only P times as many resources as learning to filter with any of the models individually. [sent-150, score-0.39]
67 4 Experiments We evaluated structured ensemble cascades in two experiments. [sent-151, score-0.761]
68 First, we analyzed the “best-case” filtering performance of the summed max-marginal approximation to the true marginals on a synthetic image segmentation task, assuming the true scoring function θ(x, y) is available for inference. [sent-152, score-0.212]
69 Second, we evaluated the accuracy of our approach on a difficult, real-world human pose dataset (VideoPose). [sent-153, score-0.23]
70 In both experiments, the max-marginal ensemble outperforms state-of-the-art baselines. [sent-154, score-0.39]
71 θ(x, y) = θ1(x, y) + θ2(x, y) + θ3(x, y) + θ4(x, y) + θ5(x, y) + θ6(x, y). Figure 2: (a) Example decomposition of a 3 × 3 fully connected grid into all six constituent “comb” trees. [sent-155, score-0.213]
72 (b) Improvement over Loopy BP and constituent tree-models on the synthetic segmentation task. [sent-157, score-0.214]
73 1 Asymptotic Filtering Accuracy We first evaluated the filtering accuracy of the max-marginal ensemble on a synthetic 8-class segmentation task. [sent-160, score-0.541]
74 To generate a synthetic instance, we generated unary potentials ωi (k) uniformly on [0, 1] and pairwise potentials log-uniformly: ωij (k, k ) = exp −v, where v ∼ U[−25, 25] was sampled independently for every edge and every pair of classes. [sent-166, score-0.243]
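For reference, a sketch of this synthetic generator (hypothetical names; the grid is assumed 4-connected):

```python
import numpy as np

def synthetic_instance(n, K, seed=0):
    """Synthetic segmentation instance as described in the text: an n x n grid with
    K classes; unary potentials w_i(k) ~ U[0, 1]; pairwise potentials
    w_ij(k, k') = exp(-v) with v ~ U[-25, 25], drawn independently for every edge
    and every pair of classes."""
    rng = np.random.default_rng(seed)
    unary = {(i, j): rng.uniform(0.0, 1.0, size=K)
             for i in range(n) for j in range(n)}
    edges = [((i, j), (i, j + 1)) for i in range(n) for j in range(n - 1)]
    edges += [((i, j), (i + 1, j)) for i in range(n - 1) for j in range(n)]
    pairwise = {e: np.exp(-rng.uniform(-25.0, 25.0, size=(K, K))) for e in edges}
    return unary, pairwise, edges
```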
75 ) We found that the ensemble outperformed Loopy BP and the individual sub-models by a significant margin for all K. [sent-175, score-0.39]
76 We found that sub-model agreement was significantly correlated (p < 10−15 ) with the error of the ensemble for all values of K, peaking at ρ = −0. [sent-181, score-0.518]
77 We then asked the question: Does the effect of model agreement explain the improvement of the ensemble over Loopy BP? [sent-184, score-0.518]
78 Thus, sub-model agreement does not explain the improvement over Loopy BP, indicating that submodel disagreement is not related to the difficulty in inference problems that causes Loopy BP to underperform relative to the ensembles (e. [sent-188, score-0.298]
79 All SC models outperform [9]; the “torso only” persistence cascade introduces additional error compared to a single-frame cascade, but adding arm dependencies in the ensemble yields the best performance. [sent-209, score-0.813]
80 (c): Summary of test set filtering efficiency and accuracy for the ensemble cascade. [sent-210, score-0.419]
81 2 The VideoPose Dataset Our dataset consists of 34 video clips of approximately 50 frames each. [sent-214, score-0.206]
82 In our experiments, we use the Buffy and half of the Friends clips as training (17 clips), and the remaining Friends and LOST clips for testing. [sent-217, score-0.274]
83 Each frame of each clip is hand-annotated with locations of joints of a full pose model: torso, upper/lower arms for both right and left, and top and bottom of head. [sent-221, score-0.291]
84 † For simplicity, we use only the torso and upper arm annotations in this work, as these have the strongest continuity across frames and strong geometric relationships. [sent-223, score-0.258]
85 All of the models we evaluated on this dataset share the same basic structure: a variable for each limb’s (x, y) location and angle rotation (torso, left arm, and right arm) with edges between torso and arms to model pose geometry. [sent-225, score-0.419]
86 For the VideoPose dataset, we augmented this model by adding edges between limb states in adjacent frames (Figure 1), forming an intractable, loopy model. [sent-227, score-0.388]
87 We learned a coarse-to-fine structured cascade with six levels for tracking as follows. [sent-231, score-0.632]
88 The six levels use increasingly finer state spaces for joint locations, discretized into bins of resolution 10 × 10 up to 80 × 80, with each stage doubling one of the state space dimensions in the refinement step. [sent-232, score-0.197]
89 For the ensemble cascade, we learned three sub-models simultaneously (Figure 1), with each sub-model accounting for temporal consistency for a different limb by adding edges connecting the same limb in consecutive frames. [sent-234, score-0.562]
90 We compared the single-frame cascade and the ensemble cascade to a state-of-the-art single-frame pose detector (Ferrari et al. [sent-237, score-1.287]
91 Points shown are the position of left/right shoulders and torsos at the last level of the ensemble SC (blue square, green dot, white circle resp. [sent-243, score-0.435]
92 Shown as dotted gray lines is the best guess pose returned by [9]. [sent-246, score-0.173]
93 We found that the ensemble cascade was the most accurate for every joint in the model, that all cascades outperformed the state-of-the-art baseline, and, interestingly, that the single-frame cascade outperformed the torso-only cascade. [sent-249, score-1.362]
94 We suspect that the poor performance of the torso-only model may arise because propagating only torso states through time leads to an over-reliance on the relatively weak torso signal to determine the location of all the limbs. [sent-250, score-0.402]
95 Sample qualitative output from the ensemble is presented in Figure 4. [sent-251, score-0.42]
96 Tracking with articulated body parts is challenging for two main reasons. [sent-253, score-0.165]
97 [9] use loopy belief propagation to incorporate temporal consistency of parts, but to our knowledge we are the first to quantitatively evaluate on movie/TV show sequences. [sent-266, score-0.201]
98 In the method of dual decomposition [5], efficient optimization of an LP relaxation of MAP inference in an intractable model is achieved by coupling the inference of a collection of tractable sub-models. [sent-267, score-0.403]
99 ” We are currently investigating a priori properties of or assumptions about the data and cascade that will provably lead to efficient cascaded learning and inference. [sent-273, score-0.396]
100 Pictorial structures revisited: People detection and articulated pose estimation. [sent-389, score-0.282]
wordName wordTfidf (topN-words)
[('ensemble', 0.39), ('cascade', 0.362), ('yj', 0.297), ('cascades', 0.209), ('loopy', 0.201), ('videopose', 0.182), ('pose', 0.173), ('yc', 0.172), ('torso', 0.155), ('clips', 0.137), ('structured', 0.134), ('ltering', 0.129), ('agreement', 0.128), ('bp', 0.128), ('constituent', 0.12), ('articulated', 0.109), ('tracking', 0.106), ('tp', 0.101), ('friends', 0.1), ('intractable', 0.095), ('limb', 0.086), ('inference', 0.084), ('marginals', 0.081), ('lter', 0.078), ('safe', 0.075), ('cliques', 0.07), ('unary', 0.069), ('agree', 0.068), ('limbs', 0.068), ('ljoint', 0.068), ('filtering', 0.065), ('tractable', 0.061), ('arm', 0.061), ('clique', 0.061), ('states', 0.059), ('score', 0.059), ('assignment', 0.059), ('sc', 0.056), ('pictorial', 0.052), ('cvpr', 0.051), ('synthetic', 0.049), ('joints', 0.049), ('ferrari', 0.049), ('grid', 0.049), ('ltered', 0.049), ('th', 0.047), ('threshold', 0.047), ('ensembles', 0.046), ('potentials', 0.046), ('state', 0.046), ('buffy', 0.045), ('comb', 0.045), ('sidestepping', 0.045), ('simplicitly', 0.045), ('submodels', 0.045), ('surviving', 0.045), ('scores', 0.045), ('segmentation', 0.045), ('nement', 0.045), ('level', 0.045), ('decomposition', 0.044), ('eliminating', 0.043), ('sapp', 0.043), ('assignments', 0.043), ('frames', 0.042), ('marginal', 0.042), ('submodel', 0.04), ('argmax', 0.039), ('frame', 0.039), ('joint', 0.039), ('summed', 0.037), ('andriluka', 0.037), ('resolution', 0.036), ('dual', 0.035), ('cascaded', 0.034), ('articulation', 0.034), ('ciency', 0.033), ('pairwise', 0.033), ('displacement', 0.033), ('location', 0.033), ('safely', 0.031), ('roth', 0.031), ('loss', 0.03), ('levels', 0.03), ('occluded', 0.03), ('arms', 0.03), ('output', 0.03), ('unconstrained', 0.03), ('accuracy', 0.029), ('tm', 0.029), ('balances', 0.029), ('mistake', 0.028), ('pay', 0.028), ('yp', 0.028), ('max', 0.028), ('body', 0.028), ('weiss', 0.028), ('challenging', 0.028), ('evaluated', 0.028), ('video', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
Author: David Weiss, Benjamin Sapp, Ben Taskar
Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1
2 0.29392388 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
3 0.21120144 42 nips-2010-Boosting Classifier Cascades
Author: Nuno Vasconcelos, Mohammad J. Saberian
Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1
4 0.15603884 257 nips-2010-Structured Determinantal Point Processes
Author: Alex Kulesza, Ben Taskar
Abstract: We present a novel probabilistic model for distributions over sets of structures— for example, sets of sequences, trees, or graphs. The critical characteristic of our model is a preference for diversity: sets containing dissimilar structures are more likely. Our model is a marriage of structured probabilistic models, like Markov random fields and context free grammars, with determinantal point processes, which arise in quantum physics as models of particles with repulsive interactions. We extend the determinantal point process model to handle an exponentially-sized set of particles (structures) via a natural factorization of the model into parts. We show how this factorization leads to tractable algorithms for exact inference, including computing marginals, computing conditional probabilities, and sampling. Our algorithms exploit a novel polynomially-sized dual representation of determinantal point processes, and use message passing over a special semiring to compute relevant quantities. We illustrate the advantages of the model on tracking and articulated pose estimation problems. 1
5 0.13662905 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work are goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results. 1
6 0.13304517 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
7 0.11792881 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
8 0.11663887 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
9 0.11246116 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
10 0.11156607 118 nips-2010-Implicit Differentiation by Perturbation
11 0.099909388 190 nips-2010-On the Convexity of Latent Social Network Inference
12 0.087411933 228 nips-2010-Reverse Multi-Label Learning
13 0.082152322 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
14 0.080458298 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
15 0.074227601 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
16 0.074065246 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
17 0.069795892 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.068740696 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
19 0.065167837 101 nips-2010-Gaussian sampling by local perturbations
20 0.060864687 149 nips-2010-Learning To Count Objects in Images
topicId topicWeight
[(0, 0.208), (1, 0.076), (2, -0.062), (3, -0.096), (4, -0.062), (5, -0.013), (6, -0.129), (7, 0.047), (8, 0.085), (9, 0.055), (10, -0.226), (11, -0.017), (12, 0.051), (13, 0.14), (14, -0.069), (15, 0.035), (16, 0.092), (17, 0.112), (18, -0.264), (19, 0.146), (20, 0.0), (21, -0.015), (22, -0.033), (23, -0.003), (24, -0.086), (25, -0.07), (26, -0.098), (27, -0.243), (28, -0.044), (29, -0.128), (30, 0.032), (31, -0.025), (32, -0.038), (33, 0.04), (34, -0.005), (35, -0.067), (36, -0.1), (37, -0.088), (38, 0.05), (39, 0.076), (40, -0.056), (41, 0.057), (42, -0.058), (43, -0.047), (44, 0.035), (45, 0.025), (46, -0.03), (47, -0.004), (48, 0.002), (49, -0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.93452221 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
Author: David Weiss, Benjamin Sapp, Ben Taskar
Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1
2 0.71790802 42 nips-2010-Boosting Classifier Cascades
Author: Nuno Vasconcelos, Mohammad J. Saberian
Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1
3 0.66625392 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
4 0.60673994 257 nips-2010-Structured Determinantal Point Processes
Author: Alex Kulesza, Ben Taskar
Abstract: We present a novel probabilistic model for distributions over sets of structures— for example, sets of sequences, trees, or graphs. The critical characteristic of our model is a preference for diversity: sets containing dissimilar structures are more likely. Our model is a marriage of structured probabilistic models, like Markov random fields and context free grammars, with determinantal point processes, which arise in quantum physics as models of particles with repulsive interactions. We extend the determinantal point process model to handle an exponentially-sized set of particles (structures) via a natural factorization of the model into parts. We show how this factorization leads to tractable algorithms for exact inference, including computing marginals, computing conditional probabilities, and sampling. Our algorithms exploit a novel polynomially-sized dual representation of determinantal point processes, and use message passing over a special semiring to compute relevant quantities. We illustrate the advantages of the model on tracking and articulated pose estimation problems. 1
5 0.46963111 190 nips-2010-On the Convexity of Latent Social Network Inference
Author: Seth Myers, Jure Leskovec
Abstract: In many real-world scenarios, it is nearly impossible to collect explicit social network data. In such cases, whole networks must be inferred from underlying observations. Here, we formulate the problem of inferring latent social networks based on network diffusion or disease propagation data. We consider contagions propagating over the edges of an unobserved social network, where we only observe the times when nodes became infected, but not who infected them. Given such node infection times, we then identify the optimal network that best explains the observed data. We present a maximum likelihood approach based on convex programming with a l1 -like penalty term that encourages sparsity. Experiments on real and synthetic data reveal that our method near-perfectly recovers the underlying network structure as well as the parameters of the contagion propagation model. Moreover, our approach scales well as it can infer optimal networks of thousands of nodes in a matter of minutes.
6 0.44146183 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
7 0.43794107 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
8 0.43522567 118 nips-2010-Implicit Differentiation by Perturbation
9 0.42766705 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
10 0.40038431 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
11 0.38266137 2 nips-2010-A Bayesian Approach to Concept Drift
12 0.35734236 1 nips-2010-(RF)^2 -- Random Forest Random Field
13 0.34811884 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models
14 0.33983403 283 nips-2010-Variational Inference over Combinatorial Spaces
15 0.33738264 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
16 0.33328962 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
17 0.32805997 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
18 0.32226661 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
19 0.32130989 234 nips-2010-Segmentation as Maximum-Weight Independent Set
20 0.31715524 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
topicId topicWeight
[(13, 0.055), (17, 0.027), (27, 0.073), (30, 0.067), (35, 0.029), (45, 0.227), (50, 0.106), (52, 0.032), (60, 0.024), (77, 0.076), (78, 0.022), (80, 0.153), (90, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.88434851 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
Author: David Weiss, Benjamin Sapp, Ben Taskar
Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1
2 0.86647809 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1
3 0.86111379 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
Author: Dahua Lin, Eric Grimson, John W. Fisher
Abstract: We present a novel method for constructing dependent Dirichlet processes. The approach exploits the intrinsic relationship between Dirichlet and Poisson processes in order to create a Markov chain of Dirichlet processes suitable for use as a prior over evolving mixture models. The method allows for the creation, removal, and location variation of component models over time while maintaining the property that the random measures are marginally DP distributed. Additionally, we derive a Gibbs sampling algorithm for model inference and test it on both synthetic and real data. Empirical results demonstrate that the approach is effective in estimating dynamically varying mixture models. 1
4 0.85614377 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1
5 0.85409105 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
Author: Matthias Broecheler, Lise Getoor
Abstract: Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficiency. Continuous MRFs are a general purpose probabilistic modeling tool and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence. 1
6 0.85333925 117 nips-2010-Identifying graph-structured activation patterns in networks
7 0.85193521 158 nips-2010-Learning via Gaussian Herding
8 0.8495279 63 nips-2010-Distributed Dual Averaging In Networks
9 0.84910852 148 nips-2010-Learning Networks of Stochastic Differential Equations
10 0.84823656 155 nips-2010-Learning the context of a category
11 0.84664083 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
12 0.84601575 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
13 0.84480214 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
14 0.84364223 96 nips-2010-Fractionally Predictive Spiking Neurons
15 0.84360403 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
16 0.84332073 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
17 0.84223366 257 nips-2010-Structured Determinantal Point Processes
18 0.84117383 202 nips-2010-Parallelized Stochastic Gradient Descent
19 0.84116828 217 nips-2010-Probabilistic Multi-Task Feature Selection