cvpr cvpr2013 cvpr2013-186 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter Kontschieder, Pushmeet Kohli, Jamie Shotton, Antonio Criminisi
Abstract: Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently [3, 5, 8]. This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. Random field models, instead, encourage spatial consistency of labels at increased computational expense. This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. Our model can be thought of as a generalization of the successful Semantic Texton Forest, Auto-Context, and Entangled Forest models. A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. Our GeoF model is validated quantitatively on the task of semantic image segmentation, on four challenging and very diverse image datasets. GeoF outperforms both state-of-the-art forest models and the conventional pairwise CRF.
Reference: text
sentIndex sentText sentNum sentScore
1 Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently [3, 5, 8]. [sent-3, score-0.706]
2 This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. [sent-4, score-0.232]
3 This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. [sent-6, score-0.921]
4 Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. [sent-7, score-0.598]
5 A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. [sent-9, score-0.505]
6 This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. [sent-10, score-0.539]
7 GeoF outperforms both state-of-the-art forest models and the conventional pairwise CRF. [sent-12, score-0.493]
8 In fact, conventional decision forests ignore the structure in output spaces and make predictions for each output variable independently. [sent-22, score-0.586]
9 This assumption prevents them from enforcing dependencies between variables, and for semantic segmentation tasks, translates into pixel labellings that do not follow object boundaries and are inconsistent with context. [sent-23, score-0.405]
10 In the forest approach in [13], spatial smoothness is achieved by combining structured class-labels that are learned by incorporating joint statistics in a small neighborhood. [sent-28, score-0.431]
11 Our framework overcomes the above-mentioned problem by incorporating learned spatial context directly within the forest itself. [sent-31, score-0.426]
12 Long-range correlations between pixel labels are captured via new soft connectivity features which can be computed efficiently using generalized geodesic distance transforms. [sent-33, score-0.663]
13 Another contribution is to analyse the relationship between a typical CRF-like energy and the forest training objective. [sent-34, score-0.505]
14 This analysis leads to a new objective for training decision forests that produces more accurate semantic segmentation. [sent-35, score-0.602]
15 Quantitative results demonstrate the superiority of our model both in terms of accuracy and efficiency, with respect to state-of-the-art forest models and grid-based pairwise CRFs. [sent-37, score-0.435]
16 The recent work on autocontext [24, 26], stacking [18, 28], deep learning [14, 15] and entanglement [17] has shown how a sequence of classifiers using the output of the previous classifier as input to the next can both effectively capture spatial context (e. [sent-40, score-0.221]
17 In [9], the relationship between anytime classification and intermediate predictions within decision trees is shown. [sent-43, score-0.32]
18 Our geodesic forest model (GeoF) can be seen as a generalization of semantic texton forests [24], auto-context [24, 26], and entanglement forests [17]. [sent-45, score-1.625]
19 In fact, GeoF builds upon these models by using: (i) new, long-range soft connectivity features, and (ii) a new field-inspired objective for forest training. [sent-46, score-0.647]
20 We cast the semantic segmentation task as that of associating each pixel p with its corresponding discrete class label c ∈ C. [sent-50, score-0.194]
21 We use $c_d$ to denote predictions obtained at depth d in the tree. [sent-61, score-0.162]
22 (2) exten- seeks parameters θj which aim to maximize both the class purity and spatial compactness of pixel clusters in child nodes. [sent-70, score-0.174]
23 Decision forests [3, 5, 8] further assume that the poste? [sent-75, score-0.313]
24 Typically, a decision tree is trained greedily, where for each split node j the parameters θj associated with a low energy (e. [sent-88, score-0.275]
25 Figure 1 illustrates this point and suggests that ideally we would like training to maximize class purity as well as encouraging spatial compactness of the resulting pixel clusters. [sent-91, score-0.221]
26 Coupling forest predictions to reveal hidden correlations. [sent-92, score-0.503]
27 In this paper, we overcome this problem and encourage forests to produce spatially compact/coherent pixel labellings. [sent-94, score-0.378]
28 In what follows, we will show how a learned model of spatial context can be encoded within a decision forest directly. [sent-95, score-0.534]
29 One of the key theoretical insights of our work is the observation that although forests make predictions for each variable independently, these predictions are related due to correlations at the feature level. [sent-97, score-0.527]
30 For instance, in the semantic image segmentation task consider the class predictions of two pixels p and q. [sent-98, score-0.334]
31 Therefore, output-variable dependencies can be encoded in the features that the forest operates on. [sent-102, score-0.46]
32 Long-range, soft connectivity features The need for long-range connectivity features. [sent-106, score-0.354]
33 In [16, 23, 27] the authors have shown how simple pixel comparison features can be effective in classification tasks when used within a decision forest. [sent-107, score-0.173]
34 Since the shortest path connecting them has a high geodesic length (it cuts through high image gradients, see definition in (4)), this provides a hint that the two points may not be part of the same object/class. [sent-115, score-0.456]
35 (a) Given a pixel pair (a reference and a probe pixel) popular features only look at the intensities at the two pixel positions, and ignore what happens in between. [sent-117, score-0.277]
36 (b) In contrast, the length of the shortest path connecting the pixel pair carries richer information. [sent-118, score-0.175]
37 The geodesic length of the shortest path connecting two points provides hints about the points belonging (or not) to the same object class (e. [sent-119, score-0.535]
38 They are based on the use of generalized geodesic distances, as introduced in [7] and summarized next. [sent-123, score-0.383]
39 Given a grey-valued image J, and a real-valued object “soft mask” (that encodes pixel likelihood) $M(p) : \Omega \subset \mathbb{N}^d \to [0, 1]$, the generalized geodesic distance Q is defined as follows: $Q(p; M, \nabla J) = \min_{p'}$ ? [sent-125, score-0.448]
40 )) (4) with the geodesic distance between two points p and q: $\delta(p, q) = \inf_{\Gamma \in P_{p,q}}$ ? [sent-128, score-0.346]
41 Soft connectivity between a pixel and a class region. [sent-134, score-0.283]
42 They efficiently capture long-range connectivity (of a pixel to a class region). [sent-158, score-0.283]
43 We can use those probabilities to construct the soft masks M needed for the generalized geodesic distance transform, and the corresponding filtered probabilities will be g(c = torso) and g(c = left leg). [sent-160, score-0.561]
44 Contrast sensitivity is modulated by the geodesic strength parameter γ ≥ 0 in (5). [sent-162, score-0.346]
45 Entangled geodesic forests Here we are interested in extremely efficient semantic segmentation. [sent-166, score-0.757]
46 Thus, we build upon decision forests [3, 5, 8], because of their speed and flexibility. [sent-167, score-0.421]
47 4 in the spirit of entangled forests [17] we train all trees: (i) in parallel, (ii) in breadth-first order, and (iii) in sections. [sent-172, score-0.672]
48 In fact, the class posteriors p(c|v) of the previous section may be used as input features to the next [17]. [sent-177, score-0.168]
49 Given a class posterior $p_{s_i}(c|v)$ computed at the ith section (with i > 0), its geodesically smoothed version is defined as $g_{s_i}(c|v(p)) = \frac{1}{W}\, p_{s_i}(c|v(p))\, e^{-Q(p;\, p_{s_i}(c|v(\Omega)),\, \nabla J)^2 / \sigma^2}$ (6) [sent-184, score-0.181]
50 The trees are entangled because intermediate predictions of their top section are used (together with raw intensity features) as features for training of the lower sections. [sent-187, score-0.625]
51 Feature responses for a reference pixel r are defined as a function of tree depth d, and as sums, differences or absolute differences between two pixel probe values in different feature channels, i. [sent-198, score-0.365]
52 the intermediate class posteriors computed in the( cs|e(cpti)o),n i s. [sent-204, score-0.201]
53 The entangled feature channels (k = 1, 2) are available only for section s1 and greater, and are computed very efficiently as table look-ups. [sent-210, score-0.359]
54 Field-inspired forest training objective This section describes our second contribution: the use of a new objective for the forest training procedure. [sent-213, score-0.958]
55 Most algorithms for training classification forests are greedy and find the optimal parameters for a split node j as θj = argminθ E(Sj , θ) (Fig. [sent-216, score-0.393]
56 ∈C $n(c, S_j^i) \log \frac{n(c, S_j^i)}{|S_j^i|}$ (7) with n(c, S) denoting the number of training pixels of class c in the training subset S (please refer to Fig. [sent-225, score-0.173]
57 f training each tree split node by using an MRF energy E = ERF, which is typically defined as ERF(Sj,θ) =i∈? [sent-229, score-0.214]
58 Thus, conventional entropy-based tree training corresponds exactly to minimizing an MRF-like energy which uses the log-loss as unary and no pairwise term. [sent-238, score-0.323]
59 This is particularly important in the context of semantic segmentation, where often the pixels in the background class are much more numerous than those in other classes. [sent-248, score-0.207]
60 Results and Comparisons We validate our semantic segmentation approach on four, very diverse labelled image datasets. [sent-267, score-0.218]
61 We have the following 9 classes: background (BG), heart (HR), liver (LI), spleen (SP), left/right lung (LL/RL), left/right kidney (LK/RK) and aorta (AO). [sent-276, score-0.486]
62 In the latter, as energy model, we used a log-loss as unary term and a contrast-sensitive Potts model as pairwise term. [sent-289, score-0.146]
63 Additionally, we also implemented an auto-context [26] version of classification forests where: A first forest is trained using raw intensity features; Then, a second forest is trained using both raw intensities and the probabilities from the first forest as features. [sent-290, score-1.671]
64 Both entangled geodesic features and un-entangled class posteriors are [sent-291, score-0.873]
65 Entangling the p feature channels only helps spatial Enabling the long-range geodesic feature channels g helps The spurious hand region is gone. [sent-302, score-0.346]
66 (f, i, l) Results from forest with geodesic entanglement and field-inspired energy term. [sent-303, score-0.926]
67 The combination of entangled geodesic features and log-loss training produces coherent segmentations without the need for field-based post-processing. [sent-314, score-0.752]
68 For all forest based algorithms we fix T = 10 and D = 20, except for the CamVid dataset where we use a maximum depth D = 17 since the number of training samples is considerably smaller. [sent-316, score-0.498]
69 However, decision forests are well-suited for GPU implementations [22]. [sent-318, score-0.421]
70 The baseline forest (0 1) yields a mean Jaccard score of only 38. [sent-320, score-0.396]
71 2%, still lower than what our implemented auto-context forest (0 3) and our proposed geodesic forests achieve (0 7-16). [sent-324, score-1.055]
72 Both the use of entangled geodesic features and the field-inspired energy help achieve the highest accuracy in this dataset. [sent-325, score-0.767]
73 Entangled geodesic forests using either of the two energy models (14 ,16) work better than the conventional forest (0 1). [sent-329, score-1.175]
74 Accuracy as a function of tree depth D, for different forest variants, evaluated on the LFW face dataset. [sent-332, score-0.523]
75 Our auto-context geodesic forest (0 8) does well, but the second forest does not seem to yield much additional improvement. [sent-334, score-1.138]
76 In terms of runtime, the standard forest + CRF (0 2) takes ∼ 0. [sent-335, score-0.396]
77 3% (0 2) we find again that providing entangled geodesic features improves on all our compared methods. [sent-345, score-0.705]
78 The autocontext forest performs well here too, even without these additional features. [sent-346, score-0.465]
79 However, the best results are achieved with one or two sections of entanglement in geodesic forests (12 16). [sent-347, score-0.659]
80 2s per frame (w1h2i,le geodesic efo CrResFts a p(1p r2o) ancehed (0 ∼2 0 ta. [sent-349, score-0.346]
81 The best results are achieved by our auto-context geodesic forests (0 7, 0 8) which yield strong improvements over the baseline (+ 6. [sent-354, score-0.659]
82 0 3, 0 7 , 0 8) results in higher runtimes as two forests need to be evaluated (resulting in ∼ 1. [sent-359, score-0.313]
83 r3am9se/f rwamhilee) entangled geodesic fhor (e0st2s) are emsu c∼h 1fa. [sent-363, score-0.705]
84 3% which we are able to considerably outperform with all our geodesic forest variants. [sent-371, score-0.742]
85 The best performing geodesic forest (16) improves over the recent work in [13] (+2. [sent-372, score-0.742]
86 tried training forests by adding pairwise terms or other global smoothness terms in the energy (10), but without consistently improving the accuracy further. [sent-383, score-0.461]
87 7a we see that at depth 10 (after one level of entanglement) when the reference pixel is in the liver, the two probes tend to be selected (during training) to also be in the liver. [sent-391, score-0.23]
88 7b the probes tend to be selected frequently also in the heart and right lung regions. [sent-395, score-0.174]
89 Conclusion This paper has presented a new forest-based model for structured-output learning, applied to the task of semantic (figure panel (a): class of reference pixels = liver (LI); depth 10, depth 13, depth 17) [sent-402, score-0.247]
90 Class of reference pixels = left kidney (LK) Figure 7. [sent-413, score-0.185]
91 In this dataset (CT) classes are: background (BG), heart (HR), liver (LI), spleen (SP), l. [sent-418, score-0.229]
92 in b’ when trying to identify the left kidney it helps to use probes either in the spleen region (just above the left kidney) or in the left kidney itself (encouraging local smoothness). [sent-426, score-0.423]
93 Our model encourages spatial smoothness and long-range, semantic context within the forest itself, via the use of new, soft connectivity features which build upon entangled, generalized geodesic distances. [sent-428, score-1.122]
94 In addition, the paper shows how training forests by minimizing a new random field-inspired energy yields higher accuracy than entropy based approaches. [sent-429, score-0.422]
95 27 6 for our geodesic forest algorithm as compared to existing techniques (e. [sent-490, score-0.742]
96 random classification forest, and forest + CRF), for four different labelled image databases. [sent-492, score-0.466]
97 As time goes by anytime semantic segmentation with iterative context forests. [sent-498, score-0.211]
98 Structured class-labels in random forests for semantic image labelling. [sent-528, score-0.411]
99 Entangled decision forests and their application for semantic segmentation of CT images. [sent-561, score-0.569]
100 Hough forest random field for object recognition and segmentation. [sent-586, score-0.396]
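The sentences above describe generalized geodesic distances (4) and the geodesically smoothed posterior (6) only in fragments, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the common generalized geodesic form in which every pixel p' is a seed with initial cost ν·M(p'), approximates path length on a 4-connected grid with step cost sqrt(1 + γ²·ΔJ²), and builds the soft mask as M = 1 − p(c|v). The source only says the probabilities are used "to construct" the mask, so that choice, the parameter `nu`, and both function names are hypothetical. W is taken to be a per-pixel normalizer over classes.

```python
import heapq
import math


def generalized_geodesic_distance(image, mask, gamma=1.0, nu=10.0):
    """Multi-source Dijkstra over a 4-connected pixel grid.

    image: 2D list of grey values J.
    mask:  2D list of soft-mask values M in [0, 1] (assumed: low = likely seed).
    Returns Q, where Q[y][x] is the cheapest (path cost + nu * M[seed]).
    """
    h, w = len(image), len(image[0])
    # Every pixel starts as a candidate seed with cost nu * M(p').
    dist = [[nu * mask[y][x] for x in range(w)] for y in range(h)]
    heap = [(dist[y][x], y, x) for y in range(h) for x in range(w)]
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue  # stale heap entry, a shorter path was already found
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                # Unit spatial step plus gamma-weighted intensity change.
                grad = gamma * (image[ny][nx] - image[y][x])
                nd = d + math.sqrt(1.0 + grad * grad)
                if nd < dist[ny][nx]:
                    dist[ny][nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist


def geodesic_smooth(image, posteriors, gamma=1.0, nu=10.0, sigma=2.0):
    """Sketch of (6): g(c|v(p)) = p(c|v(p)) * exp(-Q^2 / sigma^2) / W.

    posteriors: dict mapping class name -> 2D list of p(c|v(p)).
    """
    classes = list(posteriors)
    h, w = len(image), len(image[0])
    raw = {}
    for c in classes:
        # Assumption: confident pixels (high p) act as cheap seeds.
        mask = [[1.0 - posteriors[c][y][x] for x in range(w)] for y in range(h)]
        q = generalized_geodesic_distance(image, mask, gamma, nu)
        raw[c] = [[posteriors[c][y][x] * math.exp(-(q[y][x] / sigma) ** 2)
                   for x in range(w)] for y in range(h)]
    out = {c: [[0.0] * w for _ in range(h)] for c in classes}
    for y in range(h):
        for x in range(w):
            z = sum(raw[c][y][x] for c in classes)  # normalizer W
            for c in classes:
                out[c][y][x] = raw[c][y][x] / z if z > 0 else 1.0 / len(classes)
    return out
```

With γ = 0 the distance reduces to a plain grid distance; raising γ makes paths that cross strong intensity edges more expensive, which is the contrast sensitivity the text attributes to the geodesic strength parameter. A raster-scan approximation would be faster on large images; Dijkstra is used here only because it is short and exact on the grid graph.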
wordName wordTfidf (topN-words)
[('forest', 0.396), ('entangled', 0.359), ('geodesic', 0.346), ('forests', 0.313), ('geof', 0.207), ('kidney', 0.143), ('connectivity', 0.139), ('fkd', 0.138), ('entanglement', 0.122), ('decision', 0.108), ('liver', 0.107), ('predictions', 0.107), ('semantic', 0.098), ('posteriors', 0.089), ('cp', 0.084), ('class', 0.079), ('soft', 0.076), ('sji', 0.076), ('tree', 0.072), ('zk', 0.072), ('labelled', 0.07), ('sj', 0.07), ('autocontext', 0.069), ('depht', 0.069), ('geodesically', 0.069), ('rlk', 0.069), ('spleen', 0.069), ('crf', 0.068), ('probes', 0.068), ('probe', 0.066), ('erf', 0.065), ('pixel', 0.065), ('dependencies', 0.064), ('energy', 0.062), ('aorta', 0.061), ('labellings', 0.059), ('conventional', 0.058), ('vid', 0.057), ('depth', 0.055), ('criminisi', 0.054), ('shotton', 0.054), ('lung', 0.053), ('heart', 0.053), ('ct', 0.052), ('kontschieder', 0.051), ('camvid', 0.051), ('probabilities', 0.051), ('segmentation', 0.05), ('eit', 0.049), ('bands', 0.047), ('unaries', 0.047), ('training', 0.047), ('gsi', 0.046), ('kinbg', 0.046), ('logwczn', 0.046), ('labelling', 0.045), ('unary', 0.045), ('jaccard', 0.044), ('reference', 0.042), ('springer', 0.041), ('lfw', 0.041), ('logn', 0.041), ('shortest', 0.041), ('raw', 0.04), ('intensities', 0.039), ('trees', 0.039), ('pairwise', 0.039), ('translates', 0.038), ('bul', 0.038), ('eyebrow', 0.038), ('lungs', 0.038), ('medical', 0.038), ('texton', 0.037), ('connecting', 0.037), ('leg', 0.037), ('nowozin', 0.037), ('generalized', 0.037), ('objective', 0.036), ('rota', 0.036), ('structured', 0.035), ('variables', 0.034), ('cq', 0.034), ('intermediate', 0.033), ('ji', 0.033), ('torso', 0.033), ('node', 0.033), ('maxp', 0.033), ('anytime', 0.033), ('psi', 0.033), ('path', 0.032), ('jancsary', 0.031), ('prevents', 0.031), ('kohli', 0.031), ('sharp', 0.031), ('purity', 0.03), ('munoz', 0.03), ('smoothing', 0.03), ('context', 0.03), ('kinect', 0.03), ('ii', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
2 0.36465213 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
Author: Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, Andrew Fitzgibbon
Abstract: We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel’s correspondence to 3D points in the scene’s world coordinate frame. The forest uses only simple depth and RGB pixel comparison features, and does not require the computation of feature descriptors. The forest is trained to be capable of predicting correspondences at any pixel, so no interest point detectors are required. The camera pose is inferred using a robust optimization scheme. This starts with an initial set of hypothesized camera poses, constructed by applying the forest at a small fraction of image pixels. Preemptive RANSAC then iterates sampling more pixels at which to evaluate the forest, counting inliers, and refining the hypothesized poses. We evaluate on several varied scenes captured with an RGB-D camera and observe that the proposed technique achieves highly accurate relocalization and substantially outperforms two state-of-the-art baselines.
3 0.18273744 232 cvpr-2013-Joint Geodesic Upsampling of Depth Images
Author: Ming-Yu Liu, Oncel Tuzel, Yuichi Taguchi
Abstract: We propose an algorithm utilizing geodesic distances to upsample a low resolution depth image using a registered high resolution color image. Specifically, it computes depth for each pixel in the high resolution image using geodesic paths to the pixels whose depths are known from the low resolution one. Though this is closely related to the all-pairs shortest-path problem, which has $O(n^2 \log n)$ complexity, we develop a novel approximation algorithm whose complexity grows linearly with the image size and achieve real-time performance. We compare our algorithm with the state of the art on the benchmark dataset and show that our approach provides more accurate depth upsampling with fewer artifacts. In addition, we show that the proposed algorithm is well suited for upsampling depth images using binary edge maps, an important sensor fusion application.
4 0.18180275 39 cvpr-2013-Alternating Decision Forests
Author: Samuel Schulter, Paul Wohlhart, Christian Leistner, Amir Saffari, Peter M. Roth, Horst Bischof
Abstract: This paper introduces a novel classification method termed Alternating Decision Forests (ADFs), which formulates the training of Random Forests explicitly as a global loss minimization problem. During training, the losses are minimized by keeping an adaptive weight distribution over the training samples, similar to Boosting methods. In order to keep the method as flexible and general as possible, we adopt the principle of employing gradient descent in function space, which allows arbitrary losses to be minimized. Contrary to Boosted Trees, in our method the loss minimization is an inherent part of the tree growing process, thus allowing us to keep the benefits of common Random Forests, such as parallel processing. We derive the new classifier and give a discussion and evaluation on standard machine learning data sets. Furthermore, we show how ADFs can be easily integrated into an object detection application. Compared to both standard Random Forests and Boosted Trees, ADFs give better performance in our experiments, while yielding more compact models in terms of tree depth.
5 0.15356521 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have been shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that already takes dependencies between body parts into account for joint localization and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
7 0.12367433 406 cvpr-2013-Spatial Inference Machines
8 0.11875032 268 cvpr-2013-Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification
9 0.11031045 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
10 0.10912701 249 cvpr-2013-Learning Compact Binary Codes for Visual Tracking
11 0.10911895 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
12 0.10554844 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
13 0.10552908 169 cvpr-2013-Fast Patch-Based Denoising Using Approximated Patch Geodesic Paths
14 0.10346145 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential
15 0.10168421 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
16 0.098494992 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
17 0.093666211 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
18 0.093265839 390 cvpr-2013-Semi-supervised Node Splitting for Random Forest Construction
19 0.092082404 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
20 0.08968243 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
topicId topicWeight
[(0, 0.202), (1, 0.029), (2, 0.022), (3, -0.007), (4, 0.077), (5, 0.019), (6, 0.023), (7, 0.102), (8, -0.04), (9, -0.068), (10, 0.028), (11, 0.042), (12, -0.047), (13, 0.121), (14, -0.043), (15, 0.032), (16, -0.089), (17, -0.035), (18, 0.048), (19, -0.082), (20, 0.009), (21, 0.009), (22, -0.07), (23, 0.032), (24, -0.071), (25, 0.08), (26, -0.008), (27, 0.087), (28, 0.02), (29, 0.033), (30, -0.197), (31, 0.004), (32, -0.065), (33, -0.059), (34, 0.0), (35, -0.022), (36, -0.062), (37, -0.126), (38, 0.002), (39, 0.223), (40, -0.061), (41, 0.068), (42, -0.105), (43, 0.011), (44, 0.1), (45, 0.099), (46, -0.062), (47, 0.111), (48, 0.06), (49, 0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.94311792 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
2 0.82976258 39 cvpr-2013-Alternating Decision Forests
3 0.68919492 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
4 0.65339905 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
Author: Baoyuan Liu, Fereshteh Sadeghi, Marshall Tappen, Ohad Shamir, Ce Liu
Abstract: Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper, we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy.
5 0.59797531 406 cvpr-2013-Spatial Inference Machines
Author: Roman Shapovalov, Dmitry Vetrov, Pushmeet Kohli
Abstract: This paper addresses the problem of semantic segmentation of 3D point clouds. We extend the inference machines framework of Ross et al. by adding spatial factors that model mid-range and long-range dependencies inherent in the data. The new model is able to account for semantic spatial context. During training, our method automatically isolates and retains factors modelling spatial dependencies between variables that are relevant for achieving higher prediction accuracy. We evaluate the proposed method by using it to predict 17-category semantic segmentations on sets of stitched Kinect scans. Experimental results show that the spatial dependencies learned by our method significantly improve the accuracy of segmentation. They also show that our method outperforms the existing segmentation technique of Koppula et al.
6 0.54849988 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential
7 0.53232807 390 cvpr-2013-Semi-supervised Node Splitting for Random Forest Construction
8 0.51780999 268 cvpr-2013-Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification
9 0.51778769 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
10 0.48113629 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
11 0.4650785 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations
12 0.4607273 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
13 0.45330128 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets
14 0.45014393 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
15 0.44777095 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
16 0.44074747 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
17 0.43941054 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation
18 0.43873999 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
19 0.43839145 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
20 0.43200919 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
topicId topicWeight
[(10, 0.528), (16, 0.018), (26, 0.035), (33, 0.195), (67, 0.054), (69, 0.024), (87, 0.057)]
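The topic row above is printed as a list of (topicId, topicWeight) pairs. As a minimal sketch of how such a row could be read programmatically, the snippet below parses it and picks the highest-weighted topic. The function names `parse_topics` and `dominant_topic` are hypothetical helpers introduced here for illustration; they are not part of the tooling that generated this listing.

```python
import ast

# Topic row copied verbatim from the listing; each pair is (topicId, topicWeight).
row = "[(10, 0.528), (16, 0.018), (26, 0.035), (33, 0.195), (67, 0.054), (69, 0.024), (87, 0.057)]"


def parse_topics(text):
    """Parse the printed list of (topicId, topicWeight) pairs into a dict.

    Hypothetical helper: assumes the row is a valid Python literal.
    """
    return {int(topic_id): float(weight) for topic_id, weight in ast.literal_eval(text)}


def dominant_topic(topics):
    """Return the (topicId, topicWeight) pair with the largest weight."""
    return max(topics.items(), key=lambda item: item[1])


topics = parse_topics(row)
top_id, top_weight = dominant_topic(topics)
total_weight = sum(topics.values())
```

For this row the dominant topic is 10 with weight 0.528. The listed weights sum to less than 1, which would be consistent with low-weight topics being omitted from the printout, though the listing itself does not say so.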
simIndex simValue paperId paperTitle
1 0.92814523 295 cvpr-2013-Multi-image Blind Deblurring Using a Coupled Adaptive Sparse Prior
Author: Haichao Zhang, David Wipf, Yanning Zhang
Abstract: This paper presents a robust algorithm for estimating a single latent sharp image given multiple blurry and/or noisy observations. The underlying multi-image blind deconvolution problem is solved by linking all of the observations together via a Bayesian-inspired penalty function which couples the unknown latent image, blur kernels, and noise levels together in a unique way. This coupled penalty function enjoys a number of desirable properties, including a mechanism whereby the relative-concavity or shape is adapted as a function of the intrinsic quality of each blurry observation. In this way, higher quality observations may automatically contribute more to the final estimate than heavily degraded ones. The resulting algorithm, which requires no essential tuning parameters, can recover a high quality image from a set of observations containing potentially both blurry and noisy examples, without knowing a priori the degradation type of each observation. Experimental results on both synthetic and real-world test images clearly demonstrate the efficacy of the proposed method.
2 0.91547179 307 cvpr-2013-Non-uniform Motion Deblurring for Bilayer Scenes
Author: Chandramouli Paramanand, Ambasamudram N. Rajagopalan
Abstract: We address the problem of estimating the latent image of a static bilayer scene (consisting of a foreground and a background at different depths) from motion blurred observations captured with a handheld camera. The camera motion is considered to be composed of in-plane rotations and translations. Since the blur at an image location depends both on camera motion and depth, deblurring becomes a difficult task. We initially propose a method to estimate the transformation spread function (TSF) corresponding to one of the depth layers. The estimated TSF (which reveals the camera motion during exposure) is used to segment the scene into the foreground and background layers and determine the relative depth value. The deblurred image of the scene is finally estimated within a regularization framework by accounting for blur variations due to camera motion as well as depth.
3 0.91241091 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
4 0.90596908 76 cvpr-2013-Can a Fully Unconstrained Imaging Model Be Applied Effectively to Central Cameras?
Author: Filippo Bergamasco, Andrea Albarelli, Emanuele Rodolà, Andrea Torsello
Abstract: Traditional camera models are often the result of a compromise between the ability to account for non-linearities in the image formation model and the need for a feasible number of degrees of freedom in the estimation process. These considerations led to the definition of several ad hoc models that best adapt to different imaging devices, ranging from pinhole cameras with no radial distortion to the more complex catadioptric or polydioptric optics. In this paper we propose the use of an unconstrained model even in standard central camera settings dominated by the pinhole model, and introduce a novel calibration approach that can deal effectively with the huge number of free parameters associated with it, resulting in a higher precision calibration than what is possible with the standard pinhole model with correction for radial distortion. This effectively extends the use of general models to settings that traditionally have been ruled by parametric approaches out of practical considerations. The benefit of such an unconstrained model to quasi-pinhole central cameras is supported by an extensive experimental validation.
5 0.89992416 90 cvpr-2013-Computing Diffeomorphic Paths for Large Motion Interpolation
Author: Dohyung Seo, Jeffrey Ho, Baba C. Vemuri
Abstract: In this paper, we introduce a novel framework for computing a path of diffeomorphisms between a pair of input diffeomorphisms. Direct computation of a geodesic path on the space of diffeomorphisms Diff(Ω) is difficult, and it can be attributed mainly to the infinite dimensionality of Diff(Ω). Our proposed framework, to some degree, bypasses this difficulty using the quotient map of Diff(Ω) to the quotient space Diff(M)/Diff(M)μ obtained by quotienting out the subgroup of volume-preserving diffeomorphisms Diff(M)μ. This quotient space was recently identified as the unit sphere in a Hilbert space in mathematics literature, a space with well-known geometric properties. Our framework leverages this recent result by computing the diffeomorphic path in two stages. First, we project the given diffeomorphism pair onto this sphere and then compute the geodesic path between these projected points. Second, we lift the geodesic on the sphere back to the space of diffeomorphisms, by solving a quadratic programming problem with bilinear constraints using the augmented Lagrangian technique with penalty terms. In this way, we can estimate the path of diffeomorphisms, first, staying in the space of diffeomorphisms, and second, preserving shapes/volumes in the deformed images along the path as much as possible. We have applied our framework to interpolate intermediate frames of frame-sub-sampled video sequences. In the reported experiments, our approach compares favorably with the popular Large Deformation Diffeomorphic Metric Mapping framework (LDDMM).
6 0.88607424 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
same-paper 7 0.8660028 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
8 0.86478382 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
9 0.82859367 198 cvpr-2013-Handling Noise in Single Image Deblurring Using Directional Filters
10 0.82825577 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
11 0.80504644 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
12 0.77482396 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
13 0.76454794 193 cvpr-2013-Graph Transduction Learning with Connectivity Constraints with Application to Multiple Foreground Cosegmentation
14 0.75377226 314 cvpr-2013-Online Object Tracking: A Benchmark
15 0.75114095 131 cvpr-2013-Discriminative Non-blind Deblurring
16 0.73758376 414 cvpr-2013-Structure Preserving Object Tracking
17 0.7332958 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
18 0.72467101 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
19 0.72104055 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration