iccv iccv2013 iccv2013-42 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
Abstract: Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to on-the-fly select a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such incomplete energy function. Results for semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. [sent-5, score-0.407]
2 In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than MAP inference. [sent-6, score-0.719]
3 This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. [sent-7, score-0.792]
4 We introduce Active MAP inference 1) to on-the-fly select a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such incomplete energy function. [sent-8, score-1.885]
5 Results for semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains. [sent-9, score-0.368]
6 Introduction In many state-of-the-art methods for semantic segmentation, contextual information plays a central role. [sent-11, score-0.201]
7 A successful trend has been to encode the contextual constraints with a Conditional Random Field (CRF) [11], by modeling the interactions between different regions and scales of the image. [sent-12, score-0.133]
8 Most methods use sophisticated potentials between different neighboring regions [7, 21], and the state-of-the-art has been boosted with the use of high-order potentials in hierarchical CRFs [2, 9, 17]. [sent-13, score-0.882]
9 Another common way to include contextual information has been to extend image descriptors with contextual cues [6, 8, 15], or also, combining semantic classifiers fed from different contextual features [4, 13, 14]. [sent-14, score-0.518]
10 It is a remarkable feat the balance struck between accuracy and efficiency by the semantic texton forests of Shotton et al. [sent-15, score-0.112]
11 [12] ask the provocative question: ‘Are spatial and global constraints really necessary for segmentation?’ [sent-18, score-0.041]
12 From the experimental results, they conclude that the CRF structures boost performance when the features only encode local information, [sent-19, score-0.044]
13 Figure 1. Active MAP selects the potentials to instantiate that maximize the expected reward. [sent-22, score-0.596]
14 Also, it estimates the MAP labeling from the incomplete energy function. [sent-23, score-0.461]
15 whereas the further gain is very little when the features already encode contextual information. [sent-24, score-0.133]
16 This begs the question whether we can really benefit from CRFs in semantic segmentation when using such powerful features that already encode context. [sent-25, score-0.306]
17 We present a novel use of CRFs for semantic segmentation. [sent-26, score-0.112]
18 We exploit CRFs to estimate the semantic labeling without computing the descriptors and classifiers everywhere in the image. [sent-27, score-0.349]
19 Given a budget of time, it decides which potentials to compute. [sent-28, score-0.517]
20 This is because of the computational burden of instantiating the potentials, which involves extracting descriptors and applying classifiers and can be much higher than MAP inference for most of the energy functions in the literature [2, 6, 12]. [sent-30, score-1.09]
21 We introduce a relation between CRFs with some unknown unary potentials, which correspond to the features and classifiers that we do not compute, and the Perturb-and-MAP (PM) random field model [16]. [sent-31, score-0.295]
22 We build our MAP inference algorithm - coined Active MAP inference - based on this finding. [sent-32, score-0.406]
23 We use the term ‘active’ because during inference it selects which potentials to instantiate on-the-fly. [sent-33, score-0.783]
24 This stands in contrast to previous MAP inference methods, which first execute the features/classifiers that instantiate [Figure 2 panels: Original, 100%, 20%, 5%] [sent-34, score-0.342]
25 Figure 2. Active MAP inference observing different amounts of unary potentials. [sent-36, score-0.294]
26 Results are obtained by selecting the unary potentials with the expected labeling change. [sent-37, score-0.706]
27 Surprisingly, seeing the instantiation of the CRF energy function and MAP-CRF inference as two joint steps received little attention in the community. [sent-39, score-0.467]
28 In a series of experiments, we show that active MAP inference successfully exploits spatial consistency to avoid evaluating the classifiers and features everywhere. [sent-40, score-0.427]
29 It obtains comparable results to instantiating all the potentials in the CRF for the PASCAL VOC 2010 segmentation challenge [5] and for the MSRC-21 dataset [19], but with major efficiency gains. [sent-41, score-0.752]
30 1 we illustrate some results on semantic segmentation obtained with active MAP inference. [sent-43, score-0.342]
31 Active MAP Inference in CRFs This section describes the approach for active MAP inference. [sent-45, score-0.161]
32 Its formulation uses a CRF to model the probability density distribution expressing the likeliness of a certain labeling. [sent-46, score-0.158]
33 Let G = (V, E) be the graph, and X the set of random variables or nodes of the graph. [sent-48, score-0.04]
34 X = {Xi}, in which i ∈ V; the elements of V are the indices of the nodes, and the elements of E are the edges. [sent-52, score-0.06]
35 We denote an instance of the random variables as x = {xi}, where xi takes a value from a set of discrete labels L. [sent-55, score-0.112]
36 Thus, x ∈ L^N, with N the cardinality of V. [sent-56, score-0.037]
37 We denote P(x|θ) as the probability density distribution of a labeling modeled with the graph G. [sent-58, score-0.316]
38 The probability density that satisfies the Markov properties with respect to the graph G is a Gibbs distribution. [sent-62, score-0.158]
39 Thus, P(x|θ) can be written as the normalized negative exponential of an energy function Eθ(x) = θ^T φ(x), in which φ(x) = (φ1(x), . [sent-63, score-0.22]
40 , φM(x))^T is the vector of potentials, or the so-called sufficient statistics, and θ ∈ R^M are the parameters of the potentials. [sent-66, score-0.035]
41 We use the canonical over-complete representation, in which {φi(x)} are built using indicator functions that allow us to express the energy function as a linear combination of the potentials (c. [sent-67, score-0.787]
42 As usual, we categorize the potentials of the energy function depending on the number of random variables that they involve: unary and pairwise. [sent-75, score-0.805]
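As an illustration of the energy function described in the sentences above, the following is a minimal sketch, assuming a toy 3-node chain with two labels, a Potts pairwise term, and made-up parameter values. The function names `energy` and `map_labeling` are hypothetical and not taken from the paper, and the brute-force search stands in for a real MAP solver only because the graph is tiny.

```python
from itertools import product


def energy(x, unary, pairwise, edges):
    """E_theta(x) = theta^T phi(x) with indicator potentials.

    unary[i][l]      : parameter attached to the indicator [x_i == l]
    pairwise[(a, b)] : parameter attached to the indicator [x_i == a and x_j == b]
    Because phi(x) is a vector of indicators, the inner product reduces to
    summing the parameters that the labeling x switches on.
    """
    e = sum(unary[i][x[i]] for i in range(len(x)))
    e += sum(pairwise[(x[i], x[j])] for i, j in edges)
    return e


def map_labeling(unary, pairwise, edges, num_labels):
    """Brute-force minimum-energy labeling; only viable for toy graphs."""
    n = len(unary)
    return min(product(range(num_labels), repeat=n),
               key=lambda x: energy(x, unary, pairwise, edges))


# Illustrative values (assumptions): node 1 weakly prefers label 1 on its own.
unary = [[0.0, 2.0], [1.0, 0.9], [0.0, 2.0]]
potts = {(a, b): 0.0 if a == b else 0.5 for a in range(2) for b in range(2)}
edges = [(0, 1), (1, 2)]

x_map = map_labeling(unary, potts, edges, num_labels=2)
print(x_map, energy(x_map, unary, potts, edges))
```

In this toy case the smoothness term overrides the weak unary preference of the middle node, which is the kind of spatial consistency the method relies on.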
43 In the case of semantic segmentation, there is a node defined for each pixel or superpixel in the image. [sent-76, score-0.112]
44 The parameters of θ related to the unary potentials are typically the result of evaluating classifiers fed with features extracted from the image. [sent-77, score-0.722]
45 The pairwise and high-order potentials use some a priori assumptions like the smoothness of the labeling. [sent-78, score-0.441]
46 It is important to note that the instantiation of θ might be orders of magnitude more computationally expensive than MAP inference. [sent-79, score-0.096]
47 Usually, state-of-the-art methods for semantic segmentation use features and classifiers that take minutes to compute for a single image [2, 6, 12]. [sent-80, score-0.26]
48 At testing phase, the common way to proceed is to instantiate θ, and then to run an off-the-shelf MAP inference algorithm to obtain the most probable labeling. [sent-81, score-0.388]
49 The computational gain comes from not computing all classifiers and features needed to fully instantiate θ. [sent-84, score-0.274]
50 Even though we do not have the complete energy function anymore because part of θ is unknown, we will show in the sequel that we can still estimate x*. [sent-85, score-0.304]
51 concept of selected parameters, δ ∈ {0,1}, in our notation, i. [sent-88, score-0.035]
52 (1) Note that with this notation we can still easily express the initial formulation that instantiates all parameters, using δ = 1 and θ1, where 1 is a vector of ones. [sent-96, score-0.153]
53 With missing parameters, the energy function does not represent the initial labeling problem anymore. [sent-97, score-0.413]
54 It would be wrong to replace the unknown parameters by 0, or any value indicating that ‘the potential is missing’. [sent-98, score-0.148]
55 There is no guarantee that, in doing so, the new energy function would assign energy values similar to the ones given by the complete energy. [sent-99, score-0.484]
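The warning above about filling unknown parameters with 0 can be made concrete with a small sketch. It assumes a 3-node chain with a Potts term, made-up unary values, and a per-node selection mask `delta`; the elementwise masking used here is only one reading of the incomplete energy, since the original equation (1) is not recoverable from this extract.

```python
from itertools import product


def energy(x, unary, potts_weight, edges):
    """Unary terms plus a Potts penalty for each edge whose labels disagree."""
    e = sum(unary[i][x[i]] for i in range(len(x)))
    e += sum(potts_weight for i, j in edges if x[i] != x[j])
    return e


def brute_force_map(unary, potts_weight, edges):
    """Exhaustive minimum-energy labeling for a toy problem."""
    n, k = len(unary), len(unary[0])
    return min(product(range(k), repeat=n),
               key=lambda x: energy(x, unary, potts_weight, edges))


# Complete parameters (illustrative assumption) and a mask that keeps node 0 only.
theta = [[0.0, 1.0], [3.0, 0.0], [3.0, 0.0]]
delta = [1, 0, 0]
edges = [(0, 1), (1, 2)]

# Naive zero filling: unknown unary parameters are replaced by 0.
theta_zero_filled = [[d * v for v in row] for d, row in zip(delta, theta)]

x_full = brute_force_map(theta, 0.5, edges)
x_zero = brute_force_map(theta_zero_filled, 0.5, edges)

print("MAP of complete energy   :", x_full, energy(x_full, theta, 0.5, edges))
print("MAP of zero-filled energy:", x_zero, energy(x_zero, theta, 0.5, edges))
```

The labeling preferred by the zero-filled energy has a much higher value under the complete energy, which illustrates why the unknown parameters are modeled rather than set to a constant.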
56 Given a budget of time, Active MAP inference instantiates a subset of the potentials (δ), and only with them, it computes the complete MAP labeling (x*). [sent-101, score-0.987]
57 Finally, we show results for the application of semantic image segmentation, where we save the cost of instantiating all the unary potentials. [sent-106, score-0.461]
58 Active MAP inference is more general and can also be applied in many other applications. [sent-107, score-0.187]
59 Recently, Papandreou and Yuille introduced the PM random field [16], which is a model that allows for generating samples, built around the effective MAP inference algorithms in CRF. [sent-111, score-0.224]
60 PM is based on injecting noise in the energy function to perturb it, and then, it calculates the frequency that labelings are the MAP of the perturbed energy. [sent-114, score-0.616]
61 Let ε ∈ R^M be the random variable that is used to perturb θ. [sent-116, score-0.298]
62 We denote the perturbed parameters of the energy as θ̃ = θ + ε. [sent-121, score-0.365]
63 For each perturbed θ̃, we can infer a MAP labeling. [sent-123, score-0.11]
64 The different θ̃ that yield the same MAP labeling x can be grouped together. [sent-124, score-0.158]
65 (2) Analogously, we can define the set of perturbations ε [sent-130, score-0.052]
66 (3) Intuitively, the PM calculates how frequently a labeling x is the MAP labeling when noise is injected into the energy function. [sent-151, score-0.506]
67 Even though calculating the exact value of fPM(x; θ) might not be feasible for most practical cases, note that we can easily draw samples from a PM distribution by simply doing MAP inference on a perturbed energy. [sent-152, score-0.342]
68 For a complete explanation of the PM random field we refer the reader to the paper [16]. [sent-153, score-0.081]
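The sampling procedure described above (perturb the energy, run MAP inference, count how often each labeling wins) can be sketched as follows. This is a toy illustration under stated assumptions: Gaussian noise is added to the unary parameters only, MAP is found by exhaustive search, and all names and values are hypothetical. The noise distribution used in [16] may differ.

```python
import random
from collections import Counter
from itertools import product


def brute_force_map(unary, potts_weight, edges):
    """Exhaustive minimum-energy labeling for a toy Potts model."""
    def energy(x):
        e = sum(unary[i][x[i]] for i in range(len(x)))
        return e + sum(potts_weight for i, j in edges if x[i] != x[j])

    n, k = len(unary), len(unary[0])
    return min(product(range(k), repeat=n), key=energy)


def perturb_and_map_samples(unary, potts_weight, edges, noise_std,
                            num_samples, seed=0):
    """Draw samples by running MAP inference on randomly perturbed energies."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # theta_tilde = theta + epsilon (here: independent Gaussian noise, an assumption)
        perturbed = [[v + rng.gauss(0.0, noise_std) for v in row] for row in unary]
        samples.append(brute_force_map(perturbed, potts_weight, edges))
    return samples


def empirical_f_pm(samples):
    """Fraction of samples in which each labeling was the MAP labeling."""
    counts = Counter(samples)
    return {x: c / len(samples) for x, c in counts.items()}


unary = [[0.0, 1.0], [3.0, 0.0], [3.0, 0.0]]
edges = [(0, 1), (1, 2)]
frequencies = empirical_f_pm(
    perturb_and_map_samples(unary, 0.5, edges, noise_std=1.0, num_samples=500))
for labeling, freq in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(labeling, round(freq, 3))
```

With zero noise every sample collapses to the ordinary MAP labeling, so the noise scale controls how much probability mass spreads to competing labelings.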
69 MAP Inference for Incomplete Energies This section aims at estimating the labeling from the incomplete energy function. [sent-155, score-0.461]
70 We assume that δ is given, and the potentials indicated by δ have been instantiated. [sent-156, score-0.441]
71 Relation to Perturb-and-MAP Rather than filling in the energy function by inventing the unknown parameters or setting them to a learned constant value, we use P(θ|θδ) to model them. [sent-159, score-0.327]
72 P(θ|θδ) is the probability that the parameters of the potentials take the values θ given θδ. [sent-160, score-0.673]
73 The CRF models the probability of the labeling, but it does not directly model P(θ|θδ). [sent-161, score-0.079]
74 In order to alleviate the lack of an exact expression for P(θ|θδ), we use a model to approximate it, referred to as fθ(θ|δ, π), where π are the parameters of the model. [sent-162, score-0.194]
75 Changing θ in the energy function produces different MAP labelings, x*. [sent-165, score-0.22]
76 Therefore, we use P(θ|θδ) to define such probability on x*. [sent-170, score-0.079]
77 $P(X^{*} = x \mid \theta_\delta) = \int \mathbb{I}\big[x = \arg\min_{x'} E_\theta(x')\big]\, P(\theta \mid \theta_\delta)\, d\theta$, (4) where I[·] is the indicator function. [sent-179, score-0.056]
78 (4) can be seen as a natural way to calculate P(X* [sent-181, score-0.037]
79 = x|θδ), since it accumulates the probability density of P(θ|θδ) with θ yielding the minimum energy labeling equal to x. [sent-184, score-0.578]
80 The integral explores all complete energy functions, Eθ (x), and for each of them, it checks whether the MAP labeling is x or not. [sent-185, score-0.422]
81 In case it is equal to x, the corresponding probability density of P(θ|θδ) is accumulated into the final probability. [sent-186, score-0.158]
82 P(X* = x|θδ) is indeed a PM random field, from which we can easily draw samples. [sent-190, score-0.045]
83 Let fPM(x|δ, μ) be the density distribution of a PM model with energy Eμ(x), i. [sent-194, score-0.253]
84 the energy with parameter μ ∈ R^M, and the perturbations are drawn from ε [sent-196, score-0.052]
85 Observe that the density distribution of the PM model in Prop. [sent-202, score-0.079]
86 ∈ R^M such that x minimizes the energy function E(μ+ε [sent-207, score-0.26]
87 + μ|δ, π); this PM distribution reproduces the definition of P(X* [sent-215, score-0.037]
88 The parameters for this model are the mean and the standard deviation, [sent-231, score-0.035]
89 referred to as μ ∈ R^M and σ ∈ R^M respectively, where for notation simplicity π indicates both μ and σ. [sent-232, score-0.037]
90 Specifically, we define fθ(θ|δ, π) such that, if the parameter of the potential is unknown (δi = 0), it is a univariate Gaussian distribution, centered at μi and with standard deviation σi. [sent-234, score-0.041]
91 Otherwise it is consistent with the instantiated potential, fθ(θi|δi = 1, πi) = I[θi = θδii], where I[·] is the indicator function. [sent-235, score-0.156]
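The model just described (a univariate Gaussian for each unknown parameter, a point mass for each instantiated one) suggests a simple Monte Carlo approximation of P(X* = x|θδ): draw completed parameter vectors, run MAP on each, and count labelings. The sketch below assumes a per-node mask, toy values for μ and σ, and exhaustive MAP search; the function names are hypothetical, and how the paper turns samples into a final labeling is not stated in this extract.

```python
import random
from collections import Counter
from itertools import product


def brute_force_map(unary, potts_weight, edges):
    """Exhaustive minimum-energy labeling for a toy Potts model."""
    def energy(x):
        e = sum(unary[i][x[i]] for i in range(len(x)))
        return e + sum(potts_weight for i, j in edges if x[i] != x[j])

    n, k = len(unary), len(unary[0])
    return min(product(range(k), repeat=n), key=energy)


def sample_theta(theta_known, delta, mu, sigma, rng):
    """Draw one completed parameter set from the assumed f_theta(theta | delta, pi)."""
    theta = []
    for i, known in enumerate(delta):
        if known:
            # delta_i = 1: the instantiated value is reproduced exactly.
            theta.append(list(theta_known[i]))
        else:
            # delta_i = 0: each entry is Gaussian with mean mu and deviation sigma.
            theta.append([rng.gauss(m, s) for m, s in zip(mu[i], sigma[i])])
    return theta


def estimate_labeling_distribution(theta_known, delta, mu, sigma,
                                   potts_weight, edges, num_samples=200, seed=0):
    """Monte Carlo estimate of the probability that each labeling is the MAP."""
    rng = random.Random(seed)
    counts = Counter(
        brute_force_map(sample_theta(theta_known, delta, mu, sigma, rng),
                        potts_weight, edges)
        for _ in range(num_samples))
    return {x: c / num_samples for x, c in counts.items()}


# Only node 0 has been instantiated; mu and sigma are illustrative assumptions.
theta_known = [[0.0, 1.0], None, None]
delta = [1, 0, 0]
mu = [[0.0, 0.0], [1.5, 0.5], [1.5, 0.5]]
sigma = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]

distribution = estimate_labeling_distribution(theta_known, delta, mu, sigma, 0.5, edges)
print(max(distribution, key=distribution.get))
```

When every potential is instantiated, the estimate collapses to a point mass on the ordinary MAP labeling, which matches the complete-energy case.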
92 The algorithm starts from δ = 0, and it sequentially determines which potential to compute next, until the time budget, ttotal, expires. [sent-243, score-0.041]
93 We denote the known potentials at time t as θδt . [sent-244, score-0.441]
94 The algorithm ranks the unknown potentials with a score, and thus prioritizes the potentials in the time budget. [sent-245, score-1.035]
95 This is done by selecting the potentials with higher score. [sent-246, score-0.441]
96 We define Sδit as the expected reward of instantiating the potential i. [sent-250, score-0.376]
97 , which is the expected value over θ drawn from the Gaussian model of the posterior [sent-265, score-0.079]
98 R(·) is the reward of instantiating θi = θ, and it evaluates [sent-271, score-0.37]
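The selection loop outlined in the sentences above (start from δ = 0, score the unknown potentials, instantiate the best one, repeat until the budget expires) might be sketched as follows. Everything here is an assumption made for illustration: the budget is counted in abstract cost units rather than wall-clock time, and `score`, `instantiate`, and `cost` are hypothetical callables standing in for the expected reward, the feature/classifier evaluation, and the evaluation cost.

```python
def active_selection(num_potentials, instantiate, score, cost, budget):
    """Greedy budgeted selection of which potentials to instantiate.

    instantiate(i)               -> value of potential i (the expensive call)
    score(i, delta, theta_known) -> stand-in for the expected reward of i
    cost(i)                      -> budget units consumed by instantiating i
    """
    delta = [0] * num_potentials      # nothing is instantiated at the start
    theta_known = {}
    spent = 0.0
    while True:
        # Unknown potentials that still fit in the remaining budget.
        candidates = [i for i in range(num_potentials)
                      if not delta[i] and spent + cost(i) <= budget]
        if not candidates:
            break
        # Rank the candidates and instantiate the highest-scoring one.
        best = max(candidates, key=lambda i: score(i, delta, theta_known))
        theta_known[best] = instantiate(best)
        delta[best] = 1
        spent += cost(best)
    return delta, theta_known


# Toy usage with fixed scores (assumption); a real score would be recomputed
# from the potentials observed so far.
fixed_scores = [0.1, 0.9, 0.5, 0.3]
delta, theta_known = active_selection(
    num_potentials=4,
    instantiate=lambda i: [float(i), 0.0],
    score=lambda i, d, known: fixed_scores[i],
    cost=lambda i: 1.0,
    budget=2.0,
)
print(delta, sorted(theta_known))
```

Passing the current `delta` and `theta_known` to `score` leaves room for a reward that changes as more potentials are observed, which is what makes the selection on-the-fly rather than fixed in advance.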
wordName wordTfidf (topN-words)
[('potentials', 0.441), ('instantiating', 0.242), ('crfs', 0.221), ('energy', 0.22), ('crf', 0.194), ('inference', 0.187), ('pm', 0.184), ('px', 0.162), ('fpm', 0.161), ('active', 0.161), ('labeling', 0.158), ('map', 0.157), ('instantiate', 0.155), ('rm', 0.114), ('semantic', 0.112), ('perturbed', 0.11), ('unary', 0.107), ('instantiated', 0.1), ('perturb', 0.096), ('reward', 0.093), ('contextual', 0.089), ('incomplete', 0.083), ('instantiates', 0.081), ('density', 0.079), ('probability', 0.079), ('classifiers', 0.079), ('budget', 0.076), ('injecting', 0.075), ('unknown', 0.072), ('segmentation', 0.069), ('argx', 0.067), ('labelings', 0.062), ('instantiation', 0.06), ('eof', 0.06), ('fed', 0.06), ('oxf', 0.058), ('indicator', 0.056), ('calculates', 0.053), ('perturbations', 0.052), ('iet', 0.052), ('probable', 0.046), ('draw', 0.045), ('complete', 0.044), ('encode', 0.044), ('bee', 0.042), ('really', 0.041), ('potential', 0.041), ('ranks', 0.041), ('alue', 0.04), ('anymore', 0.04), ('beet', 0.04), ('begs', 0.04), ('endo', 0.04), ('functi', 0.04), ('gemma', 0.04), ('itsio', 0.04), ('nxg', 0.04), ('papandreou', 0.04), ('prioritizes', 0.04), ('qx', 0.04), ('rior', 0.04), ('tohem', 0.04), ('tohme', 0.04), ('tshtaen', 0.04), ('tthheat', 0.04), ('voefr', 0.04), ('xly', 0.04), ('eth', 0.039), ('ln', 0.038), ('pe', 0.038), ('reproduces', 0.037), ('beu', 0.037), ('ffr', 0.037), ('olef', 0.037), ('thex', 0.037), ('tphoes', 0.037), ('unkno', 0.037), ('vthaleu', 0.037), ('teo', 0.037), ('field', 0.037), ('notation', 0.037), ('orders', 0.036), ('parameters', 0.035), ('ahen', 0.035), ('argminx', 0.035), ('etv', 0.035), ('len', 0.035), ('lsa', 0.035), ('ltoe', 0.035), ('pam', 0.035), ('tarlow', 0.035), ('ttioo', 0.035), ('missing', 0.035), ('express', 0.035), ('sity', 0.033), ('ddis', 0.033), ('xavier', 0.033), ('tth', 0.033), ('barcelona', 0.032), ('coined', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
2 0.1978907 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
Author: Dahua Lin, Sanja Fidler, Raquel Urtasun
Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
3 0.18726686 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
Author: Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
Abstract: In this paper we propose an approach to jointly estimate the layout of rooms as well as the clutter present in the scene using RGB-D data. Towards this goal, we propose an effective model that is able to exploit both depth and appearance features, which are complementary. Furthermore, our approach is efficient as we exploit the inherent decomposition of additive potentials. We demonstrate the effectiveness of our approach on the challenging NYU v2 dataset and show that employing depth reduces the layout error by 6% and the clutter estimation by 13%.
4 0.15589763 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
Author: Ryan Tokola, Wongun Choi, Silvio Savarese
Abstract: We present an approach to multi-target tracking that has expressive potential beyond the capabilities of chain-shaped hidden Markov models, yet has significantly reduced complexity. Our framework, which we call tracking-by-selection, is similar to tracking-by-detection in that it separates the tasks of detection and tracking, but it shifts temporal reasoning from the tracking stage to the detection stage. The core feature of tracking-by-selection is that it reasons about path hypotheses that traverse the entire video instead of a chain of single-frame object hypotheses. A traditional chain-shaped tracking-by-detection model is only able to promote consistency between one frame and the next. In tracking-by-selection, path hypotheses exist across time, and encouraging long-term temporal consistency is as simple as rewarding path hypotheses with consistent image features. One additional advantage of tracking-by-selection is that it results in a dramatically simplified model that can be solved exactly. We adapt an existing tracking-by-detection model to the tracking-by-selection framework, and show improved performance on a challenging dataset (introduced in [18]).
5 0.13677178 150 iccv-2013-Exemplar Cut
Author: Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract: We present a hybrid parametric and nonparametric algorithm, exemplar cut, for generating class-specific object segmentation hypotheses. For the parametric part, we train a pylon model on a hierarchical region tree as the energy function for segmentation. For the nonparametric part, we match the input image with each exemplar by using regions to obtain a score which augments the energy function from the pylon model. Our method thus generates a set of highly plausible segmentation hypotheses by solving a series of exemplar augmented graph cuts. Experimental results on the Graz and PASCAL datasets show that the proposed algorithm achieves favorable segmentation performance against the state-of-the-art methods in terms of visual quality and accuracy.
6 0.13400194 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
7 0.12411442 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
8 0.12054215 246 iccv-2013-Learning the Visual Interpretation of Sentences
9 0.11276389 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
10 0.10449314 2 iccv-2013-3D Scene Understanding by Voxel-CRF
11 0.097837478 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
12 0.09753003 6 iccv-2013-A Convex Optimization Framework for Active Learning
13 0.086507075 309 iccv-2013-Partial Enumeration and Curvature Regularization
14 0.086465538 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
15 0.084300019 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
16 0.083222352 282 iccv-2013-Multi-view Object Segmentation in Space and Time
17 0.082772925 379 iccv-2013-Semantic Segmentation without Annotating Segments
18 0.078946047 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
19 0.078214526 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf
20 0.077416733 324 iccv-2013-Potts Model, Parametric Maxflow and K-Submodular Functions
topicId topicWeight
[(0, 0.17), (1, -0.014), (2, 0.01), (3, -0.015), (4, 0.087), (5, 0.024), (6, -0.095), (7, 0.046), (8, 0.012), (9, -0.137), (10, -0.01), (11, 0.06), (12, -0.015), (13, 0.036), (14, 0.03), (15, 0.024), (16, -0.102), (17, -0.054), (18, -0.092), (19, -0.058), (20, -0.063), (21, -0.038), (22, -0.029), (23, -0.013), (24, 0.025), (25, -0.085), (26, 0.112), (27, 0.004), (28, -0.011), (29, 0.002), (30, -0.055), (31, -0.017), (32, 0.088), (33, -0.025), (34, 0.098), (35, -0.065), (36, -0.12), (37, 0.003), (38, -0.018), (39, 0.093), (40, -0.043), (41, -0.039), (42, 0.111), (43, 0.06), (44, 0.038), (45, 0.033), (46, -0.006), (47, -0.051), (48, 0.008), (49, -0.085)]
simIndex simValue paperId paperTitle
same-paper 1 0.97447318 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
2 0.68740463 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
Author: Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan
Abstract: We propose an adaptive subgradient descent method to efficiently learn the parameters of CRF models for image parsing. To balance the learning efficiency and performance of the learned CRF models, the parameter learning is iteratively carried out by solving a convex optimization problem in each iteration, which integrates a proximal term to preserve the previously learned information and the large margin preference to distinguish bad labeling and the ground truth labeling. A solution of subgradient descent updating form is derived for the convex optimization problem, with an adaptively determined updating step-size. Besides, to deal with partially labeled training data, we propose a new objective constraint modeling both the labeled and unlabeled parts in the partially labeled training data for the parameter learning of CRF models. The superior learning efficiency of the proposed method is verified by the experiment results on two public datasets. We also demonstrate the powerfulness of our method for handling partially labeled training data.
3 0.67046237 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
Author: Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
Abstract: In this paper we propose an approach to jointly estimate the layout of rooms as well as the clutter present in the scene using RGB-D data. Towards this goal, we propose an effective model that is able to exploit both depth and appearance features, which are complementary. Furthermore, our approach is efficient as we exploit the inherent decomposition of additive potentials. We demonstrate the effectiveness of our approach on the challenging NYU v2 dataset and show that employing depth reduces the layout error by 6% and the clutter estimation by 13%.
4 0.63212097 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
Author: Masoud S. Nosrati, Shawn Andrews, Ghassan Hamarneh
Abstract: The inclusion of shape and appearance priors have proven useful for obtaining more accurate and plausible segmentations, especially for complex objects with multiple parts. In this paper, we augment the popular Mumford-Shah model to incorporate two important geometrical constraints, termed containment and detachment, between different regions with a specified minimum distance between their boundaries. Our method is able to handle multiple instances of multi-part objects defined by these geometrical constraints using a single labeling function while maintaining global optimality. We demonstrate the utility and advantages of these two constraints and show that the proposed convex continuous method is superior to other state-of-the-art methods, including its discrete counterpart, in terms of memory usage, and metrication errors. [Figure 1: The inside vs. outside ambiguity in (a) is resolved by our containment constraint in (b).]
5 0.61934412 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf
Author: empty-author
Abstract: Submodular functions can be exactly minimized in polynomial time, and the special case that graph cuts solve with max flow [19] has had significant impact in computer vision [5, 21, 28]. In this paper we address the important class of sum-of-submodular (SoS) functions [2, 18], which can be efficiently minimized via a variant of max flow called submodular flow [6]. SoS functions can naturally express higher order priors involving, e.g., local image patches; however, it is difficult to fully exploit their expressive power because they have so many parameters. Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set. We adopt a structural SVM approach [15, 34] and formulate the training problem in terms of quadratic programming; as a result we can efficiently search the space of SoS priors via an extended cutting-plane algorithm. We also show how the state-of-the-art max flow method for vision problems [11] can be modified to efficiently solve the submodular flow problem. Experimental comparisons are made against the OpenCV implementation of the GrabCut interactive segmentation technique [28], which uses hand-tuned parameters instead of machine learning. On a standard dataset [12] our method learns higher order priors with hundreds of parameter values, and produces significantly better segmentations. While our focus is on binary labeling problems, we show that our techniques can be naturally generalized to handle more than two labels.
6 0.61492622 150 iccv-2013-Exemplar Cut
7 0.61231512 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
8 0.60143894 324 iccv-2013-Potts Model, Parametric Maxflow and K-Submodular Functions
9 0.59948534 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
10 0.56904697 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
11 0.56838953 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
12 0.55529004 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
13 0.54469377 2 iccv-2013-3D Scene Understanding by Voxel-CRF
14 0.52939492 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
15 0.51688683 429 iccv-2013-Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs
16 0.50510699 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
17 0.49996752 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
19 0.48470882 282 iccv-2013-Multi-view Object Segmentation in Space and Time
20 0.48002061 186 iccv-2013-GrabCut in One Cut
topicId topicWeight
[(2, 0.043), (7, 0.019), (12, 0.014), (16, 0.231), (26, 0.099), (27, 0.012), (31, 0.047), (40, 0.017), (42, 0.1), (48, 0.01), (64, 0.036), (73, 0.036), (89, 0.236), (98, 0.015)]
simIndex simValue paperId paperTitle
1 0.87073278 304 iccv-2013-PM-Huber: PatchMatch with Huber Regularization for Stereo Matching
Author: Philipp Heise, Sebastian Klose, Brian Jensen, Alois Knoll
Abstract: Most stereo correspondence algorithms match support windows at integer-valued disparities and assume a constant disparity value within the support window. The recently proposed PatchMatch stereo algorithm [7] overcomes this limitation of previous algorithms by directly estimating planes. This work presents a method that integrates the PatchMatch stereo algorithm into a variational smoothing formulation using quadratic relaxation. The resulting algorithm allows the explicit regularization of the disparity and normal gradients using the estimated plane parameters. Evaluation of our method in the Middlebury benchmark shows that our method outperforms the traditional integer-valued disparity strategy as well as the original algorithm and its variants in sub-pixel accurate disparity estimation.
2 0.86167252 177 iccv-2013-From Point to Set: Extend the Learning of Distance Metrics
Author: Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
Abstract: Most of the current metric learning methods are proposed for point-to-point distance (PPD) based classification. In many computer vision tasks, however, we need to measure the point-to-set distance (PSD) and even set-to-set distance (SSD) for classification. In this paper, we extend the PPD based Mahalanobis distance metric learning to PSD and SSD based ones, namely point-to-set distance metric learning (PSDML) and set-to-set distance metric learning (SSDML), and solve them under a unified optimization framework. First, we generate positive and negative sample pairs by computing the PSD and SSD between training samples. Then, we characterize each sample pair by its covariance matrix, and propose a covariance kernel based discriminative function. Finally, we tackle the PSDML and SSDML problems by using standard support vector machine solvers, making the metric learning very efficient for multiclass visual classification tasks. Experiments on gender classification, digit recognition, object categorization and face recognition show that the proposed metric learning methods can effectively enhance the performance of PSD and SSD based classification.
same-paper 3 0.85407102 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
Author: Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
Abstract: Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to on-the-fly select a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such incomplete energy function. Results for semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains.
4 0.85301405 101 iccv-2013-DCSH - Matching Patches in RGBD Images
Author: Yaron Eshet, Simon Korman, Eyal Ofek, Shai Avidan
Abstract: We extend patch based methods to work on patches in 3D space. We start with Coherency Sensitive Hashing [12] (CSH), which is an algorithm for matching patches between two RGB images, and extend it to work with RGBD images. This is done by warping all 3D patches to a common virtual plane in which CSH is performed. To avoid noise due to warping of patches of various normals and depths, we estimate a group of dominant planes and compute CSH on each plane separately, before merging the matching patches. The result is DCSH - an algorithm that matches world (3D) patches in order to guide the search for image plane matches. An independent contribution is an extension of CSH, which we term Social-CSH. It allows a major speedup of the k nearest neighbor (kNN) version of CSH - its runtime growing linearly, rather than quadratically, in k. Social-CSH is used as a subcomponent of DCSH when many NNs are required, as in the case of image denoising. We show the benefits of using depth information to image reconstruction and image denoising, demonstrated on several RGBD images.
5 0.82029027 17 iccv-2013-A Global Linear Method for Camera Pose Registration
Author: Nianjuan Jiang, Zhaopeng Cui, Ping Tan
Abstract: We present a linear method for global camera pose registration from pairwise relative poses encoded in essential matrices. Our method minimizes an approximate geometric error to enforce the triangular relationship in camera triplets. This formulation does not suffer from the typical ‘unbalanced scale’ problem in linear methods relying on pairwise translation direction constraints, i.e. an algebraic error; nor the system degeneracy from collinear motion. In the case of three cameras, our method provides a good linear approximation of the trifocal tensor. It can be directly scaled up to register multiple cameras. The results obtained are accurate for point triangulation and can serve as a good initialization for final bundle adjustment. We evaluate the algorithm performance with different types of data and demonstrate its effectiveness. Our system produces good accuracy, robustness, and outperforms some well-known systems on efficiency.
6 0.81206965 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
7 0.77563536 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
8 0.77170658 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
9 0.77003825 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
10 0.7694701 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
11 0.7679466 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
12 0.76780158 35 iccv-2013-Accurate Blur Models vs. Image Priors in Single Image Super-resolution
13 0.76731968 18 iccv-2013-A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution
14 0.76625234 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction
15 0.76584876 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
16 0.76576692 309 iccv-2013-Partial Enumeration and Curvature Regularization
17 0.76566136 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
18 0.76550663 128 iccv-2013-Dynamic Probabilistic Volumetric Models
19 0.76529789 190 iccv-2013-Handling Occlusions with Franken-Classifiers