iccv iccv2013 iccv2013-234 knowledge-graph by maker-knowledge-mining

234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent


Source: pdf

Author: Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan

Abstract: We propose an adaptive subgradient descent method to efficiently learn the parameters of CRF models for image parsing. To balance the learning efficiency and the performance of the learned CRF models, the parameter learning is iteratively carried out by solving a convex optimization problem in each iteration, which integrates a proximal term to preserve the previously learned information and a large-margin preference to distinguish bad labelings from the ground-truth labeling. A solution in the form of a subgradient descent update is derived for the convex optimization problem, with an adaptively determined updating step-size. Besides, to deal with partially labeled training data, we propose a new objective constraint modeling both the labeled and unlabeled parts of the partially labeled training data for the parameter learning of CRF models. The superior learning efficiency of the proposed method is verified by the experimental results on two public datasets. We also demonstrate the effectiveness of our method in handling partially labeled training data.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A solution in the form of a subgradient descent update is derived for the convex optimization problem, with an adaptively determined updating step-size. [sent-3, score-0.847]

2 Besides, to deal with partially labeled training data, we propose a new objective constraint modeling both the labeled and unlabeled parts in the partially labeled training data for the parameter learning of CRF models. [sent-4, score-0.991]

3 We also demonstrate the effectiveness of our method in handling partially labeled training data. [sent-6, score-0.256]

4 Introduction The Conditional Random Field [19] (CRF) offers a powerful probabilistic formulation for image parsing problems. [sent-8, score-0.139]

5 It has been demonstrated in previous works [18, 11, 16] that integrating different types of cues, such as the smoothness preference and global consistency, in a CRF model can significantly improve the parsing accuracy. [sent-9, score-0.189]

6 However, how to properly combine multiple types of information in a CRF model to achieve excellent parsing performance still remains an open question. [sent-10, score-0.139]

7 For this reason, the parameter learning of CRF models for image parsing tasks has received increasing attention recently. [sent-11, score-0.291]

8 Considerable progress on the parameter learning of CRF models has been made in the past few years. [sent-12, score-0.152]

9 However, the parameter learning of CRF models for the image parsing tasks still remains a challenging problem for several reasons. [sent-13, score-0.291]

10 First, as the CRF models used in many image parsing problems are of large scale and include expressive intervariable interactions, the computational challenges make the parameter learning of CRF models difficult. [sent-14, score-0.331]

11 Given a large number of training images, the learning efficiency would become a critical issue. [sent-15, score-0.152]

12 Second, partially labeled training data could cause the failure of some learning methods, which is common in image parsing. [sent-16, score-0.313]

13 For example, it has been found that the learned parameters involved in the pairwise smoothness potential are forced toward zero when using partially labeled training data [25]. [sent-17, score-0.411]

14 In this paper, we propose an adaptive subgradient descent method that iteratively learns the parameters of CRF models for image parsing. [sent-18, score-0.655]

15 The parameter learning is iteratively carried out by solving a convex optimization problem in each iteration. [sent-19, score-0.22]

16 The solution for the convex optimization problem gives a subgradient descent updating form with an adaptively determined updating step-size which can well balance the learning efficiency and performance of the learned CRF models. [sent-20, score-1.049]

17 Meanwhile, to deal with partially labeled training images that are common in various image parsing tasks, a new objective constraint for the parameter learning of CRF models is proposed, which models both the labeled and unlabeled parts of partially labeled training images. [sent-21, score-1.21]

18 Related work. The parameter learning of CRF models is an active research topic, investigated in many previous works [7, 27, 23, 20, 12, 2, 21, 15, 9]. [sent-24, score-0.152]

19 Most current methods for the parameter learning of CRF models can be broadly classified into two categories: maximum likelihood-based methods [19, 17] and max-margin methods [7, 27, 23, 12]. [sent-25, score-0.152]

20 An exhaustive review of the literature is beyond the scope of this paper; the following review mainly focuses on the max-margin methods, in which the parameter learning of CRF models is formulated as a structure learning problem based on the max-margin formulation. [sent-26, score-0.209]

21 Naturally, the max-margin methods for general structure learning can be used for the parameter learning of CRF models, such as the 1-slack and n-slack StructSVM (structural SVM) [27, 12], M3N (max-margin Markov network) [7] and Projected Subgradient [23]. [sent-27, score-0.189]

22 The subgradient method [23] is another popular solution for structure learning problems, which is usually efficient and easy to implement. [sent-29, score-0.538]

23 Based on the subgradient method, recent works on the parameter learning of CRF models [15, 24] adopt different decomposition techniques. [sent-30, score-0.633]

24 As the updating step-sizes for the subgradient descent in these methods [23, 15, 24] are predefined and oblivious to the characteristics of the data being observed, they need to be carefully chosen to balance the learning efficiency and the performance of the learned models. [sent-33, score-0.939]

25 Inappropriate updating step-sizes could lead to bad performance of the learned CRF models or slow convergence. [sent-34, score-0.236]

26 This motivates us to improve the subgradient method for the parameter learning of CRF models by adaptively tuning the subgradient descent, termed adaptive subgradient descent in this paper. [sent-35, score-1.763]

27 However, for the parameter learning problem of CRF models, the optimal value of the objective function is unknown, and how to estimate it is also unclear. [sent-37, score-0.154]

28 Another important but less discussed issue in the previous max-margin methods for the parameter learning of CRF models is related to partially labeled training data, which is common in image parsing tasks. [sent-38, score-0.547]

29 To deal with the partially labeled training data, a maximum likelihood-based method which approximates the partition function with the Bethe free energy function is proposed in [28], with some limitations of the Bethe approximation discussed in [10]. [sent-39, score-0.281]

30 It has been observed that different treatments of the partially labeled data could lead to quite different performance. [sent-40, score-0.212]

31 To deal with the partially labeled training data, we introduce latent variables in the CRF models, inspired by the works [29, 15]. [sent-41, score-0.256]

32 Learning CRF to Parse Images. Random field models are widely used to formulate various image parsing problems. [sent-43, score-0.179]

33 Based on this linear model, the parameter learning of CRF models can be cast as a typical structure learning problem, with the corresponding energy function for the CRF models expressed as: E(x,y) = w · Φ(x,y) = Σ_{c∈C} wc · φc(xc, yc). [sent-67, score-0.274]
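As a minimal sketch of this linear model, the energy is just a dot product between the stacked weights and the stacked clique feature responses; the three-clique toy values below are illustrative assumptions, not the paper's exact feature definitions.

```python
import numpy as np

def energy(w, phi):
    """Linear CRF energy E(x, y) = w . Phi(x, y), where phi stacks the
    clique feature responses phi_c(x_c, y_c) for a fixed labeling y."""
    return float(np.dot(w, phi))

# Toy example: three clique feature responses and their weights.
w = np.array([1.0, 0.5, 2.0])
phi = np.array([0.2, 0.4, 0.1])
print(energy(w, phi))  # 0.6
```

Because the energy is linear in w, the learning problem below inherits the structure of a linear structured-prediction objective.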

34 In the following, we review the widely used max-margin formulation for the parameter learning of CRF models: the 1-slack and n-slack StructSVM. Given a training set {(xn, yn)}n=1..N, using the n-slack StructSVM [27], the learning problem can be formulated as [26]: argmin_w (1/2)||w||^2 + C Σn ξn, s.t. H(w; xn, yn, ŷ) ≤ ξn for all n and all labelings ŷ. [sent-69, score-0.213]

35 The objective constraint in the 1-slack StructSVM is obtained by merging the objective constraints for each sample of the training set {(xn, yn)}n=1..N in the n-slack StructSVM, with x∗ = ∪n xn, y∗ = ∪n yn and ŷ = ∪n ŷn. [sent-87, score-0.183]

36 The 1-slack and n-slack StructSVM methods iteratively update the parameters to be learned by the cutting plane algorithm. [sent-88, score-0.137]

37 Unconstrained max-margin formulation. An unconstrained formulation of (5) is adopted in [23], which uses the projected subgradient method to minimize the following regularized objective function: ρ(w) = (1/2)||w||^2 + C · R(w) (8), where R(w) is the empirical risk (9). [sent-90, score-0.616]

38 The parameters to be learned are iteratively updated by: wt+1 = P[wt − αtgw] (10) where gw is the subgradient of the convex function (8), P is the projection operator and αt is the predefined step-size that needs to be chosen carefully. [sent-93, score-0.646]
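Update (10) can be sketched as a single projected step. The projection onto the nonnegative orthant is an assumption here (it matches the nonnegativity constraint the paper adds later); the toy subgradient and step-size are illustrative.

```python
import numpy as np

def projected_subgradient_step(w, g, alpha):
    """One update w_{t+1} = P[w_t - alpha_t * g], where P is assumed to be
    the Euclidean projection onto the nonnegative orthant."""
    return np.maximum(w - alpha * g, 0.0)

w = np.array([1.0, 0.2, 0.5])
g = np.array([0.5, 1.0, -0.5])
w_next = projected_subgradient_step(w, g, 0.5)
print(w_next)  # second coordinate would go negative, so P clips it to 0
```

With a predefined, fixed alpha this is exactly the update whose step-size must be hand-tuned; the adaptive method in the next section replaces alpha with a per-iteration quantity.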

39 Inappropriate updating step-size could lead to bad performance of the learned CRF models or slow convergence. [sent-94, score-0.236]

40 Adaptive Subgradient Descent Learning. In this section, we propose an adaptive subgradient descent algorithm for the parameter learning of CRF models, as described in Algorithm 1. [sent-96, score-0.706]

41 It is motivated by applying the idea of the proximal bundle method [13], which uses proximal functions to control the learning rate, to subgradient methods, in which the learning rate is controlled only by the predefined step-sizes. [sent-97, score-0.711]

42 In each iteration, the parameter updating is carried out by solving a convex optimization problem which integrates a proximal term to preserve the previously learned information and the large margin preference to distinguish bad labeling and the ground truth labeling. [sent-98, score-0.444]

43 The solution for the convex optimization problem gives a subgradient descent update form with an adaptively determined updating step-size for the parameter learning, which well balances the learning efficiency and performance of the learned CRF models. [sent-99, score-0.994]

44 A typical training process of using the proposed algorithm to train CRF models for image parsing is shown in Figure 1. [sent-100, score-0.223]

45 Adaptive subgradient descent algorithm. In each iteration of Algorithm 1, the adaptive subgradient descent updating is carried out by solving the following convex optimization problem, which has a subgradient-based solution with an adaptively determined step-size: wt+1 = argmin_w (1/2)||w − wt||^2 + C · ξ (11). [sent-103, score-1.416]

46 Therefore, an objective constraint the same as that in the 1-slack StructSVM is used in the optimization problem (11): H(w; x∗, y∗, ŷt) ≤ ξ (12), where ŷt is the merged labeling configuration that most violates the constraint (12) for the current parameter wt. [sent-109, score-0.262]
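Finding the labeling ŷt that most violates the constraint amounts to loss-augmented inference: maximizing Δ(y∗, y) + E(x, y∗) − E(x, y) over labelings. The brute-force enumeration over a two-node, two-label toy model below is only a sketch; the hinge form and Hamming loss are assumptions, and real parsing models use graph cuts or message passing rather than enumeration.

```python
import itertools

def most_violated(energy, loss, y_star, labels, n_nodes):
    """Return argmax_y [ loss(y*, y) + E(x, y*) - E(x, y) ] by brute force."""
    e_star = energy(y_star)
    best, best_val = None, float("-inf")
    for y in itertools.product(labels, repeat=n_nodes):
        val = loss(y_star, y) + e_star - energy(y)
        if val > best_val:
            best, best_val = y, val
    return best, best_val

# Toy 2-node model: unary costs only, Hamming loss.
unary = {0: [0.0, 0.25], 1: [0.5, 0.0]}  # node -> cost per label
energy = lambda y: sum(unary[i][yi] for i, yi in enumerate(y))
loss = lambda ys, y: sum(a != b for a, b in zip(ys, y))
y_hat, h = most_violated(energy, loss, y_star=(0, 1), labels=(0, 1), n_nodes=2)
print(y_hat, h)  # (1, 0) 1.25
```

Here the ground truth (0, 1) has the lowest energy, yet the labeling (1, 0) still violates the margin, which is exactly the quantity the constraint (12) bounds by ξ.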

47 On the other hand, a proximal term that forces the learned parameter wt+1 to stay as close as possible to wt is inserted into the objective function of (11), so that the previously learned information can be preserved. [sent-110, score-0.383]

48 Output: wf = argmin_{wt ∈ {wt}t=1..M} H(wt; x∗, y∗, ŷt). In addition to the objective constraint (12), we add one more constraint that the parameters to be learned are nonnegative, similar to the previous work [26]. [sent-121, score-0.268]

49 The resulting update is: wt+1,i = wt,i − αt · di if wt,i − αt · di ≥ 0, and wt+1,i = 0 otherwise (13), with αt = argmax L(α), 0 ≤ α ≤ C (14), where [d1, d2, ..., dK] is the subgradient of the empirical risk (9). [sent-132, score-0.481]
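Updates (13)-(14) can be sketched as a per-coordinate step with the size chosen by a 1-D maximization clipped to [0, C]. The closed form min(C, violation / ||d||^2), familiar from passive-aggressive and proximal-point updates, is an assumption standing in for the paper's own maximization of L(α); the numbers are illustrative.

```python
import numpy as np

def adaptive_step(w, d, violation, C):
    """One adaptive subgradient descent update: step in direction -d with
    an adaptively chosen size, then clip negative coordinates to zero.
    Assumed closed form: alpha_t = min(C, violation / ||d||^2)."""
    norm_sq = float(np.dot(d, d))
    alpha = 0.0 if norm_sq == 0.0 else min(C, max(0.0, violation) / norm_sq)
    return np.maximum(w - alpha * d, 0.0), alpha

w_next, alpha = adaptive_step(np.array([1.0, 0.5]), np.array([2.0, -1.0]),
                              violation=1.0, C=10.0)
print(alpha)  # 0.2
```

A large constraint violation or a small subgradient thus yields a bigger step, while C caps the step exactly as the bound 0 ≤ α ≤ C in (14) does.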

50 Learning with Partially Labeled Images. Partially labeled training images are common in image parsing problems, as it is usually very time-consuming to obtain precise annotations by manual labeling. [sent-167, score-0.293]

51 A typical partially labeled example is shown in Figure 2(a). [sent-168, score-0.212]

52 The unlabeled regions in partially labeled training images are not trivial to handle in the parameter learning of CRF models, as observed in previous works [25, 28]. [sent-169, score-0.587]

53 As evaluating the loss on the unlabeled regions during the learning process is not feasible, a straightforward choice is to discard the unlabeled regions, i.e., to exclude them from the CRF models built for the partially labeled training images in the learning process. [sent-170, score-1.132]

54 However, without considering the unlabeled regions, the interactions between the labeled regions and the unlabeled regions will not be modeled in the learning process. [sent-171, score-0.29]

55 Figure 2. (a) a partially labeled training image, with the unlabeled regions shown in black; (b) and (c), the pairwise CRF models for the parameter learning with different ways to treat the unlabeled regions in the training image. [sent-173, score-0.293]

56 (b) using the constraint (20), the nodes in the unlabeled regions and the links connected to them are shown in green. [sent-174, score-0.274]

57 (c) discarding the unlabeled regions in the parameter learning, with the nodes and links for the unlabeled regions in (b) excluded. [sent-175, score-0.558]

59 For example, the boundaries between the labeled regions and the unlabeled regions are mostly not real boundaries between different categories, so the pairwise smoothness should be preserved across these boundaries. [sent-178, score-0.437]

60 Without the interactions between the labeled regions and the unlabeled regions, the pairwise smoothness constraint on these boundaries will not be encoded in the learning process. [sent-179, score-0.527]

61 Let Rk and Ru denote the labeled regions and the unlabeled regions in the partially labeled training images, and let yk∗ denote the ground truth label for Rk. [sent-181, score-0.667]

62 Then, the new objective constraint is defined as: H(w; x∗, yt∗, ˆyt) ≤ ξ (20) where the ground truth label yt∗ = yk∗ ∪ yˆtu consists of the ground truth label yk∗ for Rk and the predicted label yˆtu for Ru. [sent-183, score-0.166]

63 Note that when there are no unlabeled regions in the training images, (12) and (20) are the same. [sent-184, score-0.263]
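Constructing the merged ground truth yt∗ = yk∗ ∪ ŷtu in (20) can be sketched as a mask-based merge: keep annotated labels where available and fill unlabeled pixels with the current model's prediction. The unlabeled marker value and array names are assumptions for illustration.

```python
import numpy as np

UNLABELED = -1  # assumed marker for unannotated pixels

def merge_labels(annotation, prediction):
    """Build y_t* from the annotated labels y_k* on the labeled regions R_k
    and the predicted labels y^_t^u on the unlabeled regions R_u."""
    annotation = np.asarray(annotation)
    return np.where(annotation == UNLABELED, prediction, annotation)

ann = np.array([[2, -1], [-1, 0]])   # -1 marks unlabeled pixels
pred = np.array([[1, 3], [3, 1]])    # current model prediction y^_t
print(merge_labels(ann, pred))       # annotated 2 and 0 kept, gaps filled from pred
```

When the annotation contains no UNLABELED entries the merge returns the annotation unchanged, matching the remark that (12) and (20) coincide for fully labeled images.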

64 A simple pairwise CRF model for a partially labeled training image is shown in Figure 2, which illustrates the different ways to handle the unlabeled regions in partially labeled training images. [sent-185, score-0.779]

65 E(x,y) = wu · fu(x, y) + wp · fp(x, y) + wc · fc(x, y) = wT · Φ(x, y) (21) [sent-191, score-0.144]

66 where fu(x, y), fp(x, y) and fc(x, y) are the label-dependent feature vectors for the unary potential, the pairwise potential and the high-order potential enforcing label consistency, and Φ(x, y) = [fu(x, y), fp(x, y), fc(x, y)], w = [wu, wp, wc]. [sent-193, score-0.2]

67 The parameters to be learned include wu = [wu], wc = [wc] and wp = [wp1, wp2, ..., wpL], where L is the number of categories. [sent-194, score-0.207]

68 Similar to [25], the unary potential is defined on the pixel level, multiplied by the weight parameter wu. [sent-195, score-0.161]

69 The evaluation procedure includes three successive steps: 1) unary potential training, which is identical for our method, the 1-slack StructSVM method and the Projected Subgradient method; 2) parameter learning of the CRF model; 3) testing of the learned CRF model. [sent-206, score-0.326]

70 As our focus is evaluating the parameter learning algorithms for CRF models, we choose simple patch-level features (Texton + SIFT) for the unary potential training, with the Random Forest classifier [6] (50 trees, max depth = 32) chosen as the classifier model. [sent-208, score-0.198]

71 (Footnote 1) As the energy function for the Robust PN model is submodular, no decomposition in [15] is necessary for the model, which makes [15] and the Projected Subgradient method [23] equivalent in this situation. [sent-210, score-0.137]

72 The parameter learning of the Robust PN model starts with the unary classification, with the parameters to be learned initialized as wu = 1, wp = 0, wc = 0 for our method, the 1-slack StructSVM method and the Projected Subgradient method. [sent-211, score-0.267]

73 For the performance evaluation, we use two criteria: CAA (category average accuracy, the average proportion of pixels correctly labeled in each category) and GA (global accuracy, the proportion of all pixels correctly labeled), the same as in previous works on image parsing [25, 18]. [sent-213, score-0.249]

74 The segmentation accuracy obtained with unary classification, the Robust PN models learned by the 1-slack StructSVM method, Projected Subgradient method and our method on the datasets for the evaluation. [sent-231, score-0.215]

75 Performance of the learned models. The segmentation accuracy achieved with the unary classification as well as the Robust PN models learned by our method, the 1-slack StructSVM method and the Projected Subgradient method on the two datasets for the evaluation is given in Table 1. [sent-235, score-0.362]

76 These critical parameters include the initial upper bound of the updating step-size in our method, the updating step-sizes and the weight of constraint violation in the Projected Subgradient method, the weight of slack variable in the 1-slack StructSVM method. [sent-237, score-0.236]

77 Several examples of the parsing results obtained by different methods are illustrated in Figure 3. [sent-239, score-0.139]

78 The parsing results obtained by unary classification and the Robust PN models learned by the 1-slack StructSVM [12], Projected Subgradient [23] and our method on the MSRC-21 dataset and the CBCL street scene dataset. [sent-242, score-0.323]

79 The average time cost of the 1-slack StructSVM method, Projected Subgradient method and our method for the parameter learning of the Robust PN models on the MSRC-21 dataset and CBCL StreetScenes dataset. [sent-244, score-0.152]

80 From the comparison in Table 1(a), we find that compared with the unary classification, the segmentation accuracy is improved to varying degrees by the Robust PN models learned by all the three methods. [sent-248, score-0.215]

81 The segmentation accuracy achieved with the Robust PN model learned by our method outperforms that achieved with the model learned by the 1-slack StructSVM method and is comparable to that achieved with the Robust PN model learned by the Projected Subgradient method. [sent-249, score-0.352]

82 The segmentation accuracy achieved with the Robust PN models learned by our method, the 1-slack StructSVM method and Projected Subgradient method is given in Table 1(b). [sent-252, score-0.178]

83 Similar to the result on the MSRC-21 dataset, the Robust PN models learned by all the three methods improve the segmentation accuracy to varying degrees. [sent-253, score-0.155]

84 The number of iterations used to train the Robust PN models in the Projected Subgradient method [23] and our method, versus the corresponding parsing error achieved with the learned Robust PN models on the MSRC-21 dataset and the CBCL StreetScenes dataset. [sent-260, score-0.351]

85 The parsing error is measured by the loss of GA (global accuracy) and CAA (category average accuracy). [sent-261, score-0.139]

86 We also find that the segmentation accuracy achieved with the Robust PN model learned by our method is slightly better than that achieved with the models learned by the 1-slack StructSVM method and the Projected Subgradient method. [sent-262, score-0.285]

87 The learning efficiency of our method and the Projected Subgradient method depends on the predefined number of iterations used to train the CRF models, such as M in Algorithm 1. [sent-266, score-0.138]

88 In Figure 4, we find that on the test set of both datasets for the evaluation, the parsing errors of the Robust PN model learned by our method become stable rapidly, after only five iterations. [sent-268, score-0.223]

89 By contrast, on the test sets of both datasets for the evaluation, the parsing errors of the Robust PN model learned by the Projected Subgradient method decrease gradually, becoming approximately stable only after 200 iterations. [sent-269, score-0.223]

90 The segmentation accuracy achieved on the MSRC-21 and CBCL dataset with the Robust PN models learned by our method, the 1-slack StructSVM [12] and Projected Subgradient method [23], using different ways to treat the unlabeled regions. [sent-289, score-0.39]

91 The parsing results obtained with the Robust PN models learned by our method, with different ways to treat the unlabeled regions in partially labeled training images. [sent-291, score-0.79]

92 This indicates that modeling the interactions between the labeled regions and unlabeled regions of the partially labeled training images in the learning process is important for our method and the Projected Subgradient method. [sent-294, score-0.721]

93 Please note that the results of our method and the Projected Subgradient method reported in Table 1 are also obtained by using the modified objective constraint (20), as many images in the two datasets for the evaluation are partially labeled. [sent-295, score-0.224]

94 For the 1-slack StructSVM method, the learned models using the modified objective constraint (20) achieve better performance on the CBCL StreetScenes dataset and worse performance on the MSRC21 dataset. [sent-296, score-0.246]

95 Several examples of the parsing results obtained with the Robust PN models learned by our method are illustrated in Figure 6, with different ways to treat the unlabeled regions in partially labeled training images. [sent-297, score-0.79]

96 Conclusion We present an adaptive subgradient descent method to learn parameters of CRF models for image parsing. [sent-299, score-0.634]

97 In each iteration of the algorithm, the adaptive subgradient descent updating is carried out by solving a simple convex optimization problem which has a subgradient-based solution with an adaptively determined step-size. [sent-300, score-0.865]

98 The adaptively determined updating step-size can well balance the learning efficiency and performance of the learned CRF models. [sent-301, score-0.36]

99 Meanwhile, the proposed method is capable of handling partially labeled training data robustly, with a new objective constraint modeling both the labeled and unlabeled parts in the partially labeled training images for the parameter learning. [sent-302, score-0.934]

100 Scene segmentation with crfs learned from partially labeled images. [sent-469, score-0.366]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('structsvm', 0.516), ('subgradient', 0.481), ('crf', 0.387), ('cbcl', 0.215), ('pn', 0.198), ('unlabeled', 0.16), ('parsing', 0.139), ('streetscenes', 0.129), ('labeled', 0.11), ('partially', 0.102), ('projected', 0.093), ('learned', 0.084), ('updating', 0.081), ('caa', 0.079), ('wt', 0.072), ('descent', 0.07), ('wc', 0.069), ('yn', 0.068), ('discarding', 0.065), ('unary', 0.06), ('yc', 0.06), ('regions', 0.059), ('learning', 0.057), ('adaptively', 0.055), ('constraint', 0.055), ('parameter', 0.055), ('unlabled', 0.053), ('bethe', 0.048), ('proximal', 0.046), ('ga', 0.045), ('training', 0.044), ('adaptive', 0.043), ('yt', 0.042), ('objective', 0.042), ('models', 0.04), ('crfs', 0.039), ('convex', 0.036), ('wp', 0.036), ('labeling', 0.034), ('efficiency', 0.032), ('argmy', 0.032), ('iatedriaetniton', 0.032), ('istevramtio', 0.032), ('itrer', 0.032), ('mateitohonds', 0.032), ('nslack', 0.032), ('pnmodelclasusnifaicryationst', 0.032), ('proejcted', 0.032), ('subgradeint', 0.032), ('trani', 0.032), ('wf', 0.032), ('xc', 0.032), ('xn', 0.032), ('bad', 0.031), ('segmentation', 0.031), ('carried', 0.03), ('robust', 0.03), ('balance', 0.029), ('tierations', 0.029), ('qp', 0.028), ('ways', 0.027), ('assure', 0.026), ('potential', 0.026), ('preference', 0.026), ('yk', 0.026), ('iteration', 0.026), ('potentials', 0.025), ('energy', 0.025), ('treat', 0.025), ('iterations', 0.025), ('stepsize', 0.025), ('inappropriate', 0.025), ('modified', 0.025), ('piecewise', 0.025), ('predefined', 0.024), ('smoothness', 0.024), ('achieved', 0.023), ('meanwhile', 0.023), ('dk', 0.023), ('label', 0.023), ('fc', 0.022), ('bagnell', 0.022), ('determined', 0.022), ('pairwise', 0.021), ('fourth', 0.021), ('fp', 0.021), ('optimization', 0.021), ('boundaries', 0.021), ('msrc', 0.021), ('iteratively', 0.021), ('markov', 0.02), ('wthe', 0.02), ('interactions', 0.02), ('rk', 0.019), ('ladicky', 0.019), ('associative', 0.019), ('kohli', 0.019), ('critical', 0.019), ('wu', 0.018), ('icml', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent

Author: Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan

Abstract: We propose an adaptive subgradient descent method to efficiently learn the parameters of CRF models for image parsing. To balance the learning efficiency and the performance of the learned CRF models, the parameter learning is iteratively carried out by solving a convex optimization problem in each iteration, which integrates a proximal term to preserve the previously learned information and a large-margin preference to distinguish bad labelings from the ground-truth labeling. A solution in the form of a subgradient descent update is derived for the convex optimization problem, with an adaptively determined updating step-size. Besides, to deal with partially labeled training data, we propose a new objective constraint modeling both the labeled and unlabeled parts of the partially labeled training data for the parameter learning of CRF models. The superior learning efficiency of the proposed method is verified by the experimental results on two public datasets. We also demonstrate the effectiveness of our method in handling partially labeled training data.

2 0.13400194 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation

Author: Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool

Abstract: Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to on-the-fly select a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such incomplete energy function. Results for semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains.

3 0.10112041 6 iccv-2013-A Convex Optimization Framework for Active Learning

Author: Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty

Abstract: In many image/video/web classification problems, we have access to a large number of unlabeled samples. However, it is typically expensive and time consuming to obtain labels for the samples. Active learning is the problem of progressively selecting and annotating the most informative unlabeled samples, in order to obtain a high classification performance. Most existing active learning algorithms select only one sample at a time prior to retraining the classifier. Hence, they are computationally expensive and cannot take advantage of parallel labeling systems such as Mechanical Turk. On the other hand, algorithms that allow the selection of multiple samples prior to retraining the classifier, may select samples that have significant information overlap or they involve solving a non-convex optimization. More importantly, the majority of active learning algorithms are developed for a certain classifier type such as SVM. In this paper, we develop an efficient active learning framework based on convex programming, which can select multiple samples at a time for annotation. Unlike the state of the art, our algorithm can be used in conjunction with any type of classifiers, including those of the fam- ily of the recently proposed Sparse Representation-based Classification (SRC). We use the two principles of classifier uncertainty and sample diversity in order to guide the optimization program towards selecting the most informative unlabeled samples, which have the least information overlap. Our method can incorporate the data distribution in the selection process by using the appropriate dissimilarity between pairs of samples. We show the effectiveness of our framework in person detection, scene categorization and face recognition on real-world datasets.

4 0.093423054 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

Author: Dahua Lin, Sanja Fidler, Raquel Urtasun

Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.

5 0.089385763 246 iccv-2013-Learning the Visual Interpretation of Sentences

Author: C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende

Abstract: Sentences that describe visual scenes contain a wide variety of information pertaining to the presence of objects, their attributes and their spatial relations. In this paper we learn the visual features that correspond to semantic phrases derived from sentences. Specifically, we extract predicate tuples that contain two nouns and a relation. The relation may take several forms, such as a verb, preposition, adjective or their combination. We model a scene using a Conditional Random Field (CRF) formulation where each node corresponds to an object, and the edges to their relations. We determine the potentials of the CRF using the tuples extracted from the sentences. We generate novel scenes depicting the sentences’ visual meaning by sampling from the CRF. The CRF is also used to score a set of scenes for a text-based image retrieval task. Our results show we can generate (retrieve) scenes that convey the desired semantic meaning, even when scenes (queries) are described by multiple sentences. Significant improvement is found over several baseline approaches.

6 0.078242116 142 iccv-2013-Ensemble Projection for Semi-supervised Image Classification

7 0.07764747 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling

8 0.072988503 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf

9 0.072525062 428 iccv-2013-Translating Video Content to Natural Language Descriptions

10 0.067285284 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos

11 0.066664115 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

12 0.063519627 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items

13 0.062653132 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking

14 0.061093383 238 iccv-2013-Learning Graphs to Match

15 0.060827009 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification

16 0.060719531 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees

17 0.057174537 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables

18 0.055308338 150 iccv-2013-Exemplar Cut

19 0.054269716 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation

20 0.052361835 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.129), (1, 0.033), (2, -0.015), (3, -0.023), (4, 0.023), (5, 0.001), (6, -0.05), (7, 0.021), (8, 0.004), (9, -0.074), (10, -0.019), (11, -0.003), (12, -0.04), (13, -0.003), (14, 0.048), (15, 0.004), (16, -0.063), (17, -0.013), (18, -0.05), (19, -0.027), (20, -0.009), (21, -0.027), (22, -0.043), (23, 0.014), (24, 0.022), (25, -0.065), (26, 0.082), (27, 0.001), (28, 0.002), (29, 0.033), (30, -0.02), (31, 0.026), (32, 0.076), (33, -0.026), (34, 0.031), (35, -0.007), (36, -0.082), (37, 0.061), (38, -0.076), (39, 0.035), (40, 0.007), (41, -0.013), (42, 0.018), (43, 0.052), (44, 0.056), (45, 0.034), (46, 0.027), (47, -0.007), (48, -0.022), (49, -0.012)]
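The simValue scores below are plausibly cosine similarities between dense LSI topic-weight vectors like the one above. The site's actual scoring code is not shown, so this is a hedged sketch with toy vectors, not the real paper representations:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two dense LSI topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (nu * nv)

# Toy 4-dimensional topic vectors (illustrative only):
paper_a = [0.129, 0.033, -0.015, -0.023]
paper_b = [0.120, 0.030, -0.010, -0.020]
print(round(cosine_similarity(paper_a, paper_b), 3))
```

Ranking every other paper by this score and sorting descending would produce a list of the same shape as the one that follows.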

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91037601 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent

Author: Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan

Abstract: We propose an adaptive subgradient descent method to efficiently learn the parameters of CRF models for image parsing. To balance the learning efficiency and the performance of the learned CRF models, parameter learning is carried out iteratively by solving a convex optimization problem in each iteration, which integrates a proximal term to preserve the previously learned information and a large-margin preference to distinguish bad labelings from the ground-truth labeling. A solution in subgradient descent updating form is derived for the convex optimization problem, with an adaptively determined updating step size. In addition, to deal with partially labeled training data, we propose a new objective constraint that models both the labeled and unlabeled parts of partially labeled training data for the parameter learning of CRF models. The superior learning efficiency of the proposed method is verified by experimental results on two public datasets. We also demonstrate the effectiveness of our method in handling partially labeled training data.
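The abstract's combination of a proximal term with a large-margin constraint admits a closed-form, passive-aggressive-style update. The sketch below illustrates that flavor of adaptively stepped subgradient update; it is not the paper's exact derivation, and all names and toy features are illustrative:

```python
def adaptive_margin_step(w, feat_truth, feat_bad, margin=1.0):
    """One update: push score(truth) above score(bad) by `margin`."""
    g = [fb - ft for ft, fb in zip(feat_truth, feat_bad)]  # subgradient of the hinge
    score_gap = sum(wi * gi for wi, gi in zip(w, g))       # score(bad) - score(truth)
    loss = max(0.0, margin + score_gap)
    if loss == 0.0:
        return w  # margin already satisfied: the proximal term keeps w unchanged
    g_norm2 = sum(gi * gi for gi in g)
    step = loss / g_norm2  # adaptive step size from the closed-form solution
    return [wi - step * gi for wi, gi in zip(w, g)]

w = [0.0, 0.0]
w = adaptive_margin_step(w, feat_truth=[1.0, 0.0], feat_bad=[0.0, 1.0])
print(w)  # prints [0.5, -0.5]
```

After one update the margin constraint is met exactly, so a second call with the same pair leaves w unchanged, which is the role the proximal term plays.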

2 0.80724669 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation

Author: Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Kolja Kuhnlenz, Luc Van_Gool

Abstract: Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to on-the-fly select a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such incomplete energy function. Results for semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains.

3 0.6647048 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling

Author: Evgeny Levinkov, Mario Fritz

Abstract: Semantic road labeling is a key component of systems that aim at assisted or even autonomous driving. Considering that such systems continuously operate in the real world, unforeseen conditions not represented in any conceivable training procedure are likely to occur on a regular basis. In order to equip systems with the ability to cope with such situations, we would like to enable adaptation to such new situations and conditions at runtime. Existing adaptive methods for image labeling either require labeled data from the new condition or even operate globally on a complete test set. None of this is a desirable mode of operation for a system as described above where new images arrive sequentially and conditions may vary. We study the effect of changing test conditions on scene labeling methods based on a new diverse street scene dataset. We propose a novel approach that can operate in such conditions and is based on a sequential Bayesian model update in order to robustly integrate the arriving images into the adapting procedure.

4 0.65592998 171 iccv-2013-Structured Learning of Sum-of-Submodular Higher Order Energy Functions

Author: empty-author

Abstract: Submodular functions can be exactly minimized in polynomial time, and the special case that graph cuts solve with max flow [19] has had significant impact in computer vision [5, 21, 28]. In this paper we address the important class of sum-of-submodular (SoS) functions [2, 18], which can be efficiently minimized via a variant of max flow called submodular flow [6]. SoS functions can naturally express higher order priors involving, e.g., local image patches; however, it is difficult to fully exploit their expressive power because they have so many parameters. Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set. We adopt a structural SVM approach [15, 34] and formulate the training problem in terms of quadratic programming; as a result we can efficiently search the space of SoS priors via an extended cutting-plane algorithm. We also show how the state-of-the-art max flow method for vision problems [11] can be modified to efficiently solve the submodular flow problem. Experimental comparisons are made against the OpenCV implementation of the GrabCut interactive segmentation technique [28], which uses hand-tuned parameters instead of machine learning. On a standard dataset [12] our method learns higher order priors with hundreds of parameter values, and produces significantly better segmentations. While our focus is on binary labeling problems, we show that our techniques can be naturally generalized to handle more than two labels.

5 0.61242205 324 iccv-2013-Potts Model, Parametric Maxflow and K-Submodular Functions

Author: Igor Gridchyn, Vladimir Kolmogorov

Abstract: The problem of minimizing the Potts energy function frequently occurs in computer vision applications. One way to tackle this NP-hard problem was proposed by Kovtun [20, 21]. It identifies a part of an optimal solution by running k maxflow computations, where k is the number of labels. The number of “labeled” pixels can be significant in some applications, e.g. 50-93% in our tests for stereo. We show how to reduce the runtime to O(log k) maxflow computations (or one parametric maxflow computation). Furthermore, the output of our algorithm allows to speed-up the subsequent alpha expansion for the unlabeled part, or can be used as it is for time-critical applications. To derive our technique, we generalize the algorithm of Felzenszwalb et al. [7] for Tree Metrics. We also show a connection to k-submodular functions from combinatorial optimization, and discuss k-submodular relaxations for general energy functions.
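The Potts energy the abstract refers to is just per-pixel unary costs plus a constant penalty for each neighboring pair that disagrees. A minimal sketch with toy costs (the node grid, costs, and lambda are illustrative, not from the paper):

```python
def potts_energy(labels, unary, edges, lam=1.0):
    """labels: list of ints; unary[p][l]: cost of label l at node p;
    edges: neighbor pairs (p, q); lam: Potts penalty for disagreement."""
    e = sum(unary[p][labels[p]] for p in range(len(labels)))
    e += sum(lam for p, q in edges if labels[p] != labels[q])
    return e

unary = [[0.0, 2.0], [2.0, 0.0], [0.5, 0.5]]  # 3 nodes, 2 labels
edges = [(0, 1), (1, 2)]                      # a 3-node chain
print(potts_energy([0, 1, 1], unary, edges, lam=1.0))  # prints 1.5
```

Minimizing this energy over all labelings is the NP-hard problem (for k > 2 labels) that the k maxflow computations above attack.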

6 0.60916859 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints

7 0.59475851 6 iccv-2013-A Convex Optimization Framework for Active Learning

8 0.58322412 191 iccv-2013-Handling Uncertain Tags in Visual Recognition

9 0.58203512 150 iccv-2013-Exemplar Cut

10 0.56624067 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

11 0.56510884 211 iccv-2013-Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks

12 0.56163627 246 iccv-2013-Learning the Visual Interpretation of Sentences

13 0.55891132 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps

14 0.55297869 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations

15 0.53964126 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

16 0.53507745 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors

17 0.53290373 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling

18 0.53024107 330 iccv-2013-Proportion Priors for Image Sequence Segmentation

19 0.52919304 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification

20 0.52466786 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.073), (7, 0.013), (25, 0.243), (26, 0.111), (31, 0.035), (42, 0.1), (48, 0.033), (64, 0.039), (73, 0.038), (78, 0.012), (89, 0.153), (98, 0.029)]
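Unlike the dense LSI vector, the lda vector above is sparse: (topicId, weight) pairs. A paper-to-paper simValue could then be a cosine over sparse topic dicts. This is purely illustrative; the site's actual scoring code is not shown:

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity between two sparse {topicId: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy sparse topic distributions (topic ids echo the listing's format):
a = {25: 0.243, 26: 0.111, 89: 0.153}
b = {25: 0.2, 42: 0.1}
print(round(sparse_cosine(a, b), 3))
```

Only topics shared by both papers contribute to the dot product, which is why papers with disjoint dominant topics score near zero.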

similar papers list:

simIndex simValue paperId paperTitle

1 0.81411374 135 iccv-2013-Efficient Image Dehazing with Boundary Constraint and Contextual Regularization

Author: Gaofeng Meng, Ying Wang, Jiangyong Duan, Shiming Xiang, Chunhong Pan

Abstract: unknown-abstract

same-paper 2 0.79105765 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent

Author: Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan

Abstract: We propose an adaptive subgradient descent method to efficiently learn the parameters of CRF models for image parsing. To balance the learning efficiency and the performance of the learned CRF models, parameter learning is carried out iteratively by solving a convex optimization problem in each iteration, which integrates a proximal term to preserve the previously learned information and a large-margin preference to distinguish bad labelings from the ground-truth labeling. A solution in subgradient descent updating form is derived for the convex optimization problem, with an adaptively determined updating step size. In addition, to deal with partially labeled training data, we propose a new objective constraint that models both the labeled and unlabeled parts of partially labeled training data for the parameter learning of CRF models. The superior learning efficiency of the proposed method is verified by experimental results on two public datasets. We also demonstrate the effectiveness of our method in handling partially labeled training data.

3 0.78720349 211 iccv-2013-Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks

Author: Mojtaba Seyedhosseini, Mehdi Sajjadi, Tolga Tasdizen

Abstract: Contextual information plays an important role in solving vision problems such as image segmentation. However, extracting contextual information and using it in an effective way remains a difficult problem. To address this challenge, we propose a multi-resolution contextual framework, called cascaded hierarchical model (CHM), which learns contextual information in a hierarchical framework for image segmentation. At each level of the hierarchy, a classifier is trained based on downsampled input images and outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at original resolution. We repeat this procedure by cascading the hierarchical framework to improve the segmentation accuracy. Multiple classifiers are learned in the CHM; therefore, a fast and accurate classifier is required to make the training tractable. The classifier also needs to be robust against overfitting due to the large number of parameters learned during training. We introduce a novel classification scheme, called logistic disjunctive normal networks (LDNN), which consists of one adaptive layer of feature detectors implemented by logistic sigmoid functions followed by two fixed layers of logical units that compute conjunctions and disjunctions, respectively. We demonstrate that LDNN outperforms state-of-the-art classifiers and can be used in the CHM to improve object segmentation performance.
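The LDNN architecture in the abstract maps naturally onto soft logic: a sigmoid layer, conjunctions as products of unit outputs, and a disjunction as one minus the product of complements. A hedged sketch of that forward pass with toy weights (not the paper's trained model):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ldnn_forward(x, groups):
    """groups: list of conjunctions, each a list of (weights, bias) units."""
    disjunction_complement = 1.0
    for group in groups:
        conj = 1.0
        for w, b in group:  # soft AND: product of sigmoid unit outputs
            conj *= sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        disjunction_complement *= 1.0 - conj  # soft OR via complements
    return 1.0 - disjunction_complement

# One conjunction of two units, each gating one input dimension:
groups = [[([4.0, 0.0], -2.0), ([0.0, 4.0], -2.0)]]
print(round(ldnn_forward([1.0, 1.0], groups), 3))  # prints 0.776
```

Because only the sigmoid layer has free parameters while the AND/OR layers are fixed, the parameter count stays modest, which is how the design resists overfitting.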

4 0.77206385 307 iccv-2013-Parallel Transport of Deformations in Shape Space of Elastic Surfaces

Author: Qian Xie, Sebastian Kurtek, Huiling Le, Anuj Srivastava

Abstract: Statistical shape analysis develops methods for comparisons, deformations, summarizations, and modeling of shapes in given data sets. These tasks require a fundamental tool called parallel transport of tangent vectors along arbitrary paths. This tool is essential for: (1) computation of geodesic paths using either shooting or path-straightening method, (2) transferring deformations across objects, and (3) modeling of statistical variability in shapes. Using the square-root normal field (SRNF) representation of parameterized surfaces, we present a method for transporting deformations along paths in the shape space. This is difficult despite the underlying space being a vector space because the chosen (elastic) Riemannian metric is non-standard. Using a finite-basis for representing SRNFs of shapes, we derive expressions for Christoffel symbols that enable parallel transports. We demonstrate this framework using examples from shape analysis of parameterized spherical surfaces, in the three contexts mentioned above.

5 0.76607597 30 iccv-2013-A Simple Model for Intrinsic Image Decomposition with Depth Cues

Author: Qifeng Chen, Vladlen Koltun

Abstract: We present a model for intrinsic decomposition of RGB-D images. Our approach analyzes a single RGB-D image and estimates albedo and shading fields that explain the input. To disambiguate the problem, our model estimates a number of components that jointly account for the reconstructed shading. By decomposing the shading field, we can build in assumptions about image formation that help distinguish reflectance variation from shading. These assumptions are expressed as simple nonlocal regularizers. We evaluate the model on real-world images and on a challenging synthetic dataset. The experimental results demonstrate that the presented approach outperforms prior models for intrinsic decomposition of RGB-D images.

6 0.7270844 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

7 0.69842231 312 iccv-2013-Perceptual Fidelity Aware Mean Squared Error

8 0.68300867 150 iccv-2013-Exemplar Cut

9 0.67935109 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation

10 0.67840904 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions

11 0.67732006 414 iccv-2013-Temporally Consistent Superpixels

12 0.67702782 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning

13 0.6739549 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation

14 0.6720826 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification

15 0.6718598 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition

16 0.6717273 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling

17 0.67150223 6 iccv-2013-A Convex Optimization Framework for Active Learning

18 0.67122263 330 iccv-2013-Proportion Priors for Image Sequence Segmentation

19 0.66987664 20 iccv-2013-A Max-Margin Perspective on Sparse Representation-Based Classification

20 0.66983318 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition