iccv iccv2013 iccv2013-75 knowledge-graph by maker-knowledge-mining

75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework


Source: pdf

Author: Jianping Shi, Renjie Liao, Jiaya Jia

Abstract: We propose a co-detection and labeling (CoDeL) framework to identify persons that contain self-consistent appearance in multiple images. Our CoDeL model builds upon the deformable part-based model to detect human hypotheses and exploits cross-image correspondence via a matching classifier. Relying on a Gaussian process, this matching classifier models the similarity of two hypotheses and efficiently captures the relative importance contributed by various visual features, reducing the adverse effect of scattered occlusion. Further, the detector and matching classifier together make our model fit into a semi-supervised co-training framework, which can get enhanced results with a small amount of labeled training data. Our CoDeL model achieves decent performance on existing and new benchmark datasets.
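The co-training idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `MeanThresholdClassifier`, the function `co_train`, the feature names `edge` and `color`, and the `margin` parameter are all hypothetical. Both toy classifiers here score the same samples, whereas in CoDeL the detector scores single hypotheses and the matching classifier scores pairs. The sketch only shows how confident positives from one classifier can be handed to the other as weak positives before both are refit.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MeanThresholdClassifier:
    """Toy stand-in for a real classifier (hypothetical, for illustration only).

    It sees one scalar feature (one "view") of each sample and needs at least
    one positive and one negative value before fit() is called.
    """
    feature: str
    positives: list = field(default_factory=list)
    negatives: list = field(default_factory=list)
    threshold: float = 0.0

    def fit(self):
        # Decision boundary: midpoint between the two class means.
        self.threshold = (mean(self.positives) + mean(self.negatives)) / 2.0

    def is_confident_positive(self, sample, margin):
        # "Confident" means the score clears the boundary by at least `margin`.
        return sample[self.feature] >= self.threshold + margin


def co_train(view_a, view_b, unlabeled, rounds=3, margin=0.1):
    """Exchange confident positives between two classifiers, then refit both."""
    view_a.fit()
    view_b.fit()
    pool = list(unlabeled)
    for _ in range(rounds):
        from_a = [s for s in pool if view_a.is_confident_positive(s, margin)]
        from_b = [s for s in pool if view_b.is_confident_positive(s, margin)]
        if not from_a and not from_b:
            break  # nothing new to exchange
        # Positives found by one view become weak positives for the other view.
        view_b.positives += [s[view_b.feature] for s in from_a]
        view_a.positives += [s[view_a.feature] for s in from_b]
        used = {id(s) for s in from_a + from_b}
        pool = [s for s in pool if id(s) not in used]
        view_a.fit()
        view_b.fit()
    return view_a, view_b
```

A call such as `co_train(MeanThresholdClassifier("edge", [0.9, 0.8], [0.1, 0.2]), MeanThresholdClassifier("color", [0.9], [0.1]), samples)` expects `samples` to be a list of dicts holding both feature keys. Keeping the two views on separate features mirrors the conditional-independence assumption that co-training relies on.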

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Our CoDeL model builds upon the deformable part-based model to detect human hypotheses and exploits cross-image correspondence via a matching classifier. [sent-5, score-0.612]

2 Relying on a Gaussian process, this matching classifier models the similarity of two hypotheses and efficiently captures the relative importance contributed by various visual features, reducing the adverse effect of scattered occlusion. [sent-6, score-0.575]

3 Further, the detector and matching classifier together make our model fit into a semi-supervised co-training framework, which can get enhanced results with a small amount of labeled training data. [sent-7, score-0.509]

4 Current commercial systems, such as Picasa or Facebook, already provide a human grouping function based on face similarity. [sent-22, score-0.333]

5 Previous human identity grouping research [27, 20, 1] extends faces to torsos, given the fact that a person appearing in multiple images taken in the same day or during the same event often wears the same clothes. [sent-24, score-0.363]

6 Face detectors are vulnerable to head-pose variation, not to mention occlusion or back views. [sent-25, score-0.233]

7 In this regard, a reliable human detector would be vastly valuable to the community. [sent-29, score-0.358]

8 For most classical single-image human detectors [5, 29, 6], input images generally contain pedestrians in standing or walking poses. [sent-30, score-0.271]

9 Our method relaxes this latent constraint in detecting and grouping persons, thus working on data that could fail conventional human detectors and human template matching. [sent-31, score-0.57]

10 Moreover, the possible high variation of backgrounds in different images would make the human template matching really challenging. [sent-32, score-0.36]

11 To efficiently utilize the human co-occurrence information in multiple images, we develop a human co-detection and labeling (CoDeL) framework. [sent-38, score-0.479]

12 It is in a semi-supervised learning manner, since the number of manually annotated regions is limited [8] and most existing human detection datasets [6] seldom offer labeling about whether two detected persons actually correspond to the same one. [sent-39, score-0.425]

13 It trains two classifiers, including a detector and a matching classifier, based on two feature sets, which are conditionally independent given the class labels. [sent-41, score-0.35]

14 In particular, for the detection part, we resort to the deformable part-based model [10, 8], which exploits edge information, like HOG [5], to distinguish human from background. [sent-44, score-0.31]

15 Part-based model represents a human hypothesis as multiple flexible parts. [sent-45, score-0.302]

16 It reduces background noise and is also robust to deformation of human body, which often occurs. [sent-46, score-0.247]

17 For the matching process, we build a potential function through a Gaussian process [15]. [sent-47, score-0.218]

18 It not only incorporates the similarity between any two parts [2] but also measures the similarity between two human hypotheses, thus being robust to partial appearance variation. [sent-48, score-0.273]

19 From another point of view, the two feature sets used by the two classifiers are conditionally independent given human labels. [sent-50, score-0.322]

20 With the initial annotated human regions and labeled matching region pairs, we train the part-based human detector and matching classifier respectively. [sent-51, score-1.095]

21 We regard the positive outputs of one classifier as the weak positive samples of the other, and iterate this process until reliable classifiers are yielded. [sent-52, score-0.396]

22 In testing, given the trained detector and matching classifier, we apply CoDeL to detect and label human regions. [sent-53, score-0.507]

23 Second, we design a new matching classifier to capture occurrence of the same person. [sent-56, score-0.269]

24 Related Work. Previous work for human identity grouping [27, 20, 1] usually extracts visual features from face regions and clothes. [sent-60, score-0.362]

25 Performance of the face detector is important in these methods. [sent-61, score-0.233]

26 Another stream of human identity identification [18, 21] is to handle videos via trackers. [sent-63, score-0.273]

27 [9] matches human in crowd images given the user input as initial label to retrieve under a small-motion assumption. [sent-65, score-0.265]

28 It applies a pictorial structure model on human parts (hair, face and torso), which can be regarded as a special version in our general framework. [sent-69, score-0.297]

29 Most previous human detectors concentrate on pedestrians, where sliding windows are adopted. [sent-77, score-0.247]

30 For general human bodies with large deformation, object detector trained on human datasets performs better. [sent-78, score-0.569]

31 Co-Detection and Labeling. Given a general human detection training set and an additional small set with matching labels, we aim to build a human co-detection and labeling (CoDeL) solution. [sent-89, score-0.701]

32 […] and the technique for detecting faces [24] is mature, we add the face filter f as an additional constraint for the human hypothesis. [sent-98, score-0.38]

33-34 As long as the ratio of overlapping area between the human bounding box and the face exceeds a predefined threshold (set to 0.5 in our experiments), face and human are grouped together. [sent-99, score-0.349; sent-100, score-0.297]

35 Energy Function for CoDeL. The goal of our CoDeL model is to incorporate the human detection and matching classifiers in the same framework, so that the two classifiers could help improve each other by adding weak positive samples according to their classification results. [sent-104, score-0.618]

36 […] framework, and give pair-wise matching scores via the matching classifier. [sent-113, score-0.336]

37 […] (1) where Hi is the ith human hypothesis in H. [sent-127, score-0.302]

38 The restriction Hi ∈ In confirms that Hi is detected within In. [sent-128, score-0.236]

39 Hl is the set of all human hypotheses in image Il. [sent-130, score-0.427]

40 Eu in Eq. (1) is the unary potential term, which measures the compatibility between human hypothesis Hi and observed image In. [sent-132, score-0.488]

41 Em is the matching potential term, measuring pairwise similarity between Hi in image In and human hypotheses set Hl in image Il. [sent-133, score-0.676]

42 Specifically, the unary potential Eu for human hypothesis Hi in image In, which is the detection classifier in our CoDeL model, is defined as Eu(Hi, In) = Ef(fi, In) + Eh(ri, Pi, In), (2) where Ef is the potential which indicates the likelihood of containing a face in the area. [sent-134, score-0.808]

43 […] function defined on the face region as the sum over weighted outputs of weak classifiers, wf is its parameter set, and yi is the classifier label in this region. [sent-137, score-0.239]

44 Eh measures the compatibility between image In and part-based human hypothesis Hi represented by {ri, Pi}. [sent-139, score-0.328]

45 (1) defines the human hypothesis level matching potential between Hi in image In and the set of hypotheses Hl in Il. [sent-152, score-0.736]

46 We model matching as Em(Hi, In, Hl, Il) = T(max_{Hj ∈ Il} Êm(Hi, Hj), t), (4) where T(x, t) is a threshold function to measure the similarity between Hi and the best matched human hypothesis Hj in image Il. [sent-154, score-0.323]

47 It can avoid establishing excessive or incorrect matching linkage between any two human pairs. [sent-158, score-0.36]

48 Êm(Hi, Hj) reports the similarity between two human hypotheses Hi and Hj. [sent-159, score-0.458]

49 To describe the marginal likelihood explicitly, we introduce a latent function λ and transform the matching value to obtain a valid probability measure as p(yij = 1 | Hi, Hj) = σ(λ(Hi, Hj)), (6) where σ is a logistic function. [sent-162, score-0.257]

50 In particular, the input of λ is defined as the difference between two stacked feature vectors extracted from parts of human hypotheses respectively. [sent-164, score-0.456]

51 The correspondences of parts for two human hypotheses are obtained similarly as in the part-based model [10]. [sent-165, score-0.427]

52 (1), the unary potential corresponds to a classifier based on face and part-based human detectors, which largely rely on edge information. [sent-169, score-0.577]

53 The matching potential, differently, contains a classifier taking part-level similarity scores measured as difference upon color and texture features. [sent-170, score-0.405]

54 […] color/texture) are conditionally independent given human labels, since edges are used to distinguish between human and non-human while color and texture are responsible for measuring similarity of two human hypotheses. [sent-172, score-0.785]

55 When the labeled training data are not enough, we can use positive samples produced by one classifier as weak positive ones for updating the other. [sent-174, score-0.364]

56 Model Learning. In model learning, given a group of training data with part of them containing detected bounding boxes and a subset with labeled human correspondences, we learn the parameters for Eq. [sent-180, score-0.337]

57 (1), we first train the initial classifiers with labeled training data, and then update the unary and matching terms iteratively in a co-training manner by exploring unlabeled training data. [sent-185, score-0.553]

58 Since frontal faces do not often appear in our dataset, we first obtain the parameter wf in the face detector through training AdaBoost [24] on a common face dataset and fix it in the remaining iterations. [sent-187, score-0.454]

59 Based on the above two updating steps, we perform co-training to generate new weakly labeled positive samples. [sent-212, score-0.221]

60 We first learn two initial classifiers and use the initial trained human detector to test new unlabeled images. [sent-215, score-0.511]

61 Since output of the detector contains no label, they cannot be directly employed by the successive matching classifier. [sent-216, score-0.296]

62 To overcome this problem, we build a confidence criterion based on the probabilistic property of the GP classifier, which lets the mean prediction of the GP learned in last round determine whether two new hypotheses produced by the detector match. [sent-217, score-0.465]

63-64 The mean prediction of the two human hypotheses Hi∗ and Hj∗ in the GP classifier is defined as ȳ∗ij = ∫ p(yij | λ∗) p(λ∗ | H, Hi∗, Hj∗) dλ∗ (10) where λ∗ is the current latent function corresponding to the test pair and H is the initial training human hypotheses set. [sent-218, score-0.547; sent-219, score-0.526]

65 two human hypotheses are denoted as weak positive and are added to the training data of matching in the next round. [sent-222, score-0.718]

66 Since these input hypotheses are selected from output of the detector with high unary scores, we pass these hypotheses pairs to retrain the matching classifier, illustrated in Fig. [sent-226, score-0.898]

67 Given the updated matching classifier, we retrieve weak positive human hypotheses to train the detector, shown in the bottom row in Fig. [sent-228, score-0.737]

68 First, a base hypotheses pool is generated by a human detector with a low unary score threshold, thus with high recall. [sent-230, score-0.665]

69 For each pair of hypotheses in this base pool, we calculate the total energy in Eq. [sent-231, score-0.253]

70 Since the total energy indicates the confidence of a human region, we retrain our detector with data remaining in this complete hypothesis pool. [sent-233, score-0.587]

71 Model Inference. In model inference, our goal is to detect human hypotheses and report their corresponding labels on new data given the human detector and matching classifier. [sent-239, score-0.934]

72 We use the face and part-based human detectors to find candidates. [sent-240, score-0.333]

73 Then we adopt the GP classifier on each pair of human hypotheses to get the matching score via Eq. [sent-242, score-0.696]

74 If the total score for a particular human body in Eq. [sent-244, score-0.241]

75 Therefore, the final confidence includes both the unary and matching scores. [sent-247, score-0.293]

76 If a human region finds similar ones in other images, which are also labeled as human, it becomes more confident. [sent-248, score-0.27]

77 Meanwhile, the detected false-alarm regions have relatively low unary scores and could hardly find matches among other human regions. [sent-249, score-0.401]

78 human regions and their pairwise matching potential scores, we assign the human labels via hierarchical clustering [22], which assures the difference of scores within a cluster is less than a predefined distance. [sent-256, score-0.678]

79 3) Can our matching classifier correctly distinguish between matched pairs and others? [sent-262, score-0.352]

80 One is the pedestrian dataset provided in [7] where the stereo image pairs serve as a natural source of matched pairs following the setting of [2]. [sent-267, score-0.217]

81 It conforms to the assumptions of our human co-detection task; that is, each human appears in only part of the image set with consistent appearance. [sent-269, score-0.422]

82 The matching classifier is evaluated by the classification accuracy with respect to ground truth matching labels. [sent-274, score-0.418]

83 In the matching part, we use three sets of features to describe similarity in terms of color and texture as in Eq. [sent-288, score-0.247]

84 Each feature represents a human hypothesis via a stacked vector among all parts. [sent-291, score-0.331]

85 With increase of unlabeled training data, our co-training system gradually enriches the training set by adding weak positive samples and improves the detection performance. [sent-303, score-0.261]

86 (10), which is used to generate weak labeled human pairs to retrain the matching classifier. [sent-307, score-0.563]

87 As the threshold goes up, the error rate drops, pedestrian dataset [7] and our human co-detection (HCD) dataset. [sent-316, score-0.366]

88 Features of matched SIFT yield good results on the pedestrian dataset because the scales of most persons are small and the majority of them are with dark clothes. [sent-324, score-0.24]

89 Other challenges introduced by this dataset include matching pair chosen among all images and background noise caused by deformable body parts. [sent-326, score-0.215]

90 Also, the GP classifier is consistently better than the linear SVM classifier used in [2] due to its non-linear property. [sent-329, score-0.24]

91 Co-Detection Results. We compare our method with the widely used face detector [24], one of the state-of-the-art human detectors [8], and the object co-detection method [2]. [sent-332, score-0.48]

92 The face detector cannot deal with the situation that the face is partly or completely missing. [sent-334, score-0.319]

93 We do not evaluate this detector on the pedestrian dataset, since faces can hardly be found. [sent-336, score-0.325]

94 Compared to single-image human detector [8], our matching classifier can increase the score of unreliable human hypotheses when they have confident matches. [sent-337, score-1.054]

95 Further, our CoDeL framework yields a larger increase on the HCD dataset than that on the pedestrian one, since HCD provides more images with potential matching pairs. [sent-341, score-0.321]

96 It is notable that errors may occur when two different human hypotheses are of quite similar appearances, as illustrated in the first row of Fig. [sent-353, score-0.427]

97 Conclusion and Future Work. We have proposed a human co-detection and labeling (CoDeL) framework. [sent-360, score-0.268]

98 Also we define our matching classifier via a Gaussian process on the human hypothesis level. [sent-362, score-0.571]

99 Our future work includes extending the matching classifier by integrating spatial relationship among parts. [sent-364, score-0.269]

100 Fast human detection using a cascade of histograms of oriented gradients. [sent-567, score-0.25]
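Sentences 46 and 49 in the summary above describe the matching potential: a latent value is squashed by a logistic function into a match probability (Eq. 6), and the best match in another image is passed through a threshold function T (Eq. 4). The Python sketch below illustrates that computation under stated assumptions. The source does not give the form of T, so a hard cut-off is assumed here, and it is also assumed that the pairwise score Êm is the probability from Eq. 6. The function names are hypothetical and the Gaussian process that would produce the latent values is not modelled.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def match_probability(latent_value):
    """Eq. (6): p(y_ij = 1 | H_i, H_j) = sigma(lambda(H_i, H_j))."""
    return sigmoid(latent_value)


def threshold(x, t):
    # ASSUMPTION: hard cut-off. The source only says T is "a threshold function".
    return x if x >= t else 0.0


def matching_potential(pair_scores, t):
    """Eq. (4): T(max_j E_m(H_i, H_j), t) over the hypotheses H_j of one other image.

    `pair_scores` holds the pairwise scores between H_i and every H_j in that
    image; an image with no hypotheses contributes 0.
    """
    if not pair_scores:
        return 0.0
    return threshold(max(pair_scores), t)
```

Taking the maximum before thresholding means each hypothesis links to at most its single best candidate per image, which matches the stated aim of avoiding excessive or incorrect matching links.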


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('codel', 0.514), ('hcd', 0.352), ('hj', 0.287), ('hi', 0.236), ('hypotheses', 0.216), ('human', 0.211), ('gp', 0.171), ('matching', 0.149), ('detector', 0.147), ('yij', 0.129), ('classifier', 0.12), ('pedestrian', 0.103), ('unary', 0.091), ('hypothesis', 0.091), ('face', 0.086), ('persons', 0.085), ('potential', 0.069), ('weak', 0.065), ('labeled', 0.059), ('classifiers', 0.057), ('logp', 0.057), ('labeling', 0.057), ('eu', 0.055), ('cotraining', 0.054), ('conditionally', 0.054), ('wf', 0.054), ('confidence', 0.053), ('pages', 0.053), ('threshold', 0.052), ('matched', 0.052), ('round', 0.049), ('retrain', 0.048), ('knock', 0.048), ('faces', 0.047), ('unlabeled', 0.046), ('ef', 0.046), ('wp', 0.045), ('hl', 0.044), ('wr', 0.044), ('wc', 0.044), ('positive', 0.043), ('latent', 0.04), ('person', 0.04), ('detection', 0.039), ('eh', 0.038), ('scores', 0.038), ('iterate', 0.037), ('star', 0.037), ('hthe', 0.037), ('garg', 0.037), ('energy', 0.037), ('em', 0.037), ('texture', 0.037), ('deformation', 0.036), ('grouping', 0.036), ('detecting', 0.036), ('detectors', 0.036), ('deformable', 0.036), ('misclassification', 0.036), ('logistic', 0.035), ('training', 0.034), ('manner', 0.034), ('marginal', 0.033), ('detected', 0.033), ('identification', 0.033), ('ep', 0.033), ('root', 0.032), ('likelihood', 0.032), ('regard', 0.031), ('pairs', 0.031), ('similarity', 0.031), ('body', 0.03), ('scattered', 0.03), ('color', 0.03), ('ec', 0.03), ('implicit', 0.029), ('indication', 0.029), ('rejection', 0.029), ('retrieve', 0.029), ('identity', 0.029), ('stacked', 0.029), ('bao', 0.029), ('contributed', 0.029), ('hardly', 0.028), ('sivic', 0.027), ('compatibility', 0.026), ('meanwhile', 0.026), ('doll', 0.026), ('criteria', 0.025), ('initial', 0.025), ('hough', 0.025), ('personal', 0.025), ('option', 0.025), ('ri', 0.024), ('resort', 0.024), ('train', 0.024), ('pedestrians', 0.024), ('insufficiently', 0.024), ('jianping', 0.024), ('hinu', 0.024)]
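The page does not say how the word weights above were computed. As a rough illustration, the Python sketch below computes tf-idf weights with a common smoothed idf, ln((1 + N) / (1 + df)) + 1, followed by L2 normalisation per document. Both choices are assumptions, as are the whitespace tokeniser and the function name `tfidf_weights`, so the numbers it produces are not expected to reproduce the list above.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {word: weight} dict per document, L2-normalised."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each word occurs at least once.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        raw = {
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        }
        # Guard against an empty document, whose norm would be zero.
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        weights.append({word: v / norm for word, v in raw.items()})
    return weights
```

Sorting one of the returned dicts by value in descending order gives a top-N word list in the same shape as the one shown above.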

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework

Author: Jianping Shi, Renjie Liao, Jiaya Jia

Abstract: We propose a co-detection and labeling (CoDeL) framework to identify persons that contain self-consistent appearance in multiple images. Our CoDeL model builds upon the deformable part-based model to detect human hypotheses and exploits cross-image correspondence via a matching classifier. Relying on a Gaussian process, this matching classifier models the similarity of two hypotheses and efficiently captures the relative importance contributed by various visual features, reducing the adverse effect of scattered occlusion. Further, the detector and matching classifier together make our model fit into a semi-supervised co-training framework, which can get enhanced results with a small amount of labeled training data. Our CoDeL model achieves decent performance on existing and new benchmark datasets.

2 0.1549442 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables

Author: Daozheng Chen, Dhruv Batra, William T. Freeman

Abstract: Latent variables models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ℓ1-ℓ2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of latent space without any significant loss in accuracy of the learnt model.

3 0.15117839 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses

Author: Ryan Tokola, Wongun Choi, Silvio Savarese

Abstract: We present an approach to multi-target tracking that has expressive potential beyond the capabilities of chain-shaped hidden Markov models, yet has significantly reduced complexity. Our framework, which we call tracking-by-selection, is similar to tracking-by-detection in that it separates the tasks of detection and tracking, but it shifts temporal reasoning from the tracking stage to the detection stage. The core feature of tracking-by-selection is that it reasons about path hypotheses that traverse the entire video instead of a chain of single-frame object hypotheses. A traditional chain-shaped tracking-by-detection model is only able to promote consistency between one frame and the next. In tracking-by-selection, path hypotheses exist across time, and encouraging long-term temporal consistency is as simple as rewarding path hypotheses with consistent image features. One additional advantage of tracking-by-selection is that it results in a dramatically simplified model that can be solved exactly. We adapt an existing tracking-by-detection model to the tracking-by-selection framework, and show improved performance on a challenging dataset (introduced in [18]).

4 0.14543813 190 iccv-2013-Handling Occlusions with Franken-Classifiers

Author: Markus Mathias, Rodrigo Benenson, Radu Timofte, Luc Van_Gool

Abstract: Detecting partially occluded pedestrians is challenging. A common practice to maximize detection quality is to train a set of occlusion-specific classifiers, each for a certain amount and type of occlusion. Since training classifiers is expensive, only a handful are typically trained. We show that by using many occlusion-specific classifiers, we outperform previous approaches on three pedestrian datasets; INRIA, ETH, and Caltech USA. We present a new approach to train such classifiers. By reusing computations among different training stages, 16 occlusion-specific classifiers can be trained at only one tenth the cost of one full training. We show that also test time cost grows sub-linearly.

5 0.12786604 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection

Author: Xingyu Zeng, Wanli Ouyang, Xiaogang Wang

Abstract: Cascaded classifiers have been widely used in pedestrian detection and achieved great success. These classifiers are trained sequentially without joint optimization. In this paper, we propose a new deep model that can jointly train multi-stage classifiers through several stages of backpropagation. It keeps the score map output by a classifier within a local region and uses it as contextual information to support the decision at the next stage. Through a specific design of the training strategy, this deep architecture is able to simulate the cascaded classifiers by mining hard samples to train the network stage-by-stage. Each classifier handles samples at a different difficulty level. Unsupervised pre-training and specifically designed stage-wise supervised training are used to regularize the optimization problem. Both theoretical analysis and experimental results show that the training strategy helps to avoid overfitting. Experimental results on three datasets (Caltech, ETH and TUD-Brussels) show that our approach outperforms the state-of-the-art approaches.

6 0.11064117 6 iccv-2013-A Convex Optimization Framework for Active Learning

7 0.11062456 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

8 0.10998522 338 iccv-2013-Randomized Ensemble Tracking

9 0.10686328 157 iccv-2013-Fast Face Detector Training Using Tailored Views

10 0.10593833 379 iccv-2013-Semantic Segmentation without Annotating Segments

11 0.10447616 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

12 0.10428888 237 iccv-2013-Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes

13 0.10404132 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

14 0.099345595 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

15 0.097172022 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary

16 0.095029846 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition

17 0.093851879 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors

18 0.092005081 46 iccv-2013-Allocentric Pose Estimation

19 0.091742821 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation

20 0.090928435 150 iccv-2013-Exemplar Cut
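The page does not state how simValue is computed. Cosine similarity between sparse word-weight vectors is a common choice for ranking documents by tf-idf, so the Python sketch below uses it purely as an assumed stand-in. The function name `cosine_similarity` and the dict-based vector format are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {word: weight} dicts."""
    # Only words present in both vectors contribute to the dot product.
    shared = set(a) & set(b)
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Under this measure a document compared with itself scores 1 up to rounding, and two documents with no words in common score 0.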


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.231), (1, 0.036), (2, -0.018), (3, -0.047), (4, 0.098), (5, -0.067), (6, 0.025), (7, 0.101), (8, -0.034), (9, -0.065), (10, 0.006), (11, 0.005), (12, -0.016), (13, -0.059), (14, 0.045), (15, 0.003), (16, -0.022), (17, 0.05), (18, 0.027), (19, 0.053), (20, -0.066), (21, -0.014), (22, -0.014), (23, -0.02), (24, 0.042), (25, -0.033), (26, -0.027), (27, 0.009), (28, -0.006), (29, -0.042), (30, -0.071), (31, 0.044), (32, 0.042), (33, -0.021), (34, 0.067), (35, -0.085), (36, 0.062), (37, 0.024), (38, -0.018), (39, 0.014), (40, 0.077), (41, 0.079), (42, 0.033), (43, 0.042), (44, 0.045), (45, 0.009), (46, 0.039), (47, 0.008), (48, -0.007), (49, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95581973 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework

Author: Jianping Shi, Renjie Liao, Jiaya Jia

Abstract: We propose a co-detection and labeling (CoDeL) framework to identify persons that contain self-consistent appearance in multiple images. Our CoDeL model builds upon the deformable part-based model to detect human hypotheses and exploits cross-image correspondence via a matching classifier. Relying on a Gaussian process, this matching classifier models the similarity of two hypotheses and efficiently captures the relative importance contributed by various visual features, reducing the adverse effect of scattered occlusion. Further, the detector and matching classifier together make our model fit into a semi-supervised co-training framework, which can get enhanced results with a small amount of labeled training data. Our CoDeL model achieves decent performance on existing and new benchmark datasets.

2 0.78088045 136 iccv-2013-Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve

Author: Sakrapee Paisitkriangkrai, Chunhua Shen, Anton Van Den Hengel

Abstract: Many typical applications of object detection operate within a prescribed false-positive range. In this situation the performance of a detector should be assessed on the basis of the area under the ROC curve over that range, rather than over the full curve, as the performance outside the range is irrelevant. This measure is labelled as the partial area under the ROC curve (pAUC). Effective cascade-based classification, for example, depends on training node classifiers that achieve the maximal detection rate at a moderate false positive rate, e.g., around 40% to 50%. We propose a novel ensemble learning method which achieves a maximal detection rate at a user-defined range of false positive rates by directly optimizing the partial AUC using structured learning. By optimizing for different ranges of false positive rates, the proposed method can be used to train either a single strong classifier or a node classifier forming part of a cascade classifier. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our approach, and we show that it is possible to train state-of-the-art pedestrian detectors using the proposed structured ensemble learning method.

3 0.74119687 241 iccv-2013-Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection

Author: Tianfu Wu, Song-Chun Zhu

Abstract: Many object detectors, such as AdaBoost, SVM and deformable part-based models (DPM), compute additive scoring functions at a large number of windows scanned over image pyramid, thus computational efficiency is an important consideration beside accuracy performance. In this paper, we present a framework of learning cost-sensitive decision policy which is a sequence of two-sided thresholds to execute early rejection or early acceptance based on the accumulative scores at each step. A decision policy is said to be optimal if it minimizes an empirical global risk function that sums over the loss of false negatives (FN) and false positives (FP), and the cost of computation. While the risk function is very complex due to high-order connections among the two-sided thresholds, we find its upper bound can be optimized by dynamic programming (DP) efficiently and thus say the learned policy is near-optimal. Given the loss of FN and FP and the cost in three numbers, our method can produce a policy on-the-fly for Adaboost, SVM and DPM. In experiments, we show that our decision policy outperforms state-of-the-art cascade methods significantly in terms of speed with similar accuracy performance.

4 0.73966515 349 iccv-2013-Regionlets for Generic Object Detection

Author: Xiaoyu Wang, Ming Yang, Shenghuo Zhu, Yuanqing Lin

Abstract: Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands for descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifier which integrates various types of features from competing local regions, named as regionlets. A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e. size and aspect ratio). These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. Their features are aggregated to a one-dimensional feature within one group so as to tolerate deformations. Then we evaluate the object bounding box proposal in selective search from segmentation cues, limiting the evaluation locations to thousands. Our approach significantly outperforms the state-of-the-art on popular multi-class detection benchmark datasets with a single method, without any contexts. It achieves the detection mean average precision of 41.7% on the PASCAL VOC 2007 dataset and 39.7% on the VOC 2010 for 20 object categories. It achieves 14.7% mean average precision on the ImageNet dataset for 200 object categories, outperforming the latest deformable part-based model (DPM) by 4.7%.

5 0.73748541 190 iccv-2013-Handling Occlusions with Franken-Classifiers

Author: Markus Mathias, Rodrigo Benenson, Radu Timofte, Luc Van Gool

Abstract: Detecting partially occluded pedestrians is challenging. A common practice to maximize detection quality is to train a set of occlusion-specific classifiers, each for a certain amount and type of occlusion. Since training classifiers is expensive, only a handful are typically trained. We show that by using many occlusion-specific classifiers, we outperform previous approaches on three pedestrian datasets: INRIA, ETH, and Caltech USA. We present a new approach to train such classifiers. By reusing computations among different training stages, 16 occlusion-specific classifiers can be trained at only one tenth the cost of one full training. We also show that test-time cost grows sub-linearly.

6 0.70774245 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition

7 0.69149983 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

8 0.69098848 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection

9 0.67982036 211 iccv-2013-Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks

10 0.6759544 44 iccv-2013-Adapting Classification Cascades to New Domains

11 0.66708016 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?

12 0.66450626 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

13 0.65777469 189 iccv-2013-HOGgles: Visualizing Object Detection Features

14 0.65127605 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

15 0.64980137 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures

16 0.64437485 130 iccv-2013-Dynamic Structured Model Selection

17 0.63344723 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

18 0.63313168 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection

19 0.62091267 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

20 0.61951125 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.092), (7, 0.041), (12, 0.014), (22, 0.127), (26, 0.077), (27, 0.011), (31, 0.051), (40, 0.012), (42, 0.137), (48, 0.026), (64, 0.059), (73, 0.039), (89, 0.192), (95, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94925725 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim

Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.

2 0.94166368 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields

Author: Taehwan Kim, Greg Shakhnarovich, Karen Livescu

Abstract: Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain's “grammar”. One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate the case of fingerspelling recognition, which can be very challenging due to the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the possible fingerspelled letters and statistics of their sequences. We develop a semi-Markov conditional model approach, where feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and linguistic handshape features, along with expected motion profiles, to define segmental feature functions. This approach improves letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% using a hidden Markov model baseline to 11.6% using the proposed semi-Markov model.

3 0.94002157 49 iccv-2013-An Enhanced Structure-from-Motion Paradigm Based on the Absolute Dual Quadric and Images of Circular Points

Author: Lilian Calvet, Pierre Gurdjos

Abstract: This work aims at introducing a new unified Structure-from-Motion (SfM) paradigm in which images of circular point-pairs can be combined with images of natural points. An imaged circular point-pair encodes the 2D Euclidean structure of a world plane and can easily be derived from the image of a planar shape, especially those including circles. A classical SfM method generally runs two steps: first a projective factorization of all matched image points (into projective cameras and points) and second a camera self-calibration that updates the obtained world from projective to Euclidean. This work shows how to introduce images of circular points in these two SfM steps while its key contribution is to provide the theoretical foundations for combining “classical” linear self-calibration constraints with additional ones derived from such images. We show that the two proposed SfM steps clearly contribute to better results than the classical approach. We validate our contributions on synthetic and real images.

same-paper 4 0.91329277 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework

Author: Jianping Shi, Renjie Liao, Jiaya Jia

Abstract: We propose a co-detection and labeling (CoDeL) framework to identify persons that contain self-consistent appearance in multiple images. Our CoDeL model builds upon the deformable part-based model to detect human hypotheses and exploits cross-image correspondence via a matching classifier. Relying on a Gaussian process, this matching classifier models the similarity of two hypotheses and efficiently captures the relative importance contributed by various visual features, reducing the adverse effect of scattered occlusion. Further, the detector and matching classifier together make our model fit into a semi-supervised co-training framework, which can get enhanced results with a small amount of labeled training data. Our CoDeL model achieves decent performance on existing and new benchmark datasets.

5 0.89253753 180 iccv-2013-From Where and How to What We See

Author: S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Eckstein, B.S. Manjunath

Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.

6 0.89215308 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps

7 0.89171934 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization

8 0.89039755 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning

9 0.89000988 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition

10 0.88958901 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

11 0.88913119 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps

12 0.88911587 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

13 0.88897443 206 iccv-2013-Hybrid Deep Learning for Face Verification

14 0.88876772 349 iccv-2013-Regionlets for Generic Object Detection

15 0.88851035 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification

16 0.88802099 277 iccv-2013-Multi-channel Correlation Filters

17 0.88779044 338 iccv-2013-Randomized Ensemble Tracking

18 0.88702059 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

19 0.88674617 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

20 0.88593936 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition