cvpr cvpr2013 cvpr2013-271 knowledge-graph by maker-knowledge-mining

271 cvpr-2013-Locally Aligned Feature Transforms across Views


Source: pdf

Author: Wei Li, Xiaogang Wang

Abstract: In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms and apply it to person re-identification. It jointly partitions the image spaces of two camera views into different configurations according to the similarity of cross-view transforms. The visual features of an image pair from different views are first locally aligned by being projected to a common feature space and then matched with softly assigned metrics which are locally optimized. The features optimal for recognizing identities are different from those for clustering cross-view transforms. They are jointly learned by utilizing a sparsity-inducing norm and information-theoretic regularization. This approach can be generalized to the settings where test images are from new camera views, not the same as those in the training set. Extensive experiments are conducted on public datasets and our own dataset. Comparisons with the state-of-the-art metric learning and person re-identification methods show the superior performance of our approach.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms and apply it to person reidentification. [sent-2, score-0.755]

2 It jointly partitions the image spaces of two camera views into different configurations according to the similarity of cross-view transforms. [sent-3, score-0.565]

3 The visual features of an image pair from different views are first locally aligned by being projected to a common feature space and then matched with softly assigned metrics which are locally optimized. [sent-4, score-0.742]

4 This approach can be generalized to the settings where test images are from new camera views, not the same as those in the training set. [sent-10, score-0.364]

5 Comparisons with the state-of-the-art metric learning and person re-identification methods show the superior performance of our approach. [sent-12, score-0.333]

6 Introduction Person re-identification is to match the snapshots of pedestrians observed in non-overlapping camera views with visual features. [sent-14, score-0.481]

7 However, this problem is extremely challenging, because it is difficult to match the visual features of pedestrians captured in different camera views due to the large variations of lightings, poses, viewpoints, image resolutions, photometric settings of cameras, and backgrounds. [sent-16, score-0.547]

8 Accurate human parsing [18] will benefit person re-identification, but it is a hard problem to solve. [sent-17, score-0.168]

9 Existing works solve this challenge in two possible ways: (1) learning the photometric or geometric transforms between two camera views, if the photometric/geometric (Figure 1) [sent-18, score-0.436]

10 Examples of pedestrians captured in two camera views in the VIPeR dataset[10]. [sent-19, score-0.446]

11 Images have different poses, lightings and background even if they are captured in the same camera view. [sent-21, score-0.254]

12 models can be assumed [24]; (2) learning a distance metric or projecting visual features from different views into a common feature space for matching in order to suppress inter-camera variations. [sent-22, score-0.47]

13 The approaches from both categories assume two fixed camera views with a uni-model inter-camera transform and labeled training samples from the two views are available. [sent-23, score-0.767]

14 However, in practice the configurations (which are the combinations of view points, poses, image resolutions, lightings and photometric settings) of pedestrian images are multi-modal even if they are observed in the same camera views. [sent-24, score-0.426]

15 Moreover, given a large camera network in video surveillance, it is impossible to label training samples for every pair of camera views. [sent-27, score-0.602]

16 It is highly desirable to develop an algorithm which can match images from two new camera views given training samples collected from other camera views. [sent-28, score-0.745]

17 We propose a new approach of learning locally aligned feature transforms across multiple views and apply it to person re-identification. [sent-29, score-0.776]

18 The image spaces of two camera views are jointly partitioned based on the similarity of cross-view transforms. [sent-32, score-0.498]

19 Sample pairs with similar transforms are projected to a common feature space for matching. [sent-33, score-0.309]

20 (1) As illustrated in Figure 2, the proposed approach automatically partitions the image spaces of two camera views into subregions which correspond to different configurations, and learns a different feature transform for a pair of configurations. [sent-35, score-0.588]

21 Given a pair of images to be matched, they are softly assigned to configuration types with a gating network and their visual features are projected to a common feature space and matched by a local expert. [sent-36, score-0.611]
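The soft-assignment matching described above can be sketched in a few lines: a gating distribution over K local experts, each expert projecting the two views into a common space with its own W_k, V_k. This is a minimal numpy illustration under assumed random parameters, not the paper's learned model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def expert_distance(x, y, Ws, Vs, gate_logits):
    """Soft-assigned cross-view distance: expert k projects x (view A)
    with W_k and y (view B) with V_k into a common space; the gating
    weights blend the per-expert squared distances."""
    g = softmax(gate_logits)
    d = np.array([np.sum((W @ x - V @ y) ** 2) for W, V in zip(Ws, Vs)])
    return float(g @ d)

# Toy example: K = 2 experts, 4-d features projected to 3-d (all random).
rng = np.random.default_rng(0)
K, m, p = 2, 4, 3
Ws = [rng.normal(size=(p, m)) for _ in range(K)]
Vs = [rng.normal(size=(p, m)) for _ in range(K)]
x, y = rng.normal(size=m), rng.normal(size=m)
dist = expert_distance(x, y, Ws, Vs, gate_logits=np.zeros(K))
```

With uniform gating logits, this reduces to an average of the per-expert distances; a trained gating network would concentrate the weights on the experts matching the pair's configuration type.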

22 (3) The image spaces of the two camera views are jointly partitioned instead of separately, to avoid some combinations of configurations rarely appearing in the two views. [sent-40, score-0.535]

23 The local experts of these rare combinations cannot be well learned given few, if any, training samples. [sent-41, score-0.267]

24 (4) Besides suppressing cross-view variations, the discriminative power of local experts is further increased by locally magnifying inter-person variations. [sent-42, score-0.261]

25 (5) This approach is extended to the case when test images are from new camera views not existing in the training set. [sent-43, score-0.47]

26 Related Work Metric learning and feature selection have been widely used to reduce cross-view variations and to increase the discriminative power in person re-identification. [sent-46, score-0.325]

27 Some approaches [26, 23] assume that all the persons to be identified have samples in the training set. [sent-47, score-0.312]

28 Lin and Davis [23] assumed that a feature optimal for distinguishing a pair of persons might not be effective for others, and therefore learned the dissimilarity profiles under a pairwise scheme. [sent-49, score-0.341]

29 In order to identify persons outside the training set, Zheng et al. [sent-50, score-0.245]

30 [32] formulated person re-identification as a distance learning problem by maximizing the probability that a pair of true match has a smaller distance than a wrong match. [sent-51, score-0.303]

31 learned a metric specially designed for identification tasks under pairwise constraints and further kernelized it to overcome the linearity. [sent-54, score-0.235]

32 In [11, 25] boosting and RankSVM were used to select features to compute the distance between images observed across camera views. [sent-55, score-0.173]

33 proposed a transferred metric learning framework for learning a specific metric for different query-candidate combinations. [sent-57, score-0.33]

34 Although not being widely applied to person re-identification yet, Canonical Correlation Analysis (CCA)[14] has been used to match data from different views or in different modalities in the applications of face recognition [21, 33] and image-to-text matching [13]. [sent-59, score-0.433]
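For context, the uni-modal baseline discussed here can be sketched as plain linear CCA: whiten each view's covariance, then take the SVD of the whitened cross-covariance. This is a textbook regularized sketch, not the implementation cited as [14].

```python
import numpy as np

def cca(X, Y, k=2, reg=1e-6):
    """Plain linear CCA: find projections of the two views that are
    maximally correlated. Regularization keeps the whitening stable."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T   # whitener for view A
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T   # whitener for view B
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]

# Demo: view B is (nearly) a linear transform of view A, so the top
# canonical correlation should be close to 1.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
Y = X @ rng.normal(size=(5, 4)) + 1e-3 * rng.normal(size=(300, 4))
A, B, corrs = cca(X, Y, k=2)
```

Because CCA fits a single global pair of projections, it cannot represent the multi-modal cross-view transforms the paper targets, which is exactly the limitation noted in the next sentence.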

35 All the approaches discussed above assume a single global model or a generic metric, which cannot well handle multiple types of transforms between two views. [sent-60, score-0.184]

36 It is also hard for these learning-based approaches to be generalized to new views without re-labeling training data. [sent-61, score-0.297]

37 [28] extended their metric learning framework named Large Margin Nearest Neighbor (LMNN) to learn multiple localized metrics for different image clusters. [sent-65, score-0.246]

38 In mixture of experts [15], a gating network classified test samples into different clusters and samples within one cluster were classified with the same local expert. [sent-66, score-0.605]

39 [30] learned a different metric for each training sample. [sent-70, score-0.219]

40 In [6, 7], each training sample had a different metric and all the metrics were aligned with global constraints. [sent-71, score-0.275]

41 Differently, we jointly partition the image spaces of two views based on the similarity of cross-view transform. [sent-74, score-0.325]

42 Since the number of possible transforms is much smaller than the total visual diversity, this leads to a smaller number of local experts which can be well learned (Figure 3). [sent-75, score-0.384]

43 Our approach automatically separates the two types of features with a proposed sparse gating network. [sent-81, score-0.236]

44 (x, y) ∈ Rm × Rm are the visual feature vectors of a pair of images observed in two camera views. [sent-84, score-0.265]

45 A pair of test samples to be matched are input to a gating network to choose local experts in a soft way, and matched with the selected experts. [sent-93, score-0.753]

46 One way of designing the gating network by following traditional approaches with a single image space is to partition each of the two image spaces separately into K regions, and then learn K² experts for all the combinations. [sent-96, score-0.534]

47 The gating functions in two image spaces are independent, i. [sent-97, score-0.299]

48 Since some configurations (sx, sy) rarely co-exist in both views, not enough training samples can be found to train the experts. [sent-101, score-0.171]

49 Instead, we assume the two image samples are correlated and compute the gating function as p(s = k | x, y) = ? [sent-102, score-0.303]
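The key point of the joint gating function is that the assignment to a configuration depends on both views together. The paper's formula is truncated in this summary, so the linear-in-features parameterization below is an assumption standing in for it; only the joint softmax structure is taken from the text.

```python
import numpy as np

def gating_probs(x, y, phis, psis):
    """Joint gating over K configurations: p(s = k | x, y) proportional
    to exp(phi_k . x + psi_k . y). The linear form is an assumed
    stand-in for the paper's formula; the assignment is computed from
    BOTH view A and view B features jointly."""
    logits = np.array([phi @ x + psi @ y for phi, psi in zip(phis, psis)])
    w = np.exp(logits - logits.max())   # shift for numerical stability
    return w / w.sum()

rng = np.random.default_rng(1)
K, m = 3, 5
phis = [rng.normal(size=m) for _ in range(K)]
psis = [rng.normal(size=m) for _ in range(K)]
probs = gating_probs(rng.normal(size=m), rng.normal(size=m), phis, psis)
```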

50 Priors The gating function parameters {φk, ψk}, k = 1, ..., K, and the expert parameters {Wk, Vk}, k = 1, ..., K, are to be learned from training data. [sent-121, score-0.368]

51 Objective function The objective function on the training set {(xi, yi)}, i = 1, ..., N, where (xi, yi) is a pair of samples with the same identity but observed in different views, is written as follows:

52 Multi-Shot extension All the descriptions so far assume single-shot person re-identification, i.e. [sent-231, score-0.168]

53 for a query sample xi in view A, there is only one sample yi with the same identity in the gallery of view B. [sent-233, score-0.46]

54 Multi-shot person re-identification occurs when there are more than one sample Yi with the same identity as xi in view B. [sent-234, score-0.291]

55 Our multi-shot extension makes training easier, since it does not have to match every training pair when learning the cross-view transforms. [sent-249, score-0.269]

56 It only needs to minimize the distance of best matched pairs and effectively reduces the number of cross-view transforms in consideration. [sent-250, score-0.304]
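Selecting the best-matched pair smoothly can be sketched as a soft minimum over the query-to-gallery distances. The softmin form and the temperature beta below are assumptions standing in for the paper's smoothed max function; only the idea of a differentiable best-match selection is taken from the text.

```python
import numpy as np

def soft_min_distance(dists, beta=10.0):
    """Differentiable surrogate for min(dists): a weighted average whose
    weights concentrate on the smallest distance as beta grows. An
    assumed stand-in for the paper's smoothed selection of the
    best-matched multi-shot pair."""
    d = np.asarray(dists, dtype=float)
    w = np.exp(-beta * (d - d.min()))   # shift for numerical stability
    w /= w.sum()
    return float(w @ d)
```

With a large beta the result is essentially the minimum distance to the multi-shot samples Yi; with beta = 0 it degrades to a plain average over all of them.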

57 Discriminative metric learning The proposed locally aligned feature transform only reduces the cross-view variations without considering how to discriminate different persons. [sent-253, score-0.346]

58 Ti is the set of top-ranked samples in view B with xi as query, under the distance ||Wk x − Vk y||². [sent-275, score-0.189]
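Constructing Ti amounts to a nearest-neighbor retrieval under one expert's distance. The helper below and its names are hypothetical, written only to make the construction concrete.

```python
import numpy as np

def top_ranked_set(x, gallery, W, V, t=3):
    """Indices of the t gallery samples in view B closest to query x
    under the local expert distance ||W x - V y||^2 (hypothetical
    helper illustrating the construction of T_i)."""
    px = W @ x
    d = np.array([np.sum((px - V @ y) ** 2) for y in gallery])
    return np.argsort(d)[:t].tolist()
```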

59 50 persons appear in both views and 22 persons appear only in one view. [sent-289, score-0.586]

60 These two datasets are used to evaluate person re-identification given two fixed camera views. [sent-291, score-0.341]

61 CUHK02 contains 1, 816 persons and five pairs of camera views (P1-P5, ten camera views). [sent-292, score-0.793]

62 They have 971, 306, 107, 193 and 239 persons respectively. [sent-293, score-0.178]

63 This dataset is used to evaluate the performance when camera views in test are different than those in training. [sent-295, score-0.403]

64 (a) The five columns on the left show five images of the same person in each of the two camera views. [sent-304, score-0.341]

65 Only two images in different views are shown for each person. [sent-306, score-0.23]

66 (b) CUHK02 has five pairs of camera views denoted with P1-P5. [sent-308, score-0.442]

67 Two exemplar persons are shown for each pair of views. [sent-309, score-0.231]

68 Many state-of-the-art person re-identification methods have published their results on VIPeR and CAVIAR. [sent-313, score-0.239]

69 The dimensionality of the projected common feature space in local experts (i.e. [sent-326, score-0.252]

70 Identification with two fixed camera views It is assumed that all the training and test samples come from the same pair of camera views. [sent-332, score-0.763]

71 Singleshot assumes each person has one image in the gallery, while multi-shot assumes M gallery images per person. [sent-335, score-0.339]

72 Two protocols on VIPeR were used in the past: randomly splitting the whole dataset into 316 persons for training and the remaining 316 for test; and randomly splitting into 100 persons for training and 532 for test. [sent-340, score-0.522]

73 If a person has images in both camera views, we randomly select two pairs of images in different views for training. [sent-349, score-0.61]

74 One query image and one gallery image are randomly selected from the remaining images per person. [sent-350, score-0.231]

75 CCA does not work very well since it assumes the feature transforms to be uni-modal while the three datasets are much more complicated. [sent-353, score-0.223]

76 For our method, we also compare with the case when the discriminative metric learning described in Section 4. [sent-357, score-0.202]

77 Compare rank-n identification rates (%) with other published single-shot results on VIPeR. [sent-369, score-0.212]
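The rank-n identification rates reported in these comparisons can be computed from a query-by-gallery distance matrix. The snippet below is a minimal single-shot evaluation sketch, assuming query i's true match sits at gallery index i; it is illustrative, not the authors' protocol script.

```python
import numpy as np

def cmc_rank_rates(dist_matrix, ranks=(1, 5, 10)):
    """Rank-n identification rates (%) from a square distance matrix
    whose entry (i, j) is the distance between query i and gallery j,
    with the true match of query i at gallery index i."""
    D = np.asarray(dist_matrix, dtype=float)
    order = np.argsort(D, axis=1)                      # gallery sorted per query
    true_rank = np.array([int(np.where(order[i] == i)[0][0])
                          for i in range(len(D))])     # 0-based rank of true match
    return {n: 100.0 * float(np.mean(true_rank < n)) for n in ranks}
```

A perfect matcher puts every true match at rank 0 and scores 100% at rank 1; the CMC curve is this rate swept over n.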

78 the gating network softly splits the feature space and the output is the weighted sum of all experts. [sent-371, score-0.43]

79 Figure 7 shows an exemplar pair for each local expert, according to the largest responses of the gating network. [sent-373, score-0.289]

80 These pairs show different transforms caused by poses, lightings and backgrounds. [sent-374, score-0.304]

81 On CAVIAR, each person has M = 3 gallery images following the protocol in [2]. [sent-381, score-0.339]

82 On CUHK02 P1, each person has M = 2 gallery images. [sent-382, score-0.339]

83 Its success is also due to the fact that at the training stage it does not try to reduce cross-view transforms for every pair of images, which is difficult, but instead uses a smoothed max function to select the best matches from multi-shots for learning the feature transforms. [sent-385, score-0.39]

84 More general camera settings Our method can be easily extended to more general settings when camera views in test are not the same as those in training. [sent-389, score-0.576]

85 But when learning the discriminative metric in 4 (footnote: PS [2] and SDALF [5] are the only published results on CAVIAR.) [sent-390, score-0.273]

86 But they both rely on features specially designed for person identification according to prior knowledge, without any learning methods. [sent-391, score-0.298]

87 Ours NoM denotes our method but without discriminative metric learning in Section 4. [sent-398, score-0.202]

88 Compare rank-n identification rates (%) with other published multi-shot results (M=3) on CAVIAR. [sent-407, score-0.212]

89 5, we have to assume each view in the training set could be a query view or a gallery view. [sent-413, score-0.422]

90 If the training set contains multiple view pairs, we simply put their training samples together. [sent-414, score-0.263]

91 To make results stable, we randomly select a gallery set of 100 persons for 100 times. [sent-417, score-0.349]

92 Our method is still effective, because it has the ability to find the best crossview transforms from a complicated training set with combined view pairs. [sent-419, score-0.377]

93 Table 5 reports the rank-1 rates when P4 is in test and the training set has different combination of view pairs. [sent-420, score-0.187]

94 In CUHK02, the cross-view transforms in P3 have larger difference than those in P4. [sent-421, score-0.184]

95 When P3 is added to the training set, the performance of the other learning methods (LDM, LMNN and ITML) drops significantly, because it makes the feature transforms in the training set more complicated to learn, and there is a larger mismatch between the training set and the camera views in test. [sent-422, score-0.874]

96 Conclusions We propose locally aligned feature transforms for matching pedestrians across camera views with complex crossview variations. [sent-438, score-0.841]

97 Images to be matched are softly assigned to different local experts according to the similarity of crossview transforms, then they are projected to a common feature space and matched with a locally learned discriminative metric. [sent-439, score-0.693]

98 It outperforms the state-of-the-art under the setting when two fixed camera views are given. [sent-440, score-0.403]

99 Experiments on a small camera network with five pairs of camera views show its good potential of being generalized to generic camera settings. [sent-441, score-0.857]

100 In the future, we will further explore its generalization capability by creating a much larger camera network with more diversified cross-view variations. [sent-442, score-0.242]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('viper', 0.257), ('wk', 0.243), ('gating', 0.236), ('vk', 0.234), ('views', 0.23), ('cca', 0.193), ('caviar', 0.185), ('transforms', 0.184), ('persons', 0.178), ('camera', 0.173), ('gallery', 0.171), ('person', 0.168), ('experts', 0.166), ('vky', 0.157), ('wkx', 0.157), ('dld', 0.131), ('eq', 0.128), ('mk', 0.122), ('metric', 0.118), ('ldm', 0.104), ('methodstop', 0.104), ('mlmnn', 0.104), ('softly', 0.086), ('identification', 0.083), ('matched', 0.081), ('lightings', 0.081), ('published', 0.071), ('network', 0.069), ('samples', 0.067), ('training', 0.067), ('cuhk', 0.065), ('expert', 0.065), ('wkt', 0.064), ('crossview', 0.064), ('xepx', 0.064), ('spaces', 0.063), ('view', 0.062), ('query', 0.06), ('rates', 0.058), ('locally', 0.058), ('itml', 0.058), ('identity', 0.056), ('kj', 0.054), ('kt', 0.053), ('pair', 0.053), ('exepxp', 0.052), ('ktktx', 0.052), ('ktykti', 0.052), ('kvyik', 0.052), ('kyx', 0.052), ('pkt', 0.052), ('psosdu', 0.052), ('wkwkt', 0.052), ('wkxki', 0.052), ('xpkt', 0.052), ('xvi', 0.052), ('aligned', 0.05), ('yi', 0.049), ('kk', 0.049), ('projected', 0.047), ('learning', 0.047), ('lished', 0.046), ('yei', 0.046), ('yyjj', 0.046), ('pedestrians', 0.043), ('logdet', 0.043), ('prosser', 0.043), ('reidentification', 0.043), ('zheng', 0.042), ('pedestrian', 0.041), ('canonical', 0.041), ('localized', 0.041), ('sy', 0.04), ('metrics', 0.04), ('feature', 0.039), ('bazzani', 0.039), ('pairs', 0.039), ('profiles', 0.037), ('hong', 0.037), ('configurations', 0.037), ('discriminative', 0.037), ('kong', 0.036), ('projecting', 0.036), ('zhan', 0.036), ('frome', 0.036), ('match', 0.035), ('zij', 0.035), ('learned', 0.034), ('variations', 0.034), ('lmnn', 0.034), ('public', 0.032), ('jointly', 0.032), ('photometric', 0.032), ('protocols', 0.032), ('descent', 0.031), ('bse', 0.031), ('weinberger', 0.03), ('partitions', 0.03), ('poses', 0.03), ('sx', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 271 cvpr-2013-Locally Aligned Feature Transforms across Views

Author: Wei Li, Xiaogang Wang

Abstract: In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms and apply it to person re-identification. It jointly partitions the image spaces of two camera views into different configurations according to the similarity of cross-view transforms. The visual features of an image pair from different views are first locally aligned by being projected to a common feature space and then matched with softly assigned metrics which are locally optimized. The features optimal for recognizing identities are different from those for clustering cross-view transforms. They are jointly learned by utilizing a sparsity-inducing norm and information-theoretic regularization. This approach can be generalized to the settings where test images are from new camera views, not the same as those in the training set. Extensive experiments are conducted on public datasets and our own dataset. Comparisons with the state-of-the-art metric learning and person re-identification methods show the superior performance of our approach.

2 0.19551922 270 cvpr-2013-Local Fisher Discriminant Analysis for Pedestrian Re-identification

Author: Sateesh Pedagadi, James Orwell, Sergio Velastin, Boghos Boghossian

Abstract: Metric learning methods, , forperson re-identification, estimate a scaling for distances in a vector space that is optimized for picking out observations of the same individual. This paper presents a novel approach to the pedestrian re-identification problem that uses metric learning to improve the state-of-the-art performance on standard public datasets. Very high dimensional features are extracted from the source color image. A first processing stage performs unsupervised PCA dimensionality reduction, constrained to maintain the redundancy in color-space representation. A second stage further reduces the dimensionality, using a Local Fisher Discriminant Analysis defined by a training set. A regularization step is introduced to avoid singular matrices during this stage. The experiments conducted on three publicly available datasets confirm that the proposed method outperforms the state-of-the-art performance, including all other known metric learning methods. Furthermore, the method is an effective way to process observations comprising multiple shots, and is non-iterative: the computation times are relatively modest. Finally, a novel statistic is derived to characterize the Match Characteris- tic: the normalized entropy reduction can be used to define the ’Proportion of Uncertainty Removed’ (PUR). This measure is invariant to test set size and provides an intuitive indication of performance.

3 0.18323776 451 cvpr-2013-Unsupervised Salience Learning for Person Re-identification

Author: Rui Zhao, Wanli Ouyang, Xiaogang Wang

Abstract: Human eyes can recognize person identities based on some small salient regions. However, such valuable salient information is often hidden when computing similarities of images with existing approaches. Moreover, many existing approaches learn discriminative features and handle drastic viewpoint change in a supervised way and require labeling new training data for a different pair of camera views. In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning. Distinctive features are extracted without requiring identity labels in the training procedure. First, we apply adjacency constrained patch matching to build dense correspondence between image pairs, which shows effectiveness in handling misalignment caused by large viewpoint and pose variations. Second, we learn human salience in an unsupervised manner. To improve the performance of person re-identification, human salience is incorporated in patch matching to find reliable and discriminative matched patches. The effectiveness of our approach is validated on the widely used VIPeR dataset and ETHZ dataset.

4 0.10135079 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild

Author: Zhen Cui, Wen Li, Dong Xu, Shiguang Shan, Xilin Chen

Abstract: In many real-world face recognition scenarios, face images can hardly be aligned accurately due to complex appearance variations or low-quality images. To address this issue, we propose a new approach to extract robust face region descriptors. Specifically, we divide each image (resp. video) into several spatial blocks (resp. spatial-temporal volumes) and then represent each block (resp. volume) by sum-pooling the nonnegative sparse codes of position-free patches sampled within the block (resp. volume). Whitened Principal Component Analysis (WPCA) is further utilized to reduce the feature dimension, which leads to our Spatial Face Region Descriptor (SFRD) (resp. Spatial-Temporal Face Region Descriptor, STFRD) for images (resp. videos). Moreover, we develop a new distance method for face verification metric learning called Pairwise-constrained Multiple Metric Learning (PMML) to effectively integrate the face region descriptors of all blocks (resp. volumes) from an image (resp. a video). Our work achieves the state- of-the-art performances on two real-world datasets LFW and YouTube Faces (YTF) according to the restricted protocol.

5 0.097349085 252 cvpr-2013-Learning Locally-Adaptive Decision Functions for Person Verification

Author: Zhen Li, Shiyu Chang, Feng Liang, Thomas S. Huang, Liangliang Cao, John R. Smith

Abstract: This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images is about the same person, even if the person is not seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learning algorithms), and make decisions based on a fixed threshold. We show that this is nevertheless insufficient and sub-optimal for the verification problem. This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule. We further formulate the inference on our decision function as a second-order large-margin regularization problem, and provide an efficient algorithm in its dual from. We evaluate our algorithm on both human body verification and face verification problems. Our method outperforms not only the classical metric learning algorithm including LMNN and ITML, but also the state-of-the-art in the computer vision community.

6 0.087119147 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

7 0.085207306 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

8 0.084309332 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

9 0.081221744 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection

10 0.081071891 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes

11 0.080160625 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera

12 0.079514518 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

13 0.07895539 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path

14 0.076687925 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

15 0.076632053 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

16 0.075998381 440 cvpr-2013-Tracking People and Their Objects

17 0.075088978 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval

18 0.073930494 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data

19 0.073356733 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

20 0.071828708 143 cvpr-2013-Efficient Large-Scale Structured Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.187), (1, -0.002), (2, -0.045), (3, 0.002), (4, 0.038), (5, 0.005), (6, -0.02), (7, -0.056), (8, 0.049), (9, -0.044), (10, -0.022), (11, -0.004), (12, 0.063), (13, -0.052), (14, -0.036), (15, -0.03), (16, -0.038), (17, 0.026), (18, -0.003), (19, -0.057), (20, 0.013), (21, 0.022), (22, -0.116), (23, 0.037), (24, -0.025), (25, -0.059), (26, -0.037), (27, 0.083), (28, -0.019), (29, -0.029), (30, -0.029), (31, -0.021), (32, 0.015), (33, -0.01), (34, 0.041), (35, 0.01), (36, -0.002), (37, 0.036), (38, -0.065), (39, -0.105), (40, 0.036), (41, 0.025), (42, 0.116), (43, 0.033), (44, 0.029), (45, -0.076), (46, -0.01), (47, -0.07), (48, 0.009), (49, 0.057)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91716206 271 cvpr-2013-Locally Aligned Feature Transforms across Views

Author: Wei Li, Xiaogang Wang

Abstract: In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms and apply it to person re-identification. It jointly partitions the image spaces of two camera views into different configurations according to the similarity of cross-view transforms. The visual features of an image pair from different views are first locally aligned by being projected to a common feature space and then matched with softly assigned metrics which are locally optimized. The features optimal for recognizing identities are different from those for clustering cross-view transforms. They are jointly learned by utilizing a sparsity-inducing norm and information-theoretic regularization. This approach can be generalized to the settings where test images are from new camera views, not the same as those in the training set. Extensive experiments are conducted on public datasets and our own dataset. Comparisons with the state-of-the-art metric learning and person re-identification methods show the superior performance of our approach.

2 0.80433595 451 cvpr-2013-Unsupervised Salience Learning for Person Re-identification

Author: Rui Zhao, Wanli Ouyang, Xiaogang Wang

Abstract: Human eyes can recognize person identities based on some small salient regions. However, such valuable salient information is often hidden when computing similarities of images with existing approaches. Moreover, many existing approaches learn discriminative features and handle drastic viewpoint change in a supervised way and require labeling new training data for a different pair of camera views. In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning. Distinctive features are extracted without requiring identity labels in the training procedure. First, we apply adjacency constrained patch matching to build dense correspondence between image pairs, which shows effectiveness in handling misalignment caused by large viewpoint and pose variations. Second, we learn human salience in an unsupervised manner. To improve the performance of person re-identification, human salience is incorporated in patch matching to find reliable and discriminative matched patches. The effectiveness of our approach is validated on the widely used VIPeR dataset and ETHZ dataset.

3 0.76561338 252 cvpr-2013-Learning Locally-Adaptive Decision Functions for Person Verification

Author: Zhen Li, Shiyu Chang, Feng Liang, Thomas S. Huang, Liangliang Cao, John R. Smith

Abstract: This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images is about the same person, even if the person is not seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learning algorithms), and make decisions based on a fixed threshold. We show that this is nevertheless insufficient and sub-optimal for the verification problem. This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule. We further formulate the inference on our decision function as a second-order large-margin regularization problem, and provide an efficient algorithm in its dual from. We evaluate our algorithm on both human body verification and face verification problems. Our method outperforms not only the classical metric learning algorithm including LMNN and ITML, but also the state-of-the-art in the computer vision community.

4 0.76538044 270 cvpr-2013-Local Fisher Discriminant Analysis for Pedestrian Re-identification

Author: Sateesh Pedagadi, James Orwell, Sergio Velastin, Boghos Boghossian

Abstract: Metric learning methods, for person re-identification, estimate a scaling for distances in a vector space that is optimized for picking out observations of the same individual. This paper presents a novel approach to the pedestrian re-identification problem that uses metric learning to improve the state-of-the-art performance on standard public datasets. Very high dimensional features are extracted from the source color image. A first processing stage performs unsupervised PCA dimensionality reduction, constrained to maintain the redundancy in color-space representation. A second stage further reduces the dimensionality, using a Local Fisher Discriminant Analysis defined by a training set. A regularization step is introduced to avoid singular matrices during this stage. The experiments conducted on three publicly available datasets confirm that the proposed method outperforms the state-of-the-art performance, including all other known metric learning methods. Furthermore, the method is an effective way to process observations comprising multiple shots, and is non-iterative: the computation times are relatively modest. Finally, a novel statistic is derived to characterize the Match Characteristic: the normalized entropy reduction can be used to define the 'Proportion of Uncertainty Removed' (PUR). This measure is invariant to test set size and provides an intuitive indication of performance.
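The PUR statistic mentioned at the end can be read as a normalized entropy reduction. The sketch below uses one plausible reading (uniform prior over the gallery, entropy of the empirical match-rank distribution as the posterior) and may differ from the paper's exact definition:

```python
import numpy as np

def pur(match_ranks, gallery_size):
    """'Proportion of Uncertainty Removed', one simplified reading:
    prior uncertainty is the entropy of a uniform guess over the
    gallery (log N); posterior uncertainty is the entropy of the
    empirical distribution of match ranks. PUR is the normalized
    reduction, so 1 means every probe matched at a single rank and
    0 means ranks are spread uniformly."""
    counts = np.bincount(np.asarray(match_ranks) - 1, minlength=gallery_size)
    p = counts / counts.sum()
    nz = p[p > 0]
    h = -(nz * np.log(nz)).sum()
    return (np.log(gallery_size) - h) / np.log(gallery_size)
```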

5 0.64850831 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery

Author: Rikke Gade, Anders Jørgensen, Thomas B. Moeslund

Abstract: This paper presents a robust occupancy analysis system for thermal imaging. Reliable detection of people is very hard in crowded scenes, due to occlusions and segmentation problems. We therefore propose a framework that optimises the occupancy analysis over long periods by including information on the transition in occupancy, when people enter or leave the monitored area. In stable periods, with no activity close to the borders, people are detected and counted, which contributes to a weighted histogram. When activity close to the border is detected, local tracking is applied in order to identify a crossing. After a full sequence, the number of people during all periods is estimated using a probabilistic graph search optimisation. The system is tested on a total of 51,000 frames, captured in sports arenas. The mean error for a 30-minute period containing 3-13 people is 4.44%, half the error percentage obtained by detection only, and better than the results of comparable work. The framework is also tested on a publicly available dataset from an outdoor scene, which demonstrates the generality of the method.

6 0.60022032 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data

7 0.58625895 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People

8 0.58335698 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition

9 0.56134802 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

10 0.55659616 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models

11 0.55356795 266 cvpr-2013-Learning without Human Scores for Blind Image Quality Assessment

12 0.553101 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval

13 0.54913878 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

14 0.5442363 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images

15 0.53767759 391 cvpr-2013-Sensing and Recognizing Surface Textures Using a GelSight Sensor

16 0.52869207 239 cvpr-2013-Kernel Null Space Methods for Novelty Detection

17 0.52013814 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

18 0.51065367 34 cvpr-2013-Adaptive Active Learning for Image Classification

19 0.50544828 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

20 0.4878386 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.074), (16, 0.41), (26, 0.046), (33, 0.216), (67, 0.075), (69, 0.044), (87, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92027974 410 cvpr-2013-Specular Reflection Separation Using Dark Channel Prior

Author: Hyeongwoo Kim, Hailin Jin, Sunil Hadap, Inso Kweon

Abstract: We present a novel method to separate specular reflection from a single image. Separating an image into diffuse and specular components is an ill-posed problem due to lack of observations. Existing methods rely on a specular-free image to detect and estimate specularity, which however may confuse diffuse pixels with the same hue but a different saturation value as specular pixels. Our method is based on a novel observation that for most natural images the dark channel can provide an approximate specular-free image. We also propose a maximum a posteriori formulation which robustly recovers the specular reflection and chromaticity despite the hue-saturation ambiguity. We demonstrate the effectiveness of the proposed algorithm on real and synthetic examples. Experimental results show that our method significantly outperforms the state-of-the-art methods in separating specular reflection.
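The dark-channel observation above has a direct computation: per pixel, take the minimum over colour channels and over a local square window. A minimal sketch (the window size is an illustrative default, not the paper's setting):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of an RGB image with values in [0, 1]: per pixel, the
    minimum over the three color channels and over a local square window.
    For most natural scenes this approximates a specular-free image,
    since at least one channel tends to be dark away from highlights."""
    h, w, _ = img.shape
    per_pixel_min = img.min(axis=2)           # min over channels
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode='edge')
    out = np.empty((h, w))
    for i in range(h):                        # min over local window
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```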

2 0.8192398 118 cvpr-2013-Detecting Pulse from Head Motions in Video

Author: Guha Balakrishnan, Fredo Durand, John Guttag

Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.
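The pipeline above (PCA on feature trajectories, then frequency-based component selection) can be sketched roughly as follows; the heart-rate band, frame rate, and in-band power score are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def pulse_component(trajectories, fps=30.0, band=(0.75, 2.0)):
    """Toy version of the head-motion pulse pipeline: PCA-decompose
    feature-point trajectories (n_frames x n_points), then pick the
    component whose power spectrum is most concentrated in a plausible
    heart-rate band. Returns the chosen 1-D signal and its dominant
    in-band frequency in Hz."""
    X = trajectories - trajectories.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)   # PCA via SVD
    comps = U * S                                     # component time series
    freqs = np.fft.rfftfreq(X.shape[0], d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    best, best_score, best_freq = None, -1.0, 0.0
    for k in range(comps.shape[1]):
        power = np.abs(np.fft.rfft(comps[:, k])) ** 2
        score = power[in_band].sum() / power.sum()    # in-band concentration
        if score > best_score:
            idx = int(np.argmax(power * in_band))
            best, best_score, best_freq = comps[:, k], score, freqs[idx]
    return best, best_freq
```

Peak detection on the returned signal would then give individual beat times, as the abstract describes.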

3 0.78598142 27 cvpr-2013-A Theory of Refractive Photo-Light-Path Triangulation

Author: Visesh Chari, Peter Sturm

Abstract: 3D reconstruction of transparent refractive objects like a plastic bottle is challenging: they lack appearance related visual cues and merely reflect and refract light from the surrounding environment. Amongst several approaches to reconstruct such objects, the seminal work of Light-Path triangulation [17] is highly popular because of its general applicability and analysis of minimal scenarios. A light-path is defined as the piece-wise linear path taken by a ray of light as it passes from source, through the object and into the camera. Transparent refractive objects not only affect the geometric configuration of light-paths but also their radiometric properties. In this paper, we describe a method that combines both geometric and radiometric information to do reconstruction. We show two major consequences of the addition of radiometric cues to the light-path setup. Firstly, we extend the set of scenarios in which reconstruction is plausible while reducing the minimal requirements for a unique reconstruction. This happens because the radiometric cues add an additional known variable to the already existing system of equations. Secondly, we present a simple algorithm for reconstruction, owing to the nature of the radiometric cue. We present several synthetic experiments to validate our theories, and show high quality reconstructions in challenging scenarios.

4 0.76979464 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking

Author: Ahmed T. Kamal, Jay A. Farrell, Amit K. Roy-Chowdhury

Abstract: Due to their high fault-tolerance, ease of installation and scalability to large networks, distributed algorithms have recently gained immense popularity in the sensor networks community, especially in computer vision. Multi-target tracking in a camera network is one of the fundamental problems in this domain. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Since most cameras are directional sensors, it is often the case that neighboring sensors may not be sensing the same target. Sensors that do not have information about a target are termed "naive" with respect to that target. In this paper, we propose consensus-based distributed multi-target tracking algorithms in a camera network that are designed to address this issue of naivety. The estimation errors in tracking and data association, as well as the effect of naivety, are jointly addressed, leading to the development of an information-weighted consensus algorithm, which we term the Multi-target Information Consensus (MTIC) algorithm. The incorporation of the probabilistic data association mechanism makes the MTIC algorithm very robust to false measurements/clutter. Experimental analysis is provided to support the theoretical results.

5 0.75933886 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes

Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li

Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution-aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution-aware transformations and the deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles; therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms the previous state-of-the-art (71%).

same-paper 6 0.75657368 271 cvpr-2013-Locally Aligned Feature Transforms across Views

7 0.74909461 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition

8 0.71488637 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition

9 0.69587386 326 cvpr-2013-Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation

10 0.64076155 349 cvpr-2013-Reconstructing Gas Flows Using Light-Path Approximation

11 0.63108164 454 cvpr-2013-Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction

12 0.62998736 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms

13 0.62527275 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances

14 0.62328064 130 cvpr-2013-Discriminative Color Descriptors

15 0.61465049 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D

16 0.61252129 269 cvpr-2013-Light Field Distortion Feature for Transparent Object Recognition

17 0.61226892 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs

18 0.60654938 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition

19 0.60592532 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching

20 0.6016193 54 cvpr-2013-BRDF Slices: Accurate Adaptive Anisotropic Appearance Acquisition