iccv iccv2013 iccv2013-178 knowledge-graph by maker-knowledge-mining

178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

Source: pdf

Author: Chen Change Loy, Shaogang Gong, Tao Xiang

Abstract: Regression-based techniques have shown promising results for people counting in crowded scenes. However, most existing techniques require expensive and laborious data annotation for model training. In this study, we propose to address this problem from three perspectives: (1) Instead of exhaustively annotating every single frame, the most informative frames are selected for annotation automatically and actively. (2) Rather than learning from only labelled data, the abundant unlabelled data are exploited. (3) Labelled data from other scenes are employed to further alleviate the burden for data annotation. All three ideas are implemented in a unified active and semi-supervised regression framework with the ability to perform transfer learning, by exploiting the underlying geometric structure of crowd patterns via manifold analysis. Extensive experiments validate the effectiveness of our approach.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Regression-based techniques have shown promising results for people counting in crowded scenes. [sent-8, score-0.364]

2 (2) Rather than learning from only labelled data, the abundant unlabelled data are exploited. [sent-11, score-0.615]

3 All three ideas are implemented in a unified active and semi-supervised regression framework with the ability to perform transfer learning, by exploiting the underlying geometric structure of crowd patterns via manifold analysis. [sent-13, score-1.031]

4 Introduction Video-imagery based crowd counting [21] is important for profiling the population movement over time across spaces for establishing global situational awareness. [sent-16, score-0.794]

5 In this study, we aim to learn a regression model for crowd counting by annotating only a handful of frames … (Footnote: Most of the work was done when the first author … mantics Ltd, London, UK.) [sent-21, score-1.024]

6 The underlying assumption is that if the selected samples are informative and representative, labelling only them should have a minimal effect on the learned regression model compared to learning from all exhaustively labelled frames. [sent-25, score-0.551]

7 (2) For video-based crowd counting, a potentially unlimited amount of data can be readily collected. [sent-26, score-0.503]

8 Rather than learning from only labelled data, the abundant unlabelled data are to be exploited. [sent-27, score-0.615]

9 (3) Instead of learning a regression model from scratch in every new scene, the labelled data from other scenes should also be exploited to compensate for the lack of labelled data in the new scene. [sent-29, score-0.822]

10 Although different scenes can be visually very different, the crowd patterns share some common grounds (e.g. … [sent-31, score-0.538]

11 … larger crowds lead to larger foreground areas), which correspond to transferable knowledge. [sent-33, score-0.503]

12 In order to realise these three ideas for crowd counting with only a handful of labelled frames in one scene and generalising to other scenes, we develop a unified framework for active and semi-supervised learning of a regression model with transfer learning capability. [sent-34, score-1.567]

13 The framework is formulated based on exploiting the underlying manifold structure of unlabelled crowd data to facilitate counting when the labelled samples are sparse. [sent-35, score-1.54]

14 We observe that crowd pattern data often form a well-structured manifold due to the inherent imaging process for generating … [sent-37, score-0.657]

15 … to a global feature vector of the crowd pattern of a video frame. [sent-72, score-0.476]

16 Every point is encoded by colour so that points with higher crowd density are red and points with fewer people are blue. [sent-73, score-0.575]

17 … generating crowd patterns from shared physical spaces subject to social behavioural constraints [15]. [sent-76, score-0.515]

18 Figure 1 shows different examples of manifold embedding of crowd patterns extracted from three different public scenes. [sent-77, score-0.669]

19 It is evident that typically the crowd density (e.g. … [sent-78, score-0.525]

20 This formulation builds on the Laplacian regularised least squares concept [25], but is carefully reformulated to employ Hessian energy [18, 24] for manifold regularisation, owing to the latter's superior extrapolation potential for semi-supervised learning of a regression function. [sent-82, score-0.619]

21 Modelling the underlying crowd pattern structure also provides a solution to active regression learning. [sent-83, score-0.715]

22 In addition to exploiting intrinsic structures of unlabelled data collected from the same scene for active and semi-supervised regression modelling, we further develop a transfer learning capability to utilise available labelled data from other scenes. [sent-86, score-1.018]

23 In this study, we investigate in particular how manifold regularisation would help in learning a crowd counting model with labelled data collected from a different scene. [sent-89, score-1.522]

24 Related Work Crowd counting: Various approaches to crowd counting have been proposed [21], including counting-by-detection [20, 39, 12], counting-by-clustering [6, 29], and counting-by-regression [9, 10, 19, 7]. [sent-93, score-0.769]

25 The regression-based techniques are fundamentally supervised methods, which often assume the availability of a large amount of labelled data for training. [sent-95, score-0.313]

26 [31] relax this assumption by presenting a semi-supervised learning framework, which utilises sequential information in the unlabelled frames to penalise sudden changes in prediction. [sent-97, score-0.313]

27 … a high enough video frame rate is required to capture the smoothness of crowd pattern change over time. [sent-100, score-0.525]

28 Our approach relaxes this assumption since our method explores smoothness in the intrinsic crowd pattern distribution structure, not only in the temporal space of the video stream, leading to a more generic, scalable and robust approach to crowd count estimation (see comparative experiments in Sec. [sent-102, score-1.347]

29 The idea of incorporating manifold regularisation in semi-supervised learning has also been studied [1, 4, 38, 18], whilst manifold-based transfer learning has been proposed in [34] to transfer knowledge across domains via an aligned manifold. [sent-107, score-0.767]

30 However, no crowd counting studies have attempted manifold regularisation for achieving semi-supervised and transfer counting. [sent-108, score-1.299]

31 Although existing work on manifold learning is relevant to our problem, applying it directly to active and semi-supervised regression modelling of crowd counts is non-trivial and has not been attempted before. [sent-109, score-0.953]

32 Our contributions are three-fold: (1) To eliminate exhaustive data labelling for learning a regression based crowd counting model, this is the first study to systematically develop a unified active and semi-supervised crowd counting regression model using only a handful of annotations. [sent-118, score-2.094]

33 (2) A concept of transfer counting with practical potential is proposed and a transfer learning model based on crowd data manifold regularisation is formulated to utilise labelled crowd data from other crowd scenes. [sent-119, score-2.766]

34 (3) Extensive comparative evaluations are conducted using two publicly available crowd datasets and a new dataset extracted from the i-LIDS dataset [16] to demonstrate the effectiveness of the proposed approach. [sent-120, score-0.476]

35 Semi-supervised Crowd Counting Counting by regression: Taking a regression approach to crowd counting, one typically extracts a set of perspective-normalised low-level features $\mathbf{x}$ from each frame, e.g. … [sent-124, score-0.716]

36 foreground segments or an edge map, and subsequently learns a model to predict the crowd density given the low-level features. [sent-126, score-0.506]

37 Ridge Regression (RR) and its kernelised version, Kernel Ridge Regression (KRR), have shown promising performance for crowd counting regression [10]; KRR is thus chosen as the baseline regression model in our framework. [sent-127, score-1.049]

38 Formally, given a set of $l$ labelled samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, with $\mathbf{x}_i \in X \subseteq \mathbb{R}^d$ and corresponding labels $y_i \in Y \subseteq \mathbb{R}$, KRR estimates the unknown regression function as $f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l} \ldots$ [sent-128, score-0.555]
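The summary cuts the KRR objective off after $\frac{1}{l}$. As a minimal sketch, assuming the standard squared-loss form $\frac{1}{l}\sum_{i=1}^{l}(y_i - f(\mathbf{x}_i))^2 + \lambda\|f\|_K^2$ with a Gaussian RBF kernel (the weight `lam`, the kernel choice and its width `gamma` are illustrative assumptions, not values from the paper), the representer theorem gives $f(\mathbf{x}) = \sum_i \alpha_i K(\mathbf{x}_i, \mathbf{x})$, and setting the gradient to zero yields $\boldsymbol{\alpha} = (K + \lambda l I)^{-1}\mathbf{y}$:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Closed-form KRR for (1/l) sum_i (y_i - f(x_i))^2 + lam * ||f||_K^2.

    With f = K @ alpha, the zero-gradient condition is (K + lam * l * I) alpha = y.
    """
    l = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * l * np.eye(l), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate f(x*) = sum_i alpha_i K(x_i, x*) for each row x* of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

Here the rows of `X` stand in for per-frame low-level features and `y` for annotated counts; this purely supervised fit is the baseline that the semi-supervised formulation extends.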

39 Semi-supervised regression: A semi-supervised regression method is specifically formulated here to produce accurate person counting given only sparse labelled data. [sent-137, score-0.719]

40 This is made possible by exploiting the underlying geometric structure of abundant unlabelled data and the temporal continuity of crowd patterns. [sent-138, score-0.847]

41 A user need only label a few data points, $L = \{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, and the rest of the unlabelled training data, $U = \{\mathbf{x}_j\}_{j=l+1}^{l+u}$, will be annotated automatically by inference using the model. [sent-140, score-0.34]

42 Our goal is to perform semi-supervised learning to assimilate the vast majority of unlabelled data points $U$ using the small minority of labelled points $L$. [sent-142, score-0.357]

43 This is computed by a joint regularisation: learning the intrinsic distribution (geometric) structure $p(\mathbf{x})$ of crowd patterns, and imposing temporal smoothness of activity patterns in the scene. [sent-143, score-0.417]

44 In other words, we would like to ensure that the solution is optimal with respect to three considerations: (1) regression in a reproducing kernel Hilbert space (RKHS), (2) the marginal distribution of unlabelled data points $p(\mathbf{x})$, and (3) temporal continuity in the physical space. [sent-144, score-0.48]

45 $\|f\|_I^2$ is a regularisation term that reflects the intrinsic structure … [sent-155, score-0.283]

46 Distribution structure regularisation: The underlying (geometric) distribution structure of crowd patterns can be modelled using a crowd manifold. [sent-164, score-1.02]

47 Specifically, Hessian regularisation prefers functions that vary linearly with respect to the geodesics on the data manifold [18]. [sent-175, score-0.434]

48 The total estimated Hessian energy is a sum over all $(l + u)$ labelled and unlabelled points … [sent-189, score-0.553]
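The local estimator behind this sum is not reproduced in the summary. The sketch below follows the general recipe of Hessian-energy estimation as we understand it from [18]: for each point, take its k nearest neighbours, estimate an m-dimensional tangent space by local PCA, fit a second-order polynomial in tangent coordinates by least squares, and accumulate the squared Frobenius norm of the resulting Hessian into a matrix $B$ so that the energy equals $\mathbf{f}^T B \mathbf{f}$. The parameters `k` and `m` and the unweighted least-squares fit are our assumptions; this is a simplified illustration, not the paper's implementation.

```python
import numpy as np

def hessian_energy_matrix(X, k=10, m=2):
    """Simplified local Hessian-energy estimator (illustrative sketch).

    Returns a symmetric PSD matrix B such that f @ B @ f approximates the sum
    over all points of the squared Frobenius norm of the Hessian of f along
    an m-dimensional manifold sampled by the rows of X.
    Assumptions: k-NN neighbourhoods (including the point itself), tangent
    spaces from local PCA, and an unweighted least-squares quadratic fit.
    """
    n = X.shape[0]
    B = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)     # squared distances
    pairs = [(r, s) for r in range(m) for s in range(r, m)]  # quadratic terms
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]                         # k-NN of point i
        D = X[nbrs] - X[i]                                   # offsets from x_i
        _, _, Vt = np.linalg.svd(D - D.mean(0), full_matrices=False)
        U = D @ Vt[:m].T                                     # tangent coordinates
        quad = np.stack([U[:, r] * U[:, s] for r, s in pairs], axis=1)
        Phi = np.hstack([np.ones((k, 1)), U, quad])          # [1, u, u_r * u_s]
        coef_op = np.linalg.pinv(Phi)                        # f values -> coefficients
        A = coef_op[1 + m:]                                  # second-order rows only
        for row_coef, (r, s) in zip(A, pairs):
            # H_rr = 2 a_rr contributes 4 a_rr^2; H_rs = H_sr = a_rs contributes 2 a_rs^2
            w = 4.0 if r == s else 2.0
            row = np.zeros(n)
            row[nbrs] = row_coef
            B += w * np.outer(row, row)
    return B
```

Because the quadratic fit reproduces linear functions exactly, a function that is linear along the manifold receives (numerically) zero energy, which matches the preference for geodesically linear functions stated in sentence 47.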

49 Temporal regularisation: The temporal constraint can be incorporated easily into our framework by assuming that if two observations $\mathbf{x}_i$ and $\mathbf{x}_j$ occur close in time, then their crowd densities should not differ significantly. [sent-204, score-0.613]
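One common way to encode this constraint, assumed here for illustration because the summary omits the paper's exact weighting, is a graph Laplacian $L_t$ over time-ordered frames with unit weights between frames at most `window` steps apart, so that $\mathbf{f}^T L_t \mathbf{f} = \sum_{(t,s)} (f_t - f_s)^2$ penalises abrupt changes in the predicted density:

```python
import numpy as np

def temporal_laplacian(n_frames, window=1):
    """Laplacian of a graph linking time-ordered frames at most `window` steps apart.

    With unit edge weights, f @ L @ f = sum over linked pairs (t, s) of (f[t] - f[s])**2.
    The unit weights and the window size are illustrative assumptions.
    """
    W = np.zeros((n_frames, n_frames))
    for t in range(n_frames):
        for s in range(t + 1, min(t + window + 1, n_frames)):
            W[t, s] = W[s, t] = 1.0
    return np.diag(W.sum(1)) - W
```

For example, with `window=1` and predicted counts `[1, 2, 4]` the penalty is $(1-2)^2 + (2-4)^2 = 5$; a larger `window` enforces smoothness over a longer span of frames.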

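50 By the representer theorem, given an unseen low-level feature vector $\mathbf{x}^*$, the crowd density is estimated as $f^*(\mathbf{x}^*) = \sum_{i=1}^{l+u} \alpha_i K(\mathbf{x}_i, \mathbf{x}^*)$, where $\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_{l+u}]^T$. [sent-245, score-0.506]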
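A minimal sketch of how $\boldsymbol{\alpha}$ could be obtained, assuming the Laplacian-RLS-style objective that sentence 20 says the formulation builds on [25], with a generic intrinsic penalty matrix $M$ standing in for the Hessian energy matrix plus a temporal Laplacian. How the paper weights these terms, and the names `gamma_A` and `gamma_I`, are our assumptions. With labelled samples ordered first, $J$ the diagonal matrix selecting them and $K$ the $(l+u)\times(l+u)$ kernel matrix, minimising $\frac{1}{l}\sum_{i=1}^{l}(y_i - f(\mathbf{x}_i))^2 + \gamma_A\|f\|_K^2 + \frac{\gamma_I}{(l+u)^2}\mathbf{f}^T M \mathbf{f}$ gives $\boldsymbol{\alpha} = \left(JK + \gamma_A l I + \frac{\gamma_I l}{(l+u)^2} M K\right)^{-1}\mathbf{Y}$, where $\mathbf{Y}$ is the label vector padded with zeros:

```python
import numpy as np

def ssr_fit(K, y_labelled, M, gamma_A=1e-3, gamma_I=1e-2):
    """Closed-form coefficients for a LapRLS-style semi-supervised regressor.

    K: (l+u, l+u) kernel matrix, labelled samples first.
    y_labelled: (l,) annotated counts.
    M: (l+u, l+u) symmetric PSD intrinsic penalty (e.g. Hessian energy matrix
       plus a weighted temporal Laplacian; the weighting is an assumption).
    Minimises (1/l) sum_{i<=l} (y_i - f_i)^2 + gamma_A * alpha^T K alpha
              + gamma_I / (l+u)^2 * f^T M f,  where f = K @ alpha.
    """
    n = K.shape[0]
    l = len(y_labelled)
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])   # selects labelled points
    Y = np.r_[y_labelled, np.zeros(n - l)]            # labels padded with zeros
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * (M @ K)
    return np.linalg.solve(A, Y)

def ssr_predict(K_new, alpha):
    """f*(x*) = sum_i alpha_i K(x_i, x*); K_new has shape (n_new, l+u)."""
    return K_new @ alpha
```

Transductive estimates for the unlabelled training frames are `K @ alpha`, and inductive estimates for test frames use the kernel between test and training features. Setting `gamma_I = 0` removes the influence of the unlabelled data and recovers plain KRR on the labelled frames (the unlabelled coefficients become zero), which is a useful sanity check.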

51 Our intuition is that, given a fixed labelling budget, the most representative frames (in the sense of covering different crowd densities/counts) are the most useful ones to label. [sent-250, score-0.581]

52 To solve this problem, we propose to discover these representative points (“supporting points”) through clustering in the crowd marginal distribution structure (manifold). [sent-252, score-0.499]

53 Specifically, given a crowd manifold learned from a set of unlabelled data, we perform spectral clustering [26] on the data projected onto the manifold. [sent-253, score-0.901]

54 Each node in the graph corresponds to a frame-level global crowd pattern, and nodes are connected by edges whose weights are defined by the affinity between the patterns. [sent-255, score-0.476]
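A minimal sketch of this “supporting point” selection, assuming normalised spectral clustering in the style of Ng, Jordan and Weiss on a Gaussian affinity graph over frames, followed by picking the frame nearest each cluster centre in the spectral embedding. The affinity width `sigma`, the farthest-point k-means initialisation and the nearest-to-centre rule are our assumptions; the paper clusters data projected onto its learned crowd manifold, which this sketch does not reproduce.

```python
import numpy as np

def select_supporting_points(X, budget, sigma=1.0, n_iter=50):
    """Choose `budget` frames to annotate via normalised spectral clustering.

    X: (n, d) per-frame features. Returns one index per non-empty cluster:
    the frame nearest its cluster centre in the spectral embedding.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian affinity between frames
    np.fill_diagonal(W, 0.0)
    deg = np.maximum(W.sum(1), 1e-12)           # guard against isolated frames
    S = W / np.sqrt(np.outer(deg, deg))         # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
    E = vecs[:, -budget:]                       # top-`budget` eigenvectors
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)

    # k-means in the embedding, with deterministic farthest-point initialisation
    C = [E[0]]
    for _ in range(budget - 1):
        dist = np.min([((E - c) ** 2).sum(1) for c in C], axis=0)
        C.append(E[np.argmax(dist)])
    C = np.array(C)
    for _ in range(n_iter):
        labels = np.argmin(((E[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for c in range(budget):
            if np.any(labels == c):
                C[c] = E[labels == c].mean(0)
    labels = np.argmin(((E[:, None, :] - C[None]) ** 2).sum(-1), axis=1)

    # the member closest to each cluster centre is the frame to annotate
    picks = []
    for c in range(budget):
        members = np.flatnonzero(labels == c)
        if members.size:
            nearest = members[np.argmin(((E[members] - C[c]) ** 2).sum(1))]
            picks.append(int(nearest))
    return picks
```

Choosing one representative per cluster spreads the labelling budget across distinct regions of the crowd-pattern distribution rather than concentrating it where frames are most frequent.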

55 Transfer Counting For transfer learning in general, one considers a given sparse set of labelled target training instances $L_{\text{target}} = \{(\mathbf{x}_{\text{target}}, y_{\text{target}})\}$ … [sent-281, score-0.5]

56 (a)-(b) Performing feature mapping using the corresponding points to align the feature ranges of the ucsd and hallway datasets. [sent-284, score-0.454]

57 (c) The embedding of the cross-domain manifold using the source data ucsd (red dots) and target data hallway (blue dots). [sent-285, score-0.711]

58 In the context of transfer crowd counting, we consider that the most straightforward approach to transferring labelled data from one scene to another is feature-representation transfer [28]. [sent-289, score-1.096]

59 Therefore, the step of learning a shared manifold is critical: it allows one to constrain the smoothness of our solution with respect to the intrinsic geometry of the cross-domain data space. [sent-306, score-0.27]

60 Experiments Datasets: Apart from the established UCSD pedestrian dataset (ucsd) [7] and a more recent shopping mall dataset (mall) [10, 9, 21], we introduce a new dataset in this study for comparative evaluation, referred to as the i-LIDS hallway dataset (hallway). [sent-311, score-0.416]

61 The hallway dataset is composed of 2200 frames extracted at 3 frames per second (fps) from the sequence ABTEN201c of the i-LIDS dataset [16]. [sent-313, score-0.293]

62 In particular, the perspective distortion, especially in the hallway dataset, is heavier than that in the ucsd dataset, resulting in more severe inter-object occlusion and larger changes in object size and appearance at different depths of the scene. [sent-318, score-0.479]

63 In addition, the mall dataset is challenging in that it covers crowd densities from sparse to crowded, as well as diverse activity patterns (static and moving crowds) under a large range of illumination conditions at different times of the day. [sent-319, score-0.701]

64 For both the ucsd and the hallway datasets, scene lighting is stable so we employ a static background subtraction method to extract the foreground segments. [sent-323, score-0.45]

65 Performance comparison between the KRR baseline regression and the proposed semi-supervised regression (SSR) method: with manifold regularisation, temporal regularisation, a combination of the two, and finally the automatic labelled data selection. [sent-352, score-0.793]

66 All the above free parameters for each method were optimally estimated by cross validation on the labelled samples. [sent-357, score-0.286]

67 Semi-Supervised Crowd Counting Semi-supervised learning: The goal of this experiment is to evaluate the effectiveness of exploiting unlabelled data distribution structure and temporal regularisations in the semi-supervised regression (SSR) learning framework. [sent-360, score-0.49]

68 Note that we follow [7] and [10] in partitioning the ucsd and mall datasets. [sent-362, score-0.376]

69 A total of 50 samples in the training partition are randomly selected as labelled samples, while the rest of the samples in the training partition (750 in both ucsd and mall, and 450 in hallway) remain unlabelled. [sent-363, score-0.632]

70 We evaluate the transductive learning (tested with unlabelled data in the training partition) and inductive inference (tested with unlabelled data in the test partition) performance of the proposed SSR method with different regularisation terms. [sent-364, score-0.867]

71 [Figure 3: average MSE versus the number of labelled and unlabelled samples; panels (a) ucsd, (b) mall, (c) hallway.] [sent-368, score-0.597]

72 It is evident from Table 2 that semi-supervised learning remarkably improves the crowd counting performance with the help of unlabelled data, i.e. … [sent-370, score-1.065]

73 an average of 18% reduction in MSE over KRR when we apply labelled data selection. [sent-372, score-0.332]

74 … the ucsd with 10 fps and the hallway with 3 fps, slightly better results were obtained using the temporal smoothness constraint in comparison to manifold regularisation. [sent-375, score-0.678]

75 We further examine the effect of labelled and unlabelled data by measuring the MSE on labelled sets of size {5, 10, 20, 40, 80} given unlabelled sets of size {25, 50, 100, 200, 400}. [sent-378, score-1.06]

76 Figure 3 shows clearly that adding more unlabelled data improved the counting performance. [sent-379, score-0.293]

77 For instance, given 80 labelled data, the MSE in the ucsd, mall, and hallway datasets were reduced by nearly 7%, 22%, and 19% respectively, when we increased the unlabelled data size from 25 to 400. [sent-380, score-0.778]

78 Active learning for labelled points selection: In this experiment we compare our manifold-based “supporting point” selection method (m-landmark) (see Sec. [sent-381, score-0.371]

79 For instance, our method consistently outperforms RAND, with around 7%-9% reduction in MSE on the ucsd and hallway datasets. [sent-385, score-0.45]

80 The results also show that, compared to [31], our method achieves better performance on the ucsd and mall datasets, and more stable performance overall (see the standard deviation plots in Fig. [sent-387, score-0.376]

81 Active semi-supervised learning: Figure 5 shows a comparison of the actual counting performance between KRR (without semi-supervised learning) and our full active semi-supervised regression method. [sent-389, score-0.523]

82 Count estimation performance using three different labelled data selection methods. [sent-394, score-0.342]

83 Comparison of counting performance between the KRR and our semi-supervised method SSR on the hallway dataset. [sent-410, score-0.514]

84 … our method achieved a 20% reduction in mean squared error with just 10% of the labelled samples, compared to KRR. [sent-411, score-0.358]

85 The proposed SSR approach not only consistently outperforms existing methods given sparse labelled samples (50 samples), but also performs comparably to GPR and CA-RR, which learn from the full training set. [sent-414, score-0.336]

86 Transfer Crowd Counting In this experiment we evaluate the proposed transfer counting method (Sec. [sent-423, score-0.416]

87 We randomly selected 100 labelled samples from the source data to be transferred for target model learning. [sent-426, score-0.416]

88 In addition, a total of 50 random labelled data in the target scene are chosen for bootstrapping, 25 of which have corresponding labels with the source labelled set. [sent-427, score-0.69]

89 … samples are employed to learn a mapping function for aligning the source labelled set. [sent-438, score-0.339]
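The summary does not give the form of this mapping function. As an illustration only, assuming an affine map fitted by least squares on the source/target feature pairs whose counts correspond (`fit_feature_mapping` and `apply_feature_mapping` are hypothetical names), the alignment step could look like this:

```python
import numpy as np

def fit_feature_mapping(Xs_pairs, Xt_pairs):
    """Least-squares affine map from source to target feature space (assumed form).

    Xs_pairs, Xt_pairs: (p, d) features of source/target frames whose counts
    correspond. Returns W of shape (d + 1, d) with x_target ~ [x_source, 1] @ W.
    """
    Phi = np.hstack([Xs_pairs, np.ones((Xs_pairs.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(Phi, Xt_pairs, rcond=None)
    return W

def apply_feature_mapping(W, Xs):
    """Map (all) source-domain feature vectors into the target feature range."""
    return np.hstack([Xs, np.ones((Xs.shape[0], 1))]) @ W
```

The mapped source samples, together with their counts, could then be embedded with the target data in a shared cross-domain graph, for instance by building the manifold penalty over the union of both sets.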

90 The ucsd and hallway datasets are selected in this experiment. [sent-439, score-0.431]

91 Table 4 summarises the transfer counting results averaged over 10 trials. [sent-440, score-0.416]

92 using the 50 labelled data in the target scene for model learning. [sent-443, score-0.371]

93 In the bottom half of the table, we show the transfer learning results of both models, where training is conducted using the target scene data as well as 100 labelled data from the source domain. [sent-444, score-0.606]

94 It is evident that transferring the data without learning a cross-domain manifold (i.e. … [sent-445, score-0.275]

95 However, when those source data are embedded in a shared cross-domain manifold together with the target data, they can effectively help in filling the ‘gap’ not captured in the target labelled data, leading to a more accurate estimation. [sent-450, score-0.578]

96 We demonstrated that the lack of labelled data in a new scene can be compensated for by knowledge transferred from other scenes, minimising the effort required for bootstrapping crowd counting at the new scene. [sent-454, score-1.124]

97 In the current transfer counting method, we imposed an assumption that the source and target data share a similar manifold representation. [sent-456, score-0.669]

98 Cumulative attribute space for age and crowd density estimation. [sent-510, score-0.506]

99 A geometric framework for transfer learning using manifold alignment. [sent-628, score-0.31]

100 Visual knowledge transfer among multiple cameras for people counting with oc… [sent-645, score-0.439]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('crowd', 0.476), ('counting', 0.293), ('labelled', 0.286), ('krr', 0.271), ('regularisation', 0.253), ('unlabelled', 0.244), ('ssr', 0.24), ('hallway', 0.221), ('ucsd', 0.21), ('mall', 0.166), ('manifold', 0.154), ('regression', 0.14), ('transfer', 0.123), ('mse', 0.079), ('hessian', 0.077), ('xsource', 0.075), ('normalised', 0.072), ('active', 0.07), ('labelling', 0.069), ('xtarget', 0.06), ('handful', 0.058), ('afrg', 0.053), ('gt', 0.048), ('whilst', 0.048), ('crowded', 0.048), ('count', 0.047), ('laplacian', 0.046), ('temporal', 0.046), ('lsource', 0.045), ('methodtransductiveinductive', 0.045), ('pifo', 0.045), ('rlad', 0.045), ('transferring', 0.042), ('poral', 0.04), ('tem', 0.04), ('patterns', 0.039), ('target', 0.039), ('xi', 0.036), ('frames', 0.036), ('ka', 0.035), ('il', 0.035), ('supporting', 0.034), ('budget', 0.033), ('modelling', 0.033), ('source', 0.033), ('learning', 0.033), ('informative', 0.033), ('exhaustively', 0.032), ('aij', 0.032), ('ridge', 0.031), ('yi', 0.031), ('roi', 0.031), ('samples', 0.031), ('intrinsic', 0.03), ('density', 0.03), ('assimilate', 0.03), ('eam', 0.03), ('iftbf', 0.03), ('ysource', 0.03), ('underlying', 0.029), ('pedestrian', 0.029), ('selection', 0.029), ('perspective', 0.028), ('loy', 0.028), ('bf', 0.027), ('annotation', 0.027), ('data', 0.027), ('transferrable', 0.027), ('gpr', 0.027), ('smoothness', 0.026), ('xj', 0.025), ('abundant', 0.025), ('laborious', 0.025), ('profiling', 0.025), ('frame', 0.023), ('people', 0.023), ('scenes', 0.023), ('xiang', 0.023), ('points', 0.023), ('belkin', 0.022), ('squared', 0.022), ('actively', 0.021), ('annotating', 0.021), ('fps', 0.021), ('circumvent', 0.02), ('employed', 0.02), ('semisupervised', 0.02), ('activity', 0.02), ('inductive', 0.02), ('severe', 0.02), ('tm', 0.02), ('exhaustive', 0.019), ('scene', 0.019), ('training', 0.019), ('reduction', 0.019), ('utilise', 0.019), ('extrapolation', 0.019), ('alignment', 0.019), ('evident', 0.019), ('partition', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

Author: Chen Change Loy, Shaogang Gong, Tao Xiang

2 0.14037825 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim

Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.

3 0.12679456 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang

Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.

4 0.12213362 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples

Author: Hongteng Xu, Hongyuan Zha

Abstract: Data sparsity has been a thorny issue for manifold-based image synthesis, and in this paper we address this critical problem by leveraging ideas from transfer learning. Specifically, we propose methods based on generating auxiliary data in the form of synthetic samples using transformations of the original sparse samples. To incorporate the auxiliary data, we propose a weighted data synthesis method, which adaptively selects from the generated samples for inclusion during the manifold learning process via a weighted iterative algorithm. To demonstrate the feasibility of the proposed method, we apply it to the problem of face image synthesis from sparse samples. Compared with existing methods, the proposed method shows encouraging results with good performance improvements.

5 0.08336588 10 iccv-2013-A Framework for Shape Analysis via Hilbert Space Embedding

Author: Sadeep Jayasumana, Mathieu Salzmann, Hongdong Li, Mehrtash Harandi

Abstract: We propose a framework for 2D shape analysis using positive definite kernels defined on Kendall’s shape manifold. Different representations of 2D shapes are known to generate different nonlinear spaces. Due to the nonlinearity of these spaces, most existing shape classification algorithms resort to nearest neighbor methods and to learning distances on shape spaces. Here, we propose to map shapes on Kendall’s shape manifold to a high dimensional Hilbert space where Euclidean geometry applies. To this end, we introduce a kernel on this manifold that permits such a mapping, and prove its positive definiteness. This kernel lets us extend kernel-based algorithms developed for Euclidean spaces, such as SVM, MKL and kernel PCA, to the shape manifold. We demonstrate the benefits of our approach over the state-of-the-art methods on shape classification, clustering and retrieval.

6 0.078215614 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification

7 0.075410858 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos

8 0.074328333 305 iccv-2013-POP: Person Re-identification Post-rank Optimisation

9 0.069590859 435 iccv-2013-Unsupervised Domain Adaptation by Domain Invariant Projection

10 0.066252448 6 iccv-2013-A Convex Optimization Framework for Active Learning

11 0.063728712 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation

12 0.061549556 438 iccv-2013-Unsupervised Visual Domain Adaptation Using Subspace Alignment

13 0.059430167 100 iccv-2013-Curvature-Aware Regularization on Riemannian Submanifolds

14 0.058872227 421 iccv-2013-Total Variation Regularization for Functions with Values in a Manifold

15 0.055635553 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion

16 0.053229116 120 iccv-2013-Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features

17 0.052843869 437 iccv-2013-Unsupervised Random Forest Manifold Alignment for Lipreading

18 0.050719868 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation

19 0.04945666 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition

20 0.049141329 123 iccv-2013-Domain Adaptive Classification

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.134), (1, 0.019), (2, -0.026), (3, -0.015), (4, -0.013), (5, 0.015), (6, 0.01), (7, 0.03), (8, 0.029), (9, -0.011), (10, -0.021), (11, -0.05), (12, -0.036), (13, -0.03), (14, 0.055), (15, -0.028), (16, -0.056), (17, -0.023), (18, -0.033), (19, -0.023), (20, 0.019), (21, -0.031), (22, 0.036), (23, 0.062), (24, -0.0), (25, 0.06), (26, 0.021), (27, 0.039), (28, 0.044), (29, -0.061), (30, -0.024), (31, 0.064), (32, -0.018), (33, 0.037), (34, 0.043), (35, 0.063), (36, 0.028), (37, 0.036), (38, -0.017), (39, -0.019), (40, 0.03), (41, -0.005), (42, -0.059), (43, -0.12), (44, -0.031), (45, -0.01), (46, 0.053), (47, -0.0), (48, 0.016), (49, -0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91453201 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

Author: Chen Change Loy, Shaogang Gong, Tao Xiang

2 0.78038484 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples

Author: Hongteng Xu, Hongyuan Zha

3 0.61988294 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation

Author: Xiatian Zhu, Chen Change Loy, Shaogang Gong

Abstract: Generating coherent synopsis for surveillance video stream remains a formidable challenge due to the ambiguity and uncertainty inherent to visual observations. In contrast to existing video synopsis approaches that rely on visual cues alone, we propose a novel multi-source synopsis framework capable of correlating visual data and independent non-visual auxiliary information to better describe and summarise subtle physical events in complex scenes. Specifically, our unsupervised framework is capable of seamlessly uncovering latent correlations among heterogeneous types of data sources, despite the non-trivial heteroscedasticity and dimensionality discrepancy problems. Additionally, the proposed model is robust to partial or missing non-visual information. We demonstrate the effectiveness of our framework on two crowded public surveillance datasets.

4 0.6030947 437 iccv-2013-Unsupervised Random Forest Manifold Alignment for Lipreading

Author: Yuru Pei, Tae-Kyun Kim, Hongbin Zha

Abstract: Lipreading from visual channels remains a challenging topic considering the various speaking characteristics. In this paper, we address an efficient lipreading approach by investigating the unsupervised random forest manifold alignment (RFMA). The density random forest is employed to estimate affinity of patch trajectories in speaking facial videos. We propose novel criteria for node splitting to avoid the rank-deficiency in learning density forests. By virtue of the hierarchical structure of random forests, the trajectory affinities are measured efficiently, which are used to find embeddings of the speaking video clips by a graph-based algorithm. Lipreading is formulated as matching between manifolds of query and reference video clips. We employ the manifold alignment technique for matching, where the L∞-norm-based manifold-to-manifold distance is proposed to find the matching pairs. We apply this random forest manifold alignment technique to various video data sets captured by consumer cameras. The experiments demonstrate that lipreading can be performed effectively, and outperform the state of the art.

5 0.58218926 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification

Author: Bo Wang, Zhuowen Tu, John K. Tsotsos

Abstract: In graph-based semi-supervised learning approaches, the classification rate is highly dependent on the size of the available labeled data, as well as the accuracy of the similarity measures. Here, we propose a semi-supervised multi-class/multi-label classification scheme, dynamic label propagation (DLP), which performs transductive learning through propagation in a dynamic process. Existing semi-supervised classification methods often have difficulty in dealing with multi-class/multi-label problems due to the lack of consideration of label correlation; our algorithm instead emphasizes dynamic metric fusion with label information. Significant improvement over the state-of-the-art methods is observed on benchmark datasets for both multiclass and multi-label tasks.

6 0.5724321 421 iccv-2013-Total Variation Regularization for Functions with Values in a Manifold

7 0.54347587 435 iccv-2013-Unsupervised Domain Adaptation by Domain Invariant Projection

8 0.51759416 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

9 0.51527035 305 iccv-2013-POP: Person Re-identification Post-rank Optimisation

10 0.50769311 47 iccv-2013-Alternating Regression Forests for Object Detection and Pose Estimation

11 0.49578285 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition

12 0.49488509 10 iccv-2013-A Framework for Shape Analysis via Hilbert Space Embedding

13 0.49144289 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

14 0.48792577 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation

15 0.48536551 100 iccv-2013-Curvature-Aware Regularization on Riemannian Submanifolds

16 0.48501414 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

17 0.47961709 404 iccv-2013-Structured Forests for Fast Edge Detection

18 0.47901669 124 iccv-2013-Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information

19 0.47760642 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling

20 0.47699797 413 iccv-2013-Target-Driven Moire Pattern Synthesis by Phase Modulation

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.051), (7, 0.372), (12, 0.033), (13, 0.013), (26, 0.065), (31, 0.034), (42, 0.071), (48, 0.016), (64, 0.052), (73, 0.02), (89, 0.138), (95, 0.012), (98, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78118831 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

Author: Chen Change Loy, Shaogang Gong, Tao Xiang

2 0.73594332 323 iccv-2013-Pose Estimation with Unknown Focal Length Using Points, Directions and Lines

Author: Yubin Kuang, Kalle Åström

Abstract: In this paper, we study the geometry problems of estimating camera pose with unknown focal length using a combination of geometric primitives. We consider points, lines and also rich features such as quivers, i.e., points with one or more directions. We formulate the problems as polynomial systems where the constraints for different primitives are handled in a unified way. We develop efficient polynomial solvers for each of the derived cases with different combinations of primitives. The availability of these solvers enables robust pose estimation with unknown focal length for wider classes of features. Such rich features allow for fewer feature correspondences and generate larger inlier sets with higher probability. We demonstrate in synthetic experiments that our solvers are fast and numerically stable. For real images, we show that our solvers can be used in RANSAC loops to provide good initial solutions.

3 0.73578393 212 iccv-2013-Image Set Classification Using Holistic Multiple Order Statistics Features and Localized Multi-kernel Metric Learning

Author: Jiwen Lu, Gang Wang, Pierre Moulin

Abstract: This paper presents a new approach for image set classification, where each training and testing example contains a set of image instances of an object captured from varying viewpoints or under varying illuminations. While a number of image set classification methods have been proposed in recent years, most of them model each image set as a single linear subspace or a mixture of linear subspaces, which may lose some discriminative information for classification. To address this, we propose exploring multiple order statistics as features of image sets, and develop a localized multi-kernel metric learning (LMKML) algorithm to effectively combine different order statistics information for classification. Our method achieves state-of-the-art performance on four widely used databases: the Honda/UCSD, CMU Mobo, and YouTube face datasets, and the ETH-80 object dataset.

4 0.73567641 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors

Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang

Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text-line levels. First, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Second, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confidence map. Our method was evaluated on two benchmark datasets, ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.

5 0.67754602 291 iccv-2013-No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

Author: Yan Yan, Elisa Ricci, Ramanathan Subramanian, Oswald Lanz, Nicu Sebe

Abstract: We propose a novel Multi-Task Learning framework (FEGA-MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. As the target (person) moves, distortions in facial appearance owing to camera perspective and scale severely impede the performance of traditional head pose classification methods. FEGA-MTL operates on a dense uniform spatial grid and learns appearance relationships across partitions as well as partition-specific appearance variations for a given head pose to build region-specific classifiers. Guided by two graphs which a priori model appearance similarity among (i) grid partitions based on camera geometry and (ii) head pose classes, the learner efficiently clusters appearance-wise related grid partitions to derive the optimal partitioning. For pose classification, upon determining the target's position using a person tracker, the appropriate region-specific classifier is invoked. Experiments confirm that FEGA-MTL achieves state-of-the-art classification with few training data.

6 0.65452862 409 iccv-2013-Supervised Binary Hash Code Learning with Jensen Shannon Divergence

7 0.58418089 27 iccv-2013-A Robust Analytical Solution to Isometric Shape-from-Template with Focal Length Calibration

8 0.5779438 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection

9 0.56081641 229 iccv-2013-Large-Scale Video Hashing via Structure Learning

10 0.55391395 250 iccv-2013-Lifting 3D Manhattan Lines from a Single Image

11 0.5480569 347 iccv-2013-Recursive Estimation of the Stein Center of SPD Matrices and Its Applications

12 0.54669631 84 iccv-2013-Complex 3D General Object Reconstruction from Line Drawings

13 0.53926241 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

14 0.5391103 436 iccv-2013-Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach

15 0.53480238 353 iccv-2013-Revisiting the PnP Problem: A Fast, General and Optimal Solution

16 0.52876776 389 iccv-2013-Shortest Paths with Curvature and Torsion

17 0.52737552 296 iccv-2013-On the Mean Curvature Flow on Graphs with Applications in Image and Manifold Processing

18 0.5195502 342 iccv-2013-Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length

19 0.51817644 346 iccv-2013-Rectangling Stereographic Projection for Wide-Angle Image Visualization

20 0.5178054 25 iccv-2013-A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models