101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation


Source: pdf

Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy

Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. [sent-5, score-0.543]

2 Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. [sent-6, score-1.646]

3 More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e. [sent-7, score-1.324]

4 Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling. [sent-10, score-2.381]

5 Examples of such problems include age estimation from facial images [10, 12, 15, 16, 33, 35], crowd counting [4, 5, 8, 25], and human body/face pose (view angle) estimation [14, 27, 34]. [sent-13, score-1.129]

6 human age and people count), and its estimation can be obtained by solving a multi-class classification problem [13, 21]. [sent-16, score-0.571]

7 Age estimation and crowd counting both suffer from sparse and imbalanced training data distribution. [sent-139, score-0.855]

8 To exploit this observation, most existing approaches to the problem consider a regression solution in which a mapping function is learned explicitly between high dimensional feature input vectors and scalar output values [4, 5, 8, 10, 12, 15, 16, 33, 35]. [sent-146, score-0.541]

9 However, there are two major challenges for learning a good regression function for solving such a problem: (1) inconsistent and incomplete features, (2) sparse and imbalanced training data. [sent-147, score-0.591]

10 Accurately labelled facial images for human age estimation and public space video data for crowd counting are generally sparse and imbalanced due to inherent ambiguities in annotation and a lack of sufficient samples for covering the data distribution. [sent-171, score-1.393]

11 As a result, benchmarking datasets such as FG-NET [7, 13, 15, 35] and MORPH [7, 13] contain very limited samples of each age group and consist of faces of true ages rather than annotated age. [sent-175, score-0.577]

12 Figure 1 shows that in the FG-NET dataset, at most 46 images are available for each age group and the distribution is highly imbalanced across the age groups. [sent-176, score-1.079]

13 Even though annotating crowd images can be made more reliable, annotating people count exhaustively for all possible values is laborious and often practically infeasible, e. [sent-178, score-0.527]

14 Moreover, the sparseness in training data also implies that there are often gaps in training samples where no imagery sample is available for mapping onto certain output values causing difficulties in learning the regression mapping function. [sent-182, score-0.547]

15 To that end, we propose a novel cumulative attribute based representation for learning a regression model. [sent-184, score-1.096]

16 Existing attribute learning methods cannot be directly applied to our regression problem because: (1) Attributes need to be discriminative to be useful. [sent-188, score-0.702]

17 However, for learning a regression model it is much less clear what is discriminative and more importantly what can be shared across different scalar output values when those values change continuously. [sent-191, score-0.536]

18 (2) Existing attribute definitions do not reflect nor exploit the unique characteristic of neighbouring scalar output values sharing more similarities than those further apart. [sent-192, score-0.632]

19 Our notion of cumulative attributes aims to explore the spirit of the conventional discriminative attribute for addressing sparse training data, whilst being specifically designed for addressing the regression problem. [sent-193, score-1.539]

20 More specifically, each attribute is not only discriminative but also cumulative in constraining all other attribute values depending on its relative positioning in value: each attribute separates all training images into two groups (binary) by a label (e. [sent-194, score-1.627]

21 For instance, for learning a regression model for age estimation, if there are 70 age groups, there will be 69 binary attributes, each separating facial images above certain age from all those below. [sent-197, score-1.666]

22 By cumulative attributes, we consider each attribute cumulatively conditioning all other attributes. [sent-198, score-0.777]

23 This is designed specifically to capture the unique correlation of data samples so that those with neighbouring scalar output values share more than those further away in our cumulative attribute space. [sent-200, score-1.07]

24 Critically, this cumulative nature is also able to cope with sparse and imbalanced data distribution more effectively. [sent-201, score-0.675]

25 In particular, by utilising all data samples for discriminating each attribute regardless of the availability of labelled data for that attribute (value) alone, the sparsity problem is mitigated. [sent-202, score-0.825]

26 The cumulative nature of the attribute also greatly reduces the ill-effect of imbalanced data, e. [sent-203, score-0.992]

27 even if there was no sample for a certain age value (attribute), that attribute is positively assigned by any samples of lower age than the considered value, thus can be learned indirectly using plenty of neighbouring samples. [sent-205, score-1.388]

28 Once cumulative attributes are constructed from the scalar values of training samples, a two-layer regression framework is employed. [sent-207, score-1.072]

29 Firstly, given any low-level feature representation of the image, we learn a multi-output regression model to map the feature inputs to an intermediate attribute space. [sent-208, score-0.689]

30 Secondly, another regression model is learned to estimate the scalar output using the attribute representation as input. [sent-210, score-0.895]

31 Related Work Age estimation Most existing techniques for age estimation from facial images fall into three categories: multiclass classification [13], regression [16], and hybrid [15] of the two, with regression models being the most widely used. [sent-213, score-1.141]

32 [35] proposed a multi-task wrapped Gaussian Process Regression for personalized age estimation that jointly learns personalized characteristics and common changes shared between people. [sent-218, score-0.59]

33 Our approach is designed to utilise any low-level features and regression models, with the key difference being that the input to the regression model is represented by cumulative attributes instead of the low-level features directly. [sent-219, score-1.234]

34 More recently, a ranking based age estimation method is proposed [7]. [sent-220, score-0.517]

35 For each age group, a ranker (a binary classifier) is learned to separate people into two groups, older or younger than the said age group. [sent-221, score-0.983]

36 Crowd counting Similar to age estimation, crowd counting can be solved by either classification or regression, with most recent work adopting the regression approach. [sent-227, score-1.618]

37 Despite the low-level features being very different, the same regression models such as support vector machine regression and Gaussian Processes have been employed for both problems [4, 5, 8, 25]. [sent-228, score-0.589]

38 On the other hand, manually defined attributes may not be computable consistently nor discriminative sufficiently despite additional human annotation, from which data driven attributes do not suffer. [sent-234, score-0.528]

39 Our cumulative attributes are unique in that each attribute has clear semantic meaning and is by definition discriminative, yet no additional annotation is required. [sent-235, score-1.064]

40 They are specifically designed for learning a regression model whilst none of the existing attribute representations is suitable. [sent-236, score-0.747]

41 Note that the recently proposed notion of relative attribute [19, 29] defines an attribute as the real-valued strength of the presence of visual properties. [sent-240, score-0.764]

42 However, relative attributes are learned as a ranking problem rather than a regression problem because only pairwise-comparison data are available [19, 29]. [sent-241, score-0.55]

43 Contributions Our contributions are three-fold: (1) For the first time, an attribute representation is constructed for learning a regression model. [sent-242, score-0.701]

44 (2) A novel concept of cumulative attributes is proposed that has clear semantic meaning and is discriminative, with the added advantages of being efficiently computable and requiring no additional annotation. [sent-243, score-0.686]

45 (3) Extensive experiments on both age estimation and crowd counting benchmark datasets demonstrate the superiority of our method over the state of the art, especially when the data is sparse and imbalanced. [sent-244, score-1.06]

46 Methodology As shown in Figure 2, our cumulative attributes can be considered as an intermediate-level semantic representation that bridges the gap between any low-level features and a regression model given sparse annotation. [sent-246, score-1.007]

47 During training our cumulative attribute based regression framework consists of the following steps: 1. [sent-247, score-1.077]

48 age or people count) is converted into a binary cumulative attribute vector (Section 3. [sent-250, score-1.265]

49 A cumulative attribute representation is computed so that given an image, its cumulative attributes can be assigned and used as an intermediate representation of the image. [sent-253, score-1.446]

50 Specifically, a single multi-output regression model is learned to evaluate and assign all attributes simultaneously (Section 3. [sent-254, score-0.524]

51 A second layer single output regression model is learned to map the attribute representation to the scalar output value (Section 3. [sent-257, score-0.967]

52 During testing, given an unseen image, the cumulative attribute vector is first computed using the multi-output regression model with the low-level imagery features as input. [sent-259, score-1.105]

53 The cumulative attribute vector is then fed into the single output regression model to estimate the scalar output value. [sent-260, score-1.293]
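
The two-layer pipeline described in the training and testing steps above can be sketched as follows. This is a minimal illustration only, assuming closed-form ridge regression for both layers and synthetic data; the helper names (`fit_ridge`, `predict_ridge`, `to_cumulative`), the regularisation weight `lam`, and the toy feature and label generation are all hypothetical and not taken from the paper.

```python
import numpy as np

def fit_ridge(X, T, lam=1.0):
    """Closed-form ridge regression with an unpenalised bias; T may have many columns."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unregularised
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ T)

def predict_ridge(X, W):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def to_cumulative(y, m):
    """Row i gets a 1 in column j-1 whenever y[i] >= j, for j = 1..m (assumed convention)."""
    return (y[:, None] >= np.arange(1, m + 1)[None, :]).astype(float)

# Synthetic stand-in for low-level features and scalar labels (not real data).
rng = np.random.default_rng(0)
n, d, m = 200, 8, 30
X = rng.normal(size=(n, d))
y = np.clip(np.round(15 + 4 * X[:, 0] + rng.normal(scale=0.5, size=n)), 0, m)

A = to_cumulative(y, m)            # scalar labels -> cumulative attribute vectors
W1 = fit_ridge(X, A)               # layer 1: features -> all attributes at once
A_hat = predict_ridge(X, W1)       # estimated attribute representation
W2 = fit_ridge(A_hat, y[:, None])  # layer 2: attributes -> scalar output
y_hat = predict_ridge(A_hat, W2).ravel()

mae = float(np.mean(np.abs(y_hat - y)))
print("training MAE:", mae)
```

At test time the same two calls to `predict_ridge` would be applied in sequence to the features of an unseen image. The second layer could be swapped for any single-output regressor.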

54 This can be Active Appearance Model features [9] for age estimation and foreground & edges & GLCM features [4, 8] for crowd counting. [sent-267, score-0.897]

55 Secondly, normalization on the feature data including scale normalization and extra perspective normalization [4] for crowd counting are carried out. [sent-269, score-0.521]

56 age and people count) is converted into a cumulative attribute vector ai. [sent-272, score-1.265]

57 Typically, for age or crowd count, there is an upper limit, e. [sent-274, score-0.788]

58 70 for a certain age dataset and 100 for a certain crowd scene. [sent-276, score-0.828]

59 For example, a face of age 40 and another face of age 41 represented using a 69D CA vector will have only one element that is different, whilst the number of different attribute elements increases to 30 for a face of age 10. [sent-291, score-1.807]
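
The age 40 / 41 / 10 example above can be checked with a short sketch. It assumes the thresholding convention that the j-th attribute is 1 when the label is at least j (for j = 1..69); the function names are illustrative, not from the paper.

```python
import numpy as np

def cumulative_attributes(y, m):
    """Binary cumulative attribute vector of length m: entry j-1 is 1 iff y >= j."""
    thresholds = np.arange(1, m + 1)
    return (y >= thresholds).astype(int)

def n_different(a, b):
    """Number of attribute elements on which two vectors disagree."""
    return int(np.sum(a != b))

a40 = cumulative_attributes(40, 69)
a41 = cumulative_attributes(41, 69)
a10 = cumulative_attributes(10, 69)

print(n_different(a40, a41))  # 1
print(n_different(a40, a10))  # 30
```

Under this encoding the number of differing elements equals the absolute difference between the two labels, which is the property the sentence above relies on.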

60 Our cumulative attributes thus capture a better representation of a continuously changing value for object appearance, corresponding directly to a scalar output value change continuously for learning a regression function. [sent-293, score-1.266]

61 3 show the distinct advantages of using CA over NCA for both age estimation and crowd counting. [sent-295, score-0.847]

62 In our work, we estimate the mappings of all m attributes simultaneously by learning a multi-output regression function, in particular, a multivariate ridge regression function [1, 17]. [sent-306, score-0.882]

63 Given xi and aij being low-level features of the ith image and the jth element of its corresponding attribute vector, the objective function for the jth attribute is written as: min 12? [sent-310, score-0.964]

64 In Equation (1), the jth column of matrix W is employed to weigh the imagery feature vector xi for the jth binary attribute in the corresponding attribute learning, i. [sent-332, score-0.943]

65 Since the residual error of all attribute learning tasks are penalized jointly by the Frobenius-norm, this multi-output model can capture the correlation between different attributes explicitly. [sent-335, score-0.663]
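
For reference, a ridge regression objective of the kind described here is commonly written as below. This is a hedged sketch and not necessarily the paper's exact Equation (1): the trade-off constant C, the bias terms b_j and b, and the sample count N are assumed notation, while x_i, a_ij and W follow the definitions given above.

```latex
% Per-attribute form for the j-th binary attribute (assumed notation):
\[
\min_{\mathbf{w}_j,\, b_j} \;
  \frac{1}{2}\lVert \mathbf{w}_j \rVert_2^2
  + C \sum_{i=1}^{N} \bigl( a_{ij} - \mathbf{w}_j^{\top}\mathbf{x}_i - b_j \bigr)^2
\]

% Stacked over all m attributes, with W = [w_1, ..., w_m] and a_i the i-th attribute vector:
\[
\min_{\mathbf{W},\, \mathbf{b}} \;
  \frac{1}{2}\lVert \mathbf{W} \rVert_F^2
  + C \sum_{i=1}^{N} \bigl\lVert \mathbf{a}_i^{\top} - \mathbf{x}_i^{\top}\mathbf{W} - \mathbf{b} \bigr\rVert_F^2
\]
```

Objectives of this quadratic form admit a closed-form solution via the regularised normal equations.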

66 Mapping Attributes to Scalar Output To estimate the mapping between a and y, first the low-level feature x is mapped onto our cumulative attribute space using the learned multi-output regression model above. [sent-338, score-1.121]

67 Note, this regression model has a single scalar output and any existing regression models used in the literature for either age estimation or crowd counting can be readily applied. [sent-343, score-1.728]

68 Both datasets are designed primarily for learning person-independent age estimator and contain people of different ethnic origins. [sent-348, score-0.588]

69 3), Support Vector Regression (SVR) with RBF kernel and Ridge Regression (RR) were employed for age estimation and crowd counting respectively, owing to their strong performance reported in the literature for age [15, 16] and crowd [8] respectively. [sent-365, score-1.825]

70 Evaluation Metrics For age estimation, we employed two evaluation metrics, namely mean absolute error (mae) and cumulative score (cs), which was first defined in [13] and we set the same error level 5 as in [7]. [sent-367, score-0.871]
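
The two age-estimation metrics named above can be sketched as follows. The sketch assumes the usual definition of the cumulative score at error level l as the fraction of test samples whose absolute error is at most l; the example labels and predictions are made up for illustration.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predicted and true values (lower is better)."""
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))

def cumulative_score(y_true, y_pred, level=5):
    """Fraction of samples with absolute error <= level (higher is better)."""
    err = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    return float(np.mean(err <= level))

# Made-up labels and predictions: absolute errors are 3, 5, 8 and 1.
y_true = np.array([20, 35, 50, 8])
y_pred = np.array([23, 30, 58, 9])

print(mean_absolute_error(y_true, y_pred))        # 4.25
print(cumulative_score(y_true, y_pred, level=5))  # 0.75
```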

71 The only difference is in the input to the regression model: low level feature directly for SVR and our cumulative attributes for CA-SVR. [sent-376, score-0.931]

72 As the key difference between the FG-NET and MORPH dataset is data sparsity and the number of age groups without samples, it is evident from these results that the advantage of our cumulative attribute based regression model is more significant given sparse and imbalanced data. [sent-385, score-1.821]

73 Crowd counting Table 3 compares crowd estimation performances of six different methods, all based on regression, [Footnote 1: The results of OHRank were slightly lower than those reported in [7].] [sent-396, score-0.561]

74 The result shows that the cumulative attribute based model (CA-RR) performs the best for both datasets and using all three metrics. [sent-398, score-0.796]

75 The most direct effect of using our cumulative attribute representation can be seen by comparing RR [8] with CA-RR. [sent-399, score-0.799]

76 Since both have the same low level feature input and use the same single output regression model, the performance gain can only be explained by the superior representation by our cumulative attribute space. [sent-401, score-1.155]

77 A key novelty of our model is the cumulative attribute representation. [sent-423, score-0.777]

78 1, compared with the conventional non-cumulative (NCA) attributes, the unique characteristic of our cumulative attributes (CA) is that data points of neighbouring scalar value are designed to be close to each other in the attribute space. [sent-425, score-1.3]
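
The contrast between CA and NCA can be illustrated numerically. The sketch below assumes that the non-cumulative encoding is a one-hot indicator of the label value, which is an assumption about NCA rather than something stated in this extract; the function names are illustrative.

```python
import numpy as np

def ca(y, m):
    """Cumulative encoding: entry j-1 is 1 iff y >= j, for j = 1..m."""
    return (y >= np.arange(1, m + 1)).astype(int)

def nca(y, m):
    """Non-cumulative encoding, assumed here to be one-hot: entry j-1 is 1 iff y == j."""
    return (y == np.arange(1, m + 1)).astype(int)

def sq_dist(a, b):
    """Squared Euclidean distance between two binary attribute vectors."""
    return int(np.sum((a - b) ** 2))

m = 69
for other in (41, 10):
    print(other,
          sq_dist(ca(40, m), ca(other, m)),    # grows with |40 - other|
          sq_dist(nca(40, m), nca(other, m)))  # constant for any differing label
```

With the one-hot assumption, every pair of distinct labels is equally far apart in NCA space, whereas CA distances scale with the label difference, so neighbouring values stay close.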

79 It is evident from Tables 4 and 5 that constructing such cumulative attributes is a significant advantage for a regression model that performs age estimation and crowd counting. [sent-426, score-1.771]

80 Age estimation performance with sparse and imbalanced data measured using cumulative scores (the higher the better). [sent-430, score-0.717]

81 Data of certain age groups and certain crowd counts were removed [Figure 4: (a) UCSD, (b) Mall]. [sent-432, score-0.86]

82 For age estimation, since the two datasets have few missing age groups, we randomly selected a fixed number of age groups to remove each time and then trained the model. [sent-435, score-1.296]
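
One way to simulate the removal protocol described above is sketched below. It is an illustration under assumptions: the function name `drop_random_groups`, the number of removed groups, and the synthetic labels are hypothetical, and the paper's exact sampling procedure may differ.

```python
import numpy as np

def drop_random_groups(X, y, n_groups, rng):
    """Remove every sample whose label falls in n_groups randomly chosen label values."""
    values = np.unique(y)
    removed = rng.choice(values, size=n_groups, replace=False)
    keep = ~np.isin(y, removed)
    return X[keep], y[keep], removed

# Synthetic labels and features standing in for an age dataset (not real data).
rng = np.random.default_rng(0)
y = rng.integers(0, 70, size=500)
X = rng.normal(size=(500, 4))

X_kept, y_kept, removed = drop_random_groups(X, y, n_groups=10, rng=rng)
print(len(y), "->", len(y_kept), "samples after removing groups", sorted(removed.tolist()))
```

Repeating the call with different seeds gives the repeated random trials that such an experiment would average over.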

83 For the crowd counting dataset, this way of removing data would be less effective because the mapping between the low level features and the scalar count numbers is more linear. [sent-436, score-0.786]

84 However, our model’s performance degraded more gracefully, resulting in the bigger performance gain over both the non-attribute based models (SVR and RR for age and crowd respectively) and non-cumulative attribute methods. [sent-440, score-1.17]

85 These results further validate our earlier observation that the construction of a cumulative attribute space is uniquely effective for coping with sparse and imbalanced training data, a common problem in learning regression functions. [sent-441, score-1.368]

86 independently learning cumulative attributes (i-CA). [sent-452, score-0.653]

87 Instead of learning all attributes jointly using our multi-output regression model, experiments were conducted to learn each attribute independently using a single-output ridge regression model. [sent-453, score-1.264]

88 In particular, for more imbalanced data with the removal of 75% labels from the original training dataset, our joint learning model yields more significant advantage on both the FG-NET age dataset and the UCSD crowd dataset. [sent-455, score-1.062]

89 It is evident that the proposed cumulative attribute based model is extremely fast to learn owing to its closed form solution based on a multi-output regression model (see Section 3. [sent-465, score-1.094]

90 This is because after mapping the low level image features to the cumulative attribute space, dimensionality reduction is achieved as a by-product, resulting in faster single output regression model training. [sent-469, score-1.17]

91 This is because the cumulative attribute space has a similar dimension as the original low-level feature and CA has the additional step of estimating the attribute values. [sent-471, score-1.178]

92 For age estimation, the AAM features capture the shape and texture characteristics of a human face. [sent-477, score-0.523]

93 Figures 5(a) and (b) show that our learned cumulative attributes indeed capture this phenomenon rather well. [sent-482, score-0.802]

94 Conclusion We have introduced a novel cumulative attribute based framework for solving a number of computer vision problems invoking the need for regression estimation. [sent-492, score-1.046]

95 Noisy and sparse low level visual features are mapped onto a cumulative attribute space where each dimension is designed specifically to give a clear semantic meaning that captures how the scalar output (e. [sent-493, score-1.126]

96 It requires no additional human annotation to assign attributes and can be estimated efficiently and robustly given sparse and imbalanced training data. [sent-496, score-0.565]

97 Extensive experiments show the effectiveness and efficiency of the proposed model for both age estimation and crowd counting. [sent-497, score-0.847]

98 Privacy preserving crowd monitoring: counting people without people models or tracking. [sent-527, score-0.614]

99 Human age estimation with regression on discriminative aging manifold. [sent-589, score-0.836]

100 Image-based human age estimation by manifold learning and locally adjusted robust regression. [sent-614, score-0.543]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('age', 0.432), ('cumulative', 0.395), ('attribute', 0.382), ('crowd', 0.356), ('regression', 0.269), ('attributes', 0.23), ('imbalanced', 0.215), ('scalar', 0.147), ('counting', 0.146), ('morph', 0.1), ('ohrank', 0.097), ('nca', 0.086), ('ucsd', 0.069), ('ridge', 0.063), ('count', 0.063), ('estimation', 0.059), ('people', 0.056), ('facial', 0.053), ('neighbouring', 0.053), ('ages', 0.053), ('aging', 0.053), ('aij', 0.052), ('benchmarking', 0.051), ('output', 0.05), ('sparse', 0.048), ('mae', 0.048), ('whilst', 0.047), ('svr', 0.042), ('jth', 0.04), ('evgeniou', 0.04), ('fu', 0.039), ('ranker', 0.038), ('argyriou', 0.038), ('personalized', 0.038), ('guo', 0.037), ('mall', 0.036), ('imagery', 0.034), ('rr', 0.033), ('addressing', 0.033), ('ethnical', 0.032), ('rankers', 0.032), ('wwhhe', 0.032), ('groups', 0.032), ('mapping', 0.031), ('continuously', 0.031), ('training', 0.031), ('evident', 0.03), ('uncertain', 0.03), ('learning', 0.028), ('conventional', 0.027), ('mde', 0.027), ('ordinal', 0.027), ('ca', 0.026), ('annotating', 0.026), ('inconsistency', 0.026), ('employed', 0.026), ('ranking', 0.026), ('element', 0.025), ('loy', 0.025), ('learned', 0.025), ('features', 0.025), ('human', 0.024), ('compounded', 0.024), ('yi', 0.024), ('figures', 0.023), ('tpami', 0.023), ('jointly', 0.023), ('hospedales', 0.023), ('characteristics', 0.023), ('discriminative', 0.023), ('multivariate', 0.023), ('value', 0.022), ('aam', 0.022), ('representation', 0.022), ('samples', 0.022), ('meaning', 0.022), ('fj', 0.021), ('labelled', 0.021), ('designed', 0.021), ('computable', 0.021), ('weigh', 0.02), ('modelling', 0.02), ('certain', 0.02), ('metrics', 0.02), ('feature', 0.019), ('face', 0.019), ('texture', 0.019), ('datasets', 0.019), ('change', 0.019), ('cs', 0.019), ('discovered', 0.019), ('owing', 0.018), ('extrinsic', 0.018), ('ith', 0.018), ('sparsity', 0.018), ('level', 0.018), ('semantic', 0.018), ('cope', 0.017), ('yan', 0.017), ('annotation', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy

Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.

2 0.31327307 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promise for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-of-the-art performance on the zero-shot learning task on AwA.

3 0.27465966 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images

Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah

Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.

4 0.25912619 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features

Author: Zheng Ma, Antoni B. Chan

Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrian crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.

5 0.23611513 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen

Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.

6 0.2316308 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

7 0.21590045 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

8 0.20315754 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

9 0.20110162 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

10 0.19521596 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

11 0.19034038 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

12 0.17615853 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

13 0.16720997 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

14 0.16649844 463 cvpr-2013-What's in a Name? First Names as Facial Attributes

15 0.16145794 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

16 0.14786592 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

17 0.1344468 99 cvpr-2013-Cross-View Image Geolocalization

18 0.1273638 282 cvpr-2013-Measuring Crowd Collectiveness

19 0.11396223 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

20 0.10772273 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.167), (1, -0.123), (2, -0.057), (3, -0.04), (4, 0.101), (5, 0.121), (6, -0.29), (7, 0.043), (8, 0.163), (9, 0.18), (10, -0.045), (11, 0.121), (12, -0.033), (13, -0.012), (14, 0.11), (15, 0.071), (16, -0.071), (17, 0.013), (18, 0.028), (19, 0.05), (20, 0.033), (21, 0.148), (22, -0.123), (23, 0.001), (24, -0.059), (25, -0.065), (26, -0.15), (27, -0.12), (28, 0.043), (29, -0.104), (30, 0.003), (31, 0.009), (32, -0.031), (33, 0.117), (34, -0.019), (35, -0.072), (36, 0.078), (37, -0.066), (38, 0.118), (39, -0.038), (40, -0.075), (41, 0.115), (42, -0.131), (43, 0.052), (44, -0.076), (45, -0.075), (46, 0.103), (47, 0.07), (48, -0.003), (49, -0.008)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95672911 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy

Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.

2 0.67251462 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen

Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.

3 0.66109008 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images

Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah

Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.

4 0.6607976 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promise for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance on the well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-of-the-art performance on the zero-shot learning task on AwA.

5 0.64873594 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

Author: Catherine Wah, Serge Belongie

Abstract: Recent work in computer vision has addressed zero-shot learning or unseen class detection, which involves categorizing objects without observing any training examples. However, these problems assume that attributes or defining characteristics of these unobserved classes are known, leveraging this information at test time to detect an unseen class. We address the more realistic problem of detecting categories that do not appear in the dataset in any form. We denote such a category as an unfamiliar class; it is neither observed at train time, nor do we possess any knowledge regarding its relationships to attributes. This problem is one that has received limited attention within the computer vision community. In this work, we propose a novel approach to the unfamiliar class detection task that builds on attribute-based classification methods, and we empirically demonstrate how classification accuracy is impacted by attribute noise and dataset “difficulty,” as quantified by the separation of classes in the attribute space. We also present a method for incorporating human users to overcome deficiencies in attribute detection. We demonstrate results superior to existing methods on the challenging CUB-200-2011 dataset.

6 0.63824928 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features

7 0.63722682 282 cvpr-2013-Measuring Crowd Collectiveness

8 0.63603848 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

9 0.60564387 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

10 0.59437823 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

11 0.59427136 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

12 0.57632327 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

13 0.54986817 463 cvpr-2013-What's in a Name? First Names as Facial Attributes

14 0.52135766 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

15 0.49939403 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

16 0.48576853 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

17 0.44916326 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

18 0.43179089 99 cvpr-2013-Cross-View Image Geolocalization

19 0.40521082 264 cvpr-2013-Learning to Detect Partially Overlapping Instances

20 0.39621237 78 cvpr-2013-Capturing Layers in Image Collections with Componential Models: From the Layered Epitome to the Componential Counting Grid


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.197), (10, 0.086), (16, 0.02), (19, 0.01), (26, 0.06), (33, 0.354), (67, 0.062), (69, 0.038), (87, 0.065)]
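
The LDA weights above are given as a sparse list of (topicId, topicWeight) pairs. The following is a minimal sketch of one way such sparse topic lists could be compared, assuming the Hellinger distance (a common choice for LDA topic distributions; the page does not say which measure produced simValue). The function name `hellinger_distance` and the second paper's weights are hypothetical. Topics missing from a list are treated as weight 0, which is also an assumption.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two sparse topic-weight dicts.

    Topics absent from a dict are treated as having weight 0.0.
    """
    topics = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2
        for t in topics
    )
    return math.sqrt(0.5 * total)


# LDA weights listed above for this paper.
this_paper = dict([(7, 0.197), (10, 0.086), (16, 0.02), (19, 0.01),
                   (26, 0.06), (33, 0.354), (67, 0.062), (69, 0.038),
                   (87, 0.065)])
# Hypothetical weights for some other paper, for illustration only.
other_paper = {7: 0.25, 33: 0.40, 87: 0.10}

listed_mass = sum(this_paper.values())          # weight covered by the list
dominant_topic = max(this_paper, key=this_paper.get)

print(dominant_topic, listed_mass)
print(hellinger_distance(this_paper, other_paper))
```

The listed weights do not sum to one, so part of the distribution is not shown on the page; the sketch simply ignores that unlisted mass.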

similar papers list:

simIndex simValue paperId paperTitle

1 0.93036854 99 cvpr-2013-Cross-View Image Geolocalization

Author: Tsung-Yi Lin, Serge Belongie, James Hays

Abstract: The recent availability of large amounts of geotagged imagery has inspired a number of data driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth’s land area has no ground level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth (we examine overhead imagery and land cover survey data), but the relationship between this data and ground level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn the relationship between ground level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km2 region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.

same-paper 2 0.91628152 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy

Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.

3 0.9128027 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera

Author: Ken Sakurada, Takayuki Okatani, Koichiro Deguchi

Abstract: This paper proposes a method for detecting temporal changes of the three-dimensional structure of an outdoor scene from its multi-view images captured at two separate times. For the images, we consider those captured by a camera mounted on a vehicle running in a city street. The method estimates scene structures probabilistically, not deterministically, and based on their estimates, it evaluates the probability of structural changes in the scene, where the inputs are the similarity of the local image patches among the multi-view images. The aim of the probabilistic treatment is to maximize the accuracy of change detection, behind which there is our conjecture that although it is difficult to estimate the scene structures deterministically, it should be easier to detect their changes. The proposed method is compared with the methods that use multi-view stereo (MVS) to reconstruct the scene structures of the two time points and then differentiate them to detect changes. The experimental results show that the proposed method outperforms such MVS-based methods.

4 0.91035163 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds

Author: Bo Wang, Zhuowen Tu

Abstract: With the increasing availability of high dimensional data and demand for sophisticated data analysis algorithms, manifold learning becomes a critical technique to perform dimensionality reduction, unraveling the intrinsic data structure. Real-world data, however, often come with noise and outliers; seldom do all the data live in a single linear subspace. Inspired by the recent advances in sparse subspace learning and diffusion-based approaches, we propose a new manifold denoising algorithm in which data neighborhoods are adaptively inferred via sparse subspace reconstruction; we then derive a new formulation to perform denoising to the original data. Experiments carried out on both toy and real applications demonstrate the effectiveness of our method; it is insensitive to parameter tuning and we show significant improvement over the competing algorithms.

5 0.88471758 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

Author: Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh

Abstract: Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we “plug-in” human subjects for each of the various components in a state-of-the-art conditional random field model (CRF) on the MSRC dataset. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room” there is to improve segmentation by focusing research efforts on each of the tasks. One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! Fascinated by this finding, we conducted in-depth analysis of the human generated potentials. This inspired a new machine potential which significantly improves state-of-the-art performance on the MSRC dataset.

6 0.88415039 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

7 0.88409323 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

8 0.8838985 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

9 0.88334978 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

10 0.88300878 196 cvpr-2013-HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences

11 0.88291496 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering

12 0.88285387 202 cvpr-2013-Hierarchical Saliency Detection

13 0.88263184 464 cvpr-2013-What Makes a Patch Distinct?

14 0.88260072 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation

15 0.88209045 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches

16 0.88192636 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning

17 0.88188666 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

18 0.88173181 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification

19 0.88169789 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images

20 0.88168055 415 cvpr-2013-Structured Face Hallucination