cvpr cvpr2013 cvpr2013-146 knowledge-graph by maker-knowledge-mining

146 cvpr-2013-Enriching Texture Analysis with Semantic Data


Source: pdf

Author: Tim Matthews, Mark S. Nixon, Mahesan Niranjan

Abstract: We argue for the importance of explicit semantic modelling in human-centred texture analysis tasks such as retrieval, annotation, synthesis, and zero-shot learning. To this end, low-level attributes are selected and used to define a semantic space for texture. 319 texture classes varying in illumination and rotation are positioned within this semantic space using a pairwise relative comparison procedure. Low-level visual features used by existing texture descriptors are then assessed in terms of their correspondence to the semantic space. Textures with strong presence ofattributes connoting randomness and complexity are shown to be poorly modelled by existing descriptors. In a retrieval experiment semantic descriptors are shown to outperform visual descriptors. Semantic modelling of texture is thus shown to provide considerable value in both feature selection and in analysis tasks.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract. We argue for the importance of explicit semantic modelling in human-centred texture analysis tasks such as retrieval, annotation, synthesis, and zero-shot learning. [sent-4, score-0.773]

2 To this end, low-level attributes are selected and used to define a semantic space for texture. [sent-5, score-0.497]

3 319 texture classes varying in illumination and rotation are positioned within this semantic space using a pairwise relative comparison procedure. [sent-6, score-0.781]

4 Low-level visual features used by existing texture descriptors are then assessed in terms of their correspondence to the semantic space. [sent-7, score-0.907]

5 In a retrieval experiment semantic descriptors are shown to outperform visual descriptors. [sent-9, score-0.534]

6 Semantic modelling of texture is thus shown to provide considerable value in both feature selection and in analysis tasks. [sent-10, score-0.492]

7 Introduction. Visual texture is an important cue in numerous processes of human cognition. [sent-12, score-0.441]

8 Although computational texture analysis has achieved fine results over recent decades, there still remains a disparity between the visual and semantic spaces of texture, the so-called semantic gap. [sent-16, score-1.482]

9 Computational approaches usually operate on the basis of a priori notions of texture not necessarily tied to human experience. [sent-17, score-0.49]

10 This means they are often unsuitable for applications requiring closer or more intuitive human interaction, such as content-based image retrieval, texture synthesis and description, or zero-shot learning, where a classification system is taught new categories without having to observe them. [sent-18, score-0.481]

11 Our work seeks to bridge this semantic gap for texture, and acts to unify separate research efforts into structuring the semantic [1] and visual [6, 8, 23] texture spaces, and into robustly identifying correspondences between semantic and visual data [21]. [sent-19, score-1.566]

12 Separate semantic modelling has been shown to improve retrieval of natural scenes [29] and gait signatures [26], and indoor-outdoor classification of photographs [27]. [sent-20, score-0.452]

13 In this paper we outline a semantic modelling of texture, allowing it to be described, synthesised, and retrieved using fine-grained high-level semantic constructs rather than solely using low-level visual properties. [sent-21, score-0.811]

14 Humans are capable of analysing texture in a way resistant to noise, and invariant to illumination, rotation, and scale [10, 30]. [sent-23, score-0.387]

15 We sidestep this thorny issue by adopting a subjective definition of texture embedded in human experience. [sent-26, score-0.441]

16 Because our task involves tying some visual texture space to a semantic space borne from human interpretation of that visual space, it is fitting to adopt a definition of texture derived from human perception. [sent-27, score-1.347]

17 In this sense texture is anything describable by constructions from our semantic space and emerges as a natural consequence of our eventual definition of that space. [sent-28, score-0.704]

18 An assessment of how well a selection of visual texture descriptors are able to capture the semantics of visual data. [sent-32, score-0.538]

19 We show how textures may be ranked according to both their semantic labels and their visual features in Section 3, and then describe the experiment used to compare these rankings in Section 4. [sent-33, score-0.659]

20 A demonstration of the benefits of explicit semantic modelling for texture retrieval is given in Section 5. [sent-34, score-0.386]

21 Semantic space of texture. We choose to construct our semantic space using attributes: low-level visual qualities, often adjectives, that are shared across objects. [sent-37, score-0.907]

22 We see texture as being ill-suited to strict categorisation: key properties in which texture has been stated to vary include its coarseness, linearity, and regularity [15, 28], all of which may be expected to vary continuously. [sent-45, score-0.774]

23 Texture is particularly suitable for description with attributes as they may be readily sourced from the rich lexicon that has evolved in order to describe it. [sent-46, score-0.406]

24 Numerous elegant insights into the nature of the English-language texture lexicon were made by Bhushan et al. [sent-47, score-0.537]

25 [1], who asked subjects to cluster 98 texture adjectives according to similarity, without access to visual data. [sent-48, score-0.563]

26 Table 1 (columns: Cluster interpretation, Sample words): Interpretations of the eleven texture word clusters identified in [1]. [sent-54, score-0.515]

27 We are able to create a new attribute lexicon of manageable size which adequately covers the semantic space of texture. [sent-64, score-0.682]

28 We use the 319 texture classes included in test suite Outex_TC_00016, captured with three different illuminants (horizon, inca, tl84) and at four different rotation angles (0◦, 30◦, 60◦, 90◦), giving twelve samples per texture class and a total of 3,828 samples altogether. [sent-70, score-0.842]
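As a quick arithmetic check of that sample count, a minimal sketch enumerating the illumination/rotation grid described above; the illuminant and angle values come from the text, while the code structure itself is purely illustrative:

```python
# Sketch: enumerate the Outex_TC_00016 sample grid described above.
ILLUMINANTS = ["horizon", "inca", "tl84"]   # three illuminants
ANGLES = [0, 30, 60, 90]                    # four rotation angles (degrees)
NUM_CLASSES = 319

samples = [(cls, illum, angle)
           for cls in range(NUM_CLASSES)
           for illum in ILLUMINANTS
           for angle in ANGLES]

assert len(samples) == 319 * 3 * 4 == 3828  # twelve samples per texture class
```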

29 Figure 1: Representative attributes from each of the clusters in Table 1, with approximate locations (normalised between 0 and 1) across the three texture dimensions identified in [1]. [sent-71, score-0.567]

30 Colour is taken to be a separate visual cue, and so all texture samples are converted to grayscale before experimentation. [sent-72, score-0.495]

31 We therefore require some labelling mechanism allowing the Outex textures to be placed along a continuum according to how strongly they evince each attribute. [sent-77, score-0.318]

32 This can be done by having subjects directly rate the perceived strength of attributes within each texture along a bounded rating scale [23]. [sent-78, score-0.821]

33 However, this is unintuitive when the assumption of an underlying bounded continuum is inappropriate: it is not obvious, for example, what form a maximally marbled texture would take. [sent-79, score-0.544]

34 This technique has previously been used specifically for texture by Tamura et al. [sent-82, score-0.387]

35 The subject is prompted to select the texture that exhibits a greater strength of the attribute. Figure 2: Comparison graph for an attribute, where a directed edge represents a dominance relation, and a double-directed edge represents a similarity relation. [sent-85, score-0.592]

36 Subjects are also given the option of stating that the attribute is completely absent from both textures, so as to avoid confusion when textures are shown for which a particular attribute is perceived to not apply. [sent-88, score-0.77]

37 Representing each attribute’s comparisons in terms of a directed graph (where vertices are textures and edges are comparisons; see Figure 2), textures are selected by randomly choosing texture pairs with no path between them within that attribute’s comparison graph. [sent-91, score-0.995]
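A minimal sketch of this pair-selection step, assuming a networkx digraph holds the comparisons gathered so far; the helper names and the retry loop are illustrative rather than the authors' implementation:

```python
import random
import networkx as nx

def record_comparison(graph: nx.DiGraph, winner, loser, similar=False):
    """A dominance relation is one directed edge; a similarity relation is
    represented here as a pair of opposing edges (a 'double-directed' edge)."""
    graph.add_edge(winner, loser)
    if similar:
        graph.add_edge(loser, winner)

def pick_unconnected_pair(graph: nx.DiGraph, textures, max_tries=1000):
    """Randomly propose texture pairs until one is found with no directed
    path between them in this attribute's comparison graph."""
    for _ in range(max_tries):
        a, b = random.sample(textures, 2)
        if a not in graph or b not in graph:
            return a, b  # an as-yet uncompared texture cannot be connected
        if not nx.has_path(graph, a, b) and not nx.has_path(graph, b, a):
            return a, b
    return None  # the graph may already connect every candidate pair
```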

38 Attribute strengths are assumed to be consistent within each texture class, owing to the natural human visual robustness to illumination and rotation when describing surface texture. [sent-103, score-0.63]

39 Because of this assumption only textures with rotation of 0◦ and illumination of horizon are displayed to users. [sent-104, score-0.325]

40 Ranking textures. In order to bridge the semantic gap we require a way of measuring the level of expression of each attribute within a texture, a quality we will refer to as an attribute’s strength. [sent-114, score-0.815]

41 Ranking from visual features. When a new texture is provided we may wish to determine the strength of an attribute within it based only on its visual features. [sent-120, score-0.804]

42 To do this we derive a ranking function capable of mapping a visual descriptor to real-valued measures of attribute strength. [sent-121, score-0.607]

43 Using w to represent the coefficients of a linear ranking function for some attribute, and xi to represent the location in feature space of the ith texture in the dataset, the perceived strength of the attribute within that texture can be given as w · xi . [sent-122, score-1.272]
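A hedged sketch of learning such a w from dominance comparisons: the paper uses the Ranking SVM formulation of Equation 1 (not reproduced here), whereas this approximation simply trains a linear SVM on pairwise difference vectors and ignores similarity constraints; all names are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_attribute_ranker(X, dominance_pairs, C=1.0):
    """X: (n_textures, d) visual descriptors.
    dominance_pairs: (i, j) tuples meaning texture i shows the attribute
    more strongly than texture j.
    Returns w; the attribute strength of texture k is then w @ X[k]."""
    X = np.asarray(X)
    diffs = np.array([X[i] - X[j] for i, j in dominance_pairs])
    # Present each pair in both orders so the separating hyperplane passes
    # through the origin and w orders the pairs correctly.
    features = np.vstack([diffs, -diffs])
    labels = np.hstack([np.ones(len(diffs)), -np.ones(len(diffs))])
    svm = LinearSVC(C=C, fit_intercept=False)
    svm.fit(features, labels)
    return svm.coef_.ravel()
```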

44 At this point, we illustrate the attributes chosen in the previous section by displaying in Figure 3 the highest-rating texture in r for each attribute when C = 1. [sent-137, score-0.602]

45 In the next section we show how the rankings produced using these two different methods can be compared in order to ascertain which low-level features best reflect the high-level semantic attributes at our disposal. [sent-138, score-0.562]

46 Semantic correspondence of visual descriptors. In this section we appraise each of a number of existing texture descriptors in terms of how well they reflect the structure of the semantic comparison graph for each of the eleven attributes. [sent-140, score-1.112]

47 These results allow us to identify regions within the semantic space of texture which are poorly modelled by current techniques, as well as the visual features which correspond best with human perception and which will provide the basis for our semantically-enriched descriptors. [sent-141, score-0.879]

48 Visual descriptors. The five different texture descriptors to undergo assessment are: • Co-occurrence matrices [7], calculated for points situated along the perimeters of circles of radii 1, 2, 4, 8, and 16 pixels. [sent-144, score-0.592]
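For the co-occurrence descriptor, a sketch along these lines is possible with scikit-image (the functions are named greycomatrix/greycoprops in older releases); the statistics pooled below are an assumption for illustration, not necessarily the exact features of [7]:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def cooccurrence_descriptor(gray_uint8, distances=(1, 2, 4, 8, 16)):
    """Grey-level co-occurrence features pooled over the five radii and
    four orientations; gray_uint8 must be an 8-bit grayscale image."""
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(gray_uint8, distances=list(distances), angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "correlation", "energy", "homogeneity")
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```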

49 Figure 3: Illustrative Outex textures for each of the eleven attributes: (a) Blemished, (b) Bumpy, (c) Lined, (d) Marbled, (e) Random, (f) Repetitive, (g) Speckled, (h) Spiralled, (i) Webbed, (j) Woven, (k) Wrinkled. [sent-152, score-0.331]

50 The texture shown is that with the highest value in the ratings calculated directly from the comparison graph for each attribute using Equation 2. [sent-153, score-0.692]

51 Methodology. The semantic correspondence of each of the visual descriptors described above is evaluated using a 4-fold cross-validation procedure. [sent-156, score-0.52]

52 For each attribute the optimal ranking function w is learned from the training images using the Ranking SVM formulation shown in Equation 1. [sent-157, score-0.356]

53 A per-attribute ranking of all 3,828 textures is then derived from w. [sent-162, score-0.344]

54 The misclassification rate is calculated over all dominance comparisons involving at least one of the textures in the hold-out set by simply comparing the rankings of the respective textures. [sent-163, score-0.65]

55 We also measure the correspondence between each learned ranking and the ‘ideal’ ranking inferred directly from the semantic comparison graph with the procedure in Section 3. [sent-164, score-0.651]
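A sketch of these two evaluation measures, computed from a learned per-texture score vector and the held-out dominance pairs; function and argument names are illustrative:

```python
import numpy as np
from scipy.stats import spearmanr

def dominance_misclassification(scores, holdout_pairs):
    """Fraction of held-out dominance comparisons (texture i should outrank
    texture j) that the learned scores order incorrectly."""
    wrong = sum(1 for i, j in holdout_pairs if scores[i] <= scores[j])
    return wrong / len(holdout_pairs)

def rank_correlation(learned_scores, ideal_scores):
    """Spearman's rank correlation between the learned ranking and the
    'ideal' ranking inferred from the semantic comparison graph."""
    rho, _ = spearmanr(learned_scores, ideal_scores)
    return rho
```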

56 Analysis. Misclassification rates and Spearman’s rank correlation coefficients for each combination of descriptor and attribute are shown in Tables 2 and 3. [sent-169, score-0.466]

57 It is apparent that the uniform binary patterns descriptor is the most suitable of those tested for capturing the structure of the semantic comparison graph. [sent-170, score-0.535]

58 In particular, it performs well for those attributes relating to disordered placement of small-scale primitives (blemished, bumpy, and speckled) as well as for another attribute associated with disorder, marbled. [sent-171, score-0.663]

59 The uniform binary pattern descriptor is calculated as a histogram of local intensity patterns. Table 2 (columns: Attribute, CoM, Gab, Liu, SGF, UBP): Misclassification rates for each combination of descriptor and attribute. [sent-172, score-0.443]

60 This suggests that the curved lines of the structures perceived by subjects as being spiralled are of similar scale to the large shifts in pixel uniformity occurring between 8 and 16 pixels from each reference pixel. [sent-183, score-0.367]

61 A similar effect is observed for lined and woven, but due to these having associations of strong global texture orientation, and the co-occurrence matrix being based on spatially-localised patterns, they did not result in similarly low misclassification rates as for spiralled. [sent-184, score-0.685]

62 The Liu descriptor, comprising frequency measures based on the Fourier transform, performs well for the two attributes involving regular placement of linear texture primitives: lined and woven. [sent-185, score-0.855]

63 The Liu descriptor is also amongst the best performers for the polar notions of random and repetitive: here, the inertia and energy of the first quadrant is again decisive. [sent-187, score-0.44]

64 Low inertia and high energy indicate random texture while the opposite aligns more closely with repetitive texture. [sent-188, score-0.631]

65 Overall, the results indicate that there is considerable opportunity for improvement in the identification of visual features corresponding closely to human perception, especially for attributes exhibiting aspects of complexity or disorder such as spiralled, webbed, and wrinkled. [sent-190, score-0.512]

66 These deficiencies are hardly unexpected, and tally with our knowledge of the workings of visual texture descriptors, but it is notable too that even correspondence with strongly regular attributes such as lined and repetitive is only average. [sent-191, score-0.989]

67 Despite the lack of correspondence between these human and machine interpretations of texture, semantic data may still be used to improve performance in tasks involving texture analysis. [sent-192, score-0.81]

68 In the next section we demonstrate that semantic texture description results in considerable performance gains over a purely visual approach. [sent-193, score-0.854]

69 Retrieval. In this section we demonstrate the practical benefit of semantic data in a retrieval experiment. [sent-195, score-0.383]

70 Methodology. Each of the 3,828 samples in the dataset is used in turn as a query texture against the remaining 3,827 textures in the target set, of which only 11 are relevant to the query (each texture class has 12 samples due to variation in rotation and illumination). [sent-198, score-1.297]

71 All textures in the target set are then sorted by the Euclidean distance of their descriptors from the query texture’s descriptor, yielding a ranking r where ri = 1 if the member of the target set at rank i is relevant to the query, and 0 otherwise. [sent-199, score-0.764]

72 Next, for each descriptor eleven ranking functions are learned, one for each attribute. [sent-202, score-0.446]

73 These eleven ranking functions are then used to create a new eleven-dimensional semantic descriptor for each texture sample. [sent-205, score-1.15]
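In effect the eleven learned weight vectors can be stacked into a matrix that projects any visual descriptor into the semantic space; a minimal sketch, with names assumed for illustration:

```python
import numpy as np

def semantic_descriptor(x_visual, attribute_weights):
    """x_visual: (d,) visual feature vector for one texture sample.
    attribute_weights: (11, d) matrix whose rows are the learned
    per-attribute ranking functions w.
    Returns the 11-dimensional semantic descriptor (one strength per attribute)."""
    return np.asarray(attribute_weights) @ np.asarray(x_visual)
```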

74 Lastly, we create a concatenated descriptor from all five visual descriptors, which in turn allows another semantic descriptor to be learned from the most discriminative features across all descriptors. [sent-206, score-0.866]

75 Again, the distances between the target set samples and the query sample are calculated for these concatenated and semantic descriptors, and a ranking derived. [sent-207, score-0.668]

76 Given the relevance vector r = (r1, . . . , rn) for each query image we are able to calculate precision and recall measures, where precision is the proportion of the retrieved samples that are relevant, and recall is the proportion of the relevant samples that are retrieved: precision(n) = (1/n) Σi≤n ri and recall(n) = (1/11) Σi≤n ri. [sent-211, score-0.544]
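A sketch of these per-query measures and of mean average precision, assuming the 11 relevant targets per query stated above; helper names are illustrative:

```python
import numpy as np

def precision_recall(relevance, num_relevant=11):
    """relevance: binary vector over the sorted target set, 1 where the
    retrieved texture is relevant to the query."""
    relevance = np.asarray(relevance)
    hits = np.cumsum(relevance)
    n = np.arange(1, len(relevance) + 1)
    return hits / n, hits / num_relevant     # precision(n), recall(n)

def average_precision(relevance):
    """Precision averaged over the ranks at which relevant items appear."""
    precision, _ = precision_recall(relevance)
    return precision[np.flatnonzero(relevance)].mean()

# MAP is the mean of average_precision over all 3,828 queries;
# EER is read from the corresponding ROC curve (not shown here).
```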

77 Table 4: Mean average precision (MAP) and equal error rates (EER) for each descriptor across all 3,828 texture queries. [sent-233, score-0.653]

78 Boldface denotes the highest scorer of each visual and semantic descriptor pair. [sent-234, score-0.568]

79 Analysis. Precision-recall curves for both the visual and semantic version of each descriptor are shown in Figure 4. [sent-237, score-0.568]

80 In all but one of the curves (that for uniform binary patterns) it is immediately evident that the semantic descriptor gives higher retrieval performance than the corresponding low-level visual descriptor. [sent-239, score-0.675]

81 This benefit is especially pronounced for higher rates of recall, where the semantic descriptor often retrieves relevant textures with a considerably higher rate of precision. [sent-240, score-0.837]

82 This initial impression from inspecting the curves is reinforced upon viewing the summary values in Table 4: the semantic descriptor achieves higher MAP and EER scores in all cases but one. [sent-242, score-0.494]

83 However, although the semantic form of the concatenated descriptor is the best overall descriptor in terms of EER, the visual form of the UBP descriptor is the best in terms of MAP. [sent-243, score-0.966]

84 The inferior MAP of the semantic UBP descriptor against its visual counterpart is possibly due to the relatively compact 11-dimensional semantic representation failing to capture as much variability as the 30-dimensional UBP descriptor. [sent-244, score-0.885]

85 Further experimentation is required to investigate the advantage a more expressive semantic space (in the form of more attributes) has for precision and recall. [sent-245, score-0.368]

86 Again, however, the semantic descriptor improves upon the visual one for higher recall rates. [sent-246, score-0.628]

87 This effect is more obvious in the ROC curve for this descriptor, reflected by the fact that the EER for the semantic UBP descriptor is lower. [sent-247, score-0.494]

88 Discussion. An explicit semantic modelling step provides numerous benefits when describing textures. [sent-251, score-0.424]

89 As well as allowing for more user-friendly interaction due to the bridging of the semantic gap, we demonstrated an improvement in retrieval rate for all but one of the descriptors tested. [sent-252, score-0.513]

90 Furthermore, the use of attributes introduces a natural efficiency and robustness in the design of feature vectors, owing to the evolution of human language and the invariant qualities of human visual perception. [sent-253, score-0.448]

91 The introduction of the dataset enables new semantic performance metrics to be used when assessing texture descriptors. [sent-254, score-0.704]

92 It is important that the deficiencies encountered in our appraisal of visual descriptors are addressed so as to properly bridge the semantic gap for texture and to pave the way for closer correspondence to human perception and expectations in user-centred visual applications. [sent-255, score-1.207]

93 In future work we aim to build on the work of [1], whose methodology was only performed on the limited Brodatz dataset and without modern facilities such as crowdsourcing, so that a more principled and refined texture lexicon is available to vision researchers. [sent-256, score-0.588]

94 We also aim to further explore the visual space of texture and to describe novel texture features that align particularly closely with human perception. [sent-257, score-0.902]

95 The texture lexicon: Understanding the categorization of visual texture terms and their relationship to texture images. [sent-265, score-1.161]

96 Internal representation of visual texture as the basis for the judgment of similarity. [sent-317, score-0.461]

97 Outex - new framework for empirical evaluation of texture analysis algorithms. [sent-396, score-0.387]

98 Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. [sent-405, score-0.428]

99 Towards a texture naming system: Identifying relevant dimensions of texture. [sent-436, score-0.436]

100 The use of semantic human description as a soft biometric. [sent-465, score-0.411]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('texture', 0.387), ('semantic', 0.317), ('attribute', 0.215), ('textures', 0.203), ('attributes', 0.18), ('descriptor', 0.177), ('lexicon', 0.15), ('ranking', 0.141), ('ubp', 0.138), ('eer', 0.136), ('misclassification', 0.128), ('eleven', 0.128), ('outex', 0.123), ('repet', 0.111), ('perceived', 0.088), ('qualities', 0.086), ('bhushan', 0.083), ('bumpy', 0.083), ('marbled', 0.083), ('speckled', 0.083), ('webbed', 0.083), ('comparisons', 0.082), ('query', 0.081), ('descriptors', 0.077), ('continuum', 0.074), ('woven', 0.074), ('visual', 0.074), ('ined', 0.071), ('ive', 0.069), ('modelling', 0.069), ('dominance', 0.068), ('retrieval', 0.066), ('rankings', 0.065), ('spi', 0.064), ('inertia', 0.064), ('spiral', 0.061), ('recall', 0.06), ('ij', 0.059), ('subjects', 0.059), ('ral', 0.059), ('attributecomgabliusgfubp', 0.055), ('blemi', 0.055), ('nixon', 0.055), ('strength', 0.054), ('human', 0.054), ('rate', 0.053), ('correspondence', 0.052), ('biometrics', 0.052), ('methodology', 0.051), ('precision', 0.051), ('calculated', 0.051), ('subject', 0.05), ('notions', 0.049), ('disordered', 0.049), ('prompted', 0.049), ('stating', 0.049), ('tamura', 0.049), ('uniformity', 0.049), ('relevant', 0.049), ('led', 0.048), ('perception', 0.047), ('hori', 0.045), ('deficiencies', 0.045), ('disorder', 0.045), ('sgf', 0.045), ('proportion', 0.045), ('concatenated', 0.044), ('adjectives', 0.043), ('boldface', 0.043), ('shed', 0.043), ('spearman', 0.043), ('rotation', 0.041), ('patterns', 0.041), ('nea', 0.041), ('textural', 0.041), ('bridge', 0.041), ('primitives', 0.041), ('labelling', 0.041), ('intuitive', 0.04), ('description', 0.04), ('placement', 0.04), ('farhadi', 0.04), ('quadrant', 0.039), ('ratings', 0.039), ('inspection', 0.039), ('gabor', 0.039), ('gap', 0.039), ('describing', 0.038), ('rates', 0.038), ('directed', 0.038), ('rank', 0.036), ('illumination', 0.036), ('considerable', 0.036), ('pages', 0.036), ('pietikainen', 0.036), ('evolved', 0.036), ('tim', 0.035), ('strongest', 0.035), ('retrieved', 0.034), ('samples', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

Author: Tim Matthews, Mark S. Nixon, Mahesan Niranjan

Abstract: We argue for the importance of explicit semantic modelling in human-centred texture analysis tasks such as retrieval, annotation, synthesis, and zero-shot learning. To this end, low-level attributes are selected and used to define a semantic space for texture. 319 texture classes varying in illumination and rotation are positioned within this semantic space using a pairwise relative comparison procedure. Low-level visual features used by existing texture descriptors are then assessed in terms of their correspondence to the semantic space. Textures with strong presence ofattributes connoting randomness and complexity are shown to be poorly modelled by existing descriptors. In a retrieval experiment semantic descriptors are shown to outperform visual descriptors. Semantic modelling of texture is thus shown to provide considerable value in both feature selection and in analysis tasks.

2 0.25749663 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.

3 0.20715035 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context

Author: Gautam Singh, Jana Kosecka

Abstract: This paper presents a nonparametric approach to semantic parsing using small patches and simple gradient, color and location features. We learn the relevance of individual feature channels at test time using a locally adaptive distance metric. To further improve the accuracy of the nonparametric approach, we examine the importance of the retrieval set used to compute the nearest neighbours using a novel semantic descriptor to retrieve better candidates. The approach is validated by experiments on several datasets used for semantic parsing demonstrating the superiority of the method compared to the state of art approaches.

4 0.20466484 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation

Author: Heesoo Myeong, Kyoung Mu Lee

Abstract: We propose a novel nonparametric approach for semantic segmentation using high-order semantic relations. Conventional context models mainly focus on learning pairwise relationships between objects. Pairwise relations, however, are not enough to represent high-level contextual knowledge within images. In this paper, we propose semantic relation transfer, a method to transfer high-order semantic relations of objects from annotated images to unlabeled images analogous to label transfer techniques where label information are transferred. Wefirst define semantic tensors representing high-order relations of objects. Semantic relation transfer problem is then formulated as semi-supervised learning using a quadratic objective function of the semantic tensors. By exploiting low-rank property of the semantic tensors and employing Kronecker sum similarity, an efficient approximation algorithm is developed. Based on the predicted high-order semantic relations, we reason semantic segmentation and evaluate the performance on several challenging datasets.

5 0.18625304 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen

Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled methodfor choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.

6 0.1824313 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

7 0.17497611 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

8 0.16145794 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

9 0.16099717 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

10 0.15950923 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

11 0.15090367 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction

12 0.14656073 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

13 0.14281407 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

14 0.14241494 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

15 0.13778019 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

16 0.13687472 391 cvpr-2013-Sensing and Recognizing Surface Textures Using a GelSight Sensor

17 0.1363913 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

18 0.13168873 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images

19 0.12523259 251 cvpr-2013-Learning Discriminative Illumination and Filters for Raw Material Classification with Optimal Projections of Bidirectional Texture Functions

20 0.12216728 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.237), (1, -0.098), (2, -0.006), (3, 0.007), (4, 0.12), (5, 0.071), (6, -0.267), (7, 0.048), (8, 0.009), (9, 0.137), (10, -0.013), (11, 0.067), (12, -0.029), (13, 0.039), (14, 0.084), (15, -0.042), (16, 0.022), (17, -0.022), (18, 0.035), (19, 0.043), (20, 0.108), (21, 0.011), (22, 0.004), (23, 0.067), (24, -0.087), (25, -0.064), (26, 0.028), (27, 0.023), (28, -0.035), (29, -0.027), (30, 0.017), (31, -0.054), (32, -0.058), (33, -0.061), (34, -0.067), (35, -0.011), (36, -0.048), (37, 0.127), (38, -0.091), (39, 0.056), (40, -0.019), (41, 0.032), (42, 0.017), (43, 0.042), (44, -0.009), (45, -0.059), (46, -0.004), (47, 0.071), (48, 0.009), (49, -0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96987784 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

Author: Tim Matthews, Mark S. Nixon, Mahesan Niranjan

Abstract: We argue for the importance of explicit semantic modelling in human-centred texture analysis tasks such as retrieval, annotation, synthesis, and zero-shot learning. To this end, low-level attributes are selected and used to define a semantic space for texture. 319 texture classes varying in illumination and rotation are positioned within this semantic space using a pairwise relative comparison procedure. Low-level visual features used by existing texture descriptors are then assessed in terms of their correspondence to the semantic space. Textures with strong presence ofattributes connoting randomness and complexity are shown to be poorly modelled by existing descriptors. In a retrieval experiment semantic descriptors are shown to outperform visual descriptors. Semantic modelling of texture is thus shown to provide considerable value in both feature selection and in analysis tasks.

2 0.78231853 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi

Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attribute often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discrimina- tive binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improveperformance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.

3 0.76385373 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.

4 0.7480064 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

Author: Catherine Wah, Serge Belongie

Abstract: Recent work in computer vision has addressed zero-shot learning or unseen class detection, which involves categorizing objects without observing any training examples. However, these problems assume that attributes or defining characteristics of these unobserved classes are known, leveraging this information at test time to detect an unseen class. We address the more realistic problem of detecting categories that do not appear in the dataset in any form. We denote such a category as an unfamiliar class; it is neither observed at train time, nor do we possess any knowledge regarding its relationships to attributes. This problem is one that has received limited attention within the computer vision community. In this work, we propose a novel ap. ucs d .edu Unfamiliar? or?not? UERY?IMAGQ IMmFaAtgMechs?inIlLatsrA?inYRESg MFNaAotc?ihntIlraLsin?A YRgES UMNaotFc?hAinMltarsIinL?NIgAOR AKNTAWDNO ?Train g?imagesn U(se)alc?n)eSs(Long?bilCas n?a’t lrfyibuteIn?mfoartesixNearwter proach to the unfamiliar class detection task that builds on attribute-based classification methods, and we empirically demonstrate how classification accuracy is impacted by attribute noise and dataset “difficulty,” as quantified by the separation of classes in the attribute space. We also present a method for incorporating human users to overcome deficiencies in attribute detection. We demonstrate results superior to existing methods on the challenging CUB-200-2011 dataset.

5 0.74687499 99 cvpr-2013-Cross-View Image Geolocalization

Author: Tsung-Yi Lin, Serge Belongie, James Hays

Abstract: The recent availability oflarge amounts ofgeotagged imagery has inspired a number of data driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth’s land area has no ground level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth we examine overhead imagery and land cover survey data but the relationship between this data and ground level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground– – level images in the database. A key idea is to learn the relationship between ground level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km2 region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.

6 0.74104208 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

7 0.73904669 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

8 0.73336238 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

9 0.72799814 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

10 0.69242573 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction

11 0.69155145 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

12 0.6405893 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation

13 0.63904971 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context

14 0.61305076 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling

15 0.60776085 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

16 0.60340637 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

17 0.59863526 391 cvpr-2013-Sensing and Recognizing Surface Textures Using a GelSight Sensor

18 0.57825613 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

19 0.53826994 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

20 0.53649926 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.119), (16, 0.012), (26, 0.051), (28, 0.011), (33, 0.273), (67, 0.07), (69, 0.032), (77, 0.283), (87, 0.073)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89476573 18 cvpr-2013-A Max-Margin Riffled Independence Model for Image Tag Ranking

Author: Tian Lan, Greg Mori

Abstract: We propose Max-Margin Riffled Independence Model (MMRIM), a new method for image tag ranking modeling the structured preferences among tags. The goal is to predict a ranked tag list for a given image, where tags are ordered by their importance or relevance to the image content. Our model integrates the max-margin formalism with riffled independence factorizations proposed in [10], which naturally allows for structured learning and efficient ranking. Experimental results on the SUN Attribute and LabelMe datasets demonstrate the superior performance of the proposed model compared with baseline tag ranking methods. We also apply the predicted rank list of tags to several higher-level computer vision applications in image understanding and retrieval, and demonstrate that MMRIM significantly improves the accuracy of these applications.

2 0.88903141 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences

Author: Yannis Panagakis, Mihalis A. Nicolaou, Stefanos Zafeiriou, Maja Pantic

Abstract: Temporal alignment of human behaviour from visual data is a very challenging problem due to a numerous reasons, including possible large temporal scale differences, inter/intra subject variability and, more importantly, due to the presence of gross errors and outliers. Gross errors are often in abundance due to incorrect localization and tracking, presence of partial occlusion etc. Furthermore, such errors rarely follow a Gaussian distribution, which is the de-facto assumption in machine learning methods. In this paper, building on recent advances on rank minimization and compressive sensing, a novel, robust to gross errors temporal alignment method is proposed. While previous approaches combine the dynamic time warping (DTW) with low-dimensional projections that maximally correlate two sequences, we aim to learn two underlyingprojection matrices (one for each sequence), which not only maximally correlate the sequences but, at the same time, efficiently remove the possible corruptions in any datum in the sequences. The projections are obtained by minimizing the weighted sum of nuclear and ?1 norms, by solving a sequence of convex optimization problems, while the temporal alignment is found by applying the DTW in an alternating fashion. The superiority of the proposed method against the state-of-the-art time alignment methods, namely the canonical time warping and the generalized time warping, is indicated by the experimental results on both synthetic and real datasets.

3 0.88088107 402 cvpr-2013-Social Role Discovery in Human Events

Author: Vignesh Ramanathan, Bangpeng Yao, Li Fei-Fei

Abstract: We deal with the problem of recognizing social roles played by people in an event. Social roles are governed by human interactions, and form a fundamental component of human event description. We focus on a weakly supervised setting, where we are provided different videos belonging to an event class, without training role labels. Since social roles are described by the interaction between people in an event, we propose a Conditional Random Field to model the inter-role interactions, along with person specific social descriptors. We develop tractable variational inference to simultaneously infer model weights, as well as role assignment to all people in the videos. We also present a novel YouTube social roles dataset with ground truth role annotations, and introduce annotations on a subset of videos from the TRECVID-MED11 [1] event kits for evaluation purposes. The performance of the model is compared against different baseline methods on these datasets.

same-paper 4 0.84219408 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

Author: Tim Matthews, Mark S. Nixon, Mahesan Niranjan

Abstract: We argue for the importance of explicit semantic modelling in human-centred texture analysis tasks such as retrieval, annotation, synthesis, and zero-shot learning. To this end, low-level attributes are selected and used to define a semantic space for texture. 319 texture classes varying in illumination and rotation are positioned within this semantic space using a pairwise relative comparison procedure. Low-level visual features used by existing texture descriptors are then assessed in terms of their correspondence to the semantic space. Textures with strong presence ofattributes connoting randomness and complexity are shown to be poorly modelled by existing descriptors. In a retrieval experiment semantic descriptors are shown to outperform visual descriptors. Semantic modelling of texture is thus shown to provide considerable value in both feature selection and in analysis tasks.

5 0.83702552 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding

Author: Jérôme Revaud, Matthijs Douze, Cordelia Schmid, Hervé Jégou

Abstract: This paper presents an approach for large-scale event retrieval. Given a video clip of a specific event, e.g., the wedding of Prince William and Kate Middleton, the goal is to retrieve other videos representing the same event from a dataset of over 100k videos. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to compare the videos in the frequency domain. This offers a significant gain in complexity and accurately localizes the matching parts of videos. Furthermore, we extend product quantization to complex vectors in order to compress our descriptors, and to compare them in the compressed domain. Our method outperforms the state of the art both in search quality and query time on two large-scale video benchmarks for copy detection, TRECVID and CCWEB. Finally, we introduce a challenging dataset for event retrieval, EVVE, and report the performance on this dataset.

6 0.80162388 364 cvpr-2013-Robust Object Co-detection

7 0.78776586 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

8 0.76294756 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

9 0.75542581 213 cvpr-2013-Image Tag Completion via Image-Specific and Tag-Specific Linear Sparse Reconstructions

10 0.75414705 412 cvpr-2013-Stochastic Deconvolution

11 0.75122434 422 cvpr-2013-Tag Taxonomy Aware Dictionary Learning for Region Tagging

12 0.74848932 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources

13 0.74768615 432 cvpr-2013-Three-Dimensional Bilateral Symmetry Plane Estimation in the Phase Domain

14 0.74670309 356 cvpr-2013-Representing and Discovering Adversarial Team Behaviors Using Player Roles

15 0.74619091 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

16 0.74579763 28 cvpr-2013-A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching

17 0.74469471 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition

18 0.74333096 414 cvpr-2013-Structure Preserving Object Tracking

19 0.74324149 164 cvpr-2013-Fast Convolutional Sparse Coding

20 0.74316669 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation