cvpr cvpr2013 cvpr2013-461 knowledge-graph by maker-knowledge-mining

461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes


Source: pdf

Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu

Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions, respectively. We assign the nouns to the nodes (parts) in HST using non-maximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. Then we update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. [sent-9, score-1.241]

2 We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. [sent-12, score-0.395]

3 The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions respectively. [sent-14, score-0.717]

4 We assign the nouns to the nodes (parts) in HST using nonmaximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. [sent-15, score-0.676]

5 For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. [sent-17, score-0.48]

6 We then update the HST and attribute association based on the inferred parse trees. [sent-18, score-0.692]

7 We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts. [sent-19, score-0.79]

8 Thus the interest in studying the scene attributes [6, 7] has been growing. [sent-24, score-0.41]

9 A typical recent work is by Patterson and Hays [7] which identified 102 scene attributes through human perception experiments and trained 102 independent classifiers. [sent-25, score-0.384]

10 In this paper, we propose a weakly supervised method to study the scene configuration and attribute localization. [sent-27, score-0.605]

11 As shown in Fig. 1, our approach begins with a collection of images with attributes in text (Fig. [sent-29, score-0.317]

12 The training images are labeled with the presence of several attributes, with the precise localization of the attributes left unknown. [sent-31, score-0.354]

13 Through a learning-by-parsing strategy, we can learn the HST/AOT model and a scene part dictionary, in which each scene part corresponds to a meaningful region in the scenes such as sky, building, road, field. [sent-42, score-0.291]

14 (b) Iterative learning process including the learning of scene configuration and attribute association, and the joint inference (text in square brackets denotes the inferred attributes). [sent-51, score-0.646]

15 The nouns are assigned to the learned scene part dictionary according to an association matrix as shown in the bottom left of Fig. [sent-54, score-0.434]

16 The association matrix measures the probability of a noun and a scene part appearing simultaneously in the training set, and it can be achieved by non-maximum suppression. [sent-56, score-0.737]

17 Each noun has a mixture of appearance models corresponding to the adjectives, e. [sent-57, score-0.439]

18 Given an image, we jointly infer the optimal parse tree and localize the semantic attributes to the scene parts by dynamic programming (right panel in Fig. [sent-62, score-0.725]

19 Then based on the inferred parse trees, we re-estimate the HST/AOT model and attribute association matrix. [sent-64, score-0.726]

20 Thus, we integrate the parsing and attribute localization under a uniform framework. [sent-65, score-0.506]

21 We evaluate the proposed method by showing: (i) The semantic attributes are properly associated with the local scene parts. [sent-66, score-0.384]

22 (ii) Compared with traditional classification algorithms, our method achieves better attribute recognition performance. [sent-67, score-0.375]

23 (iii) We improve the precision of attribute localization against a baseline sliding window method [10]. [sent-68, score-0.569]

24 (iv) The most related work is the Hierarchical Space Tiling (HST) [17] which introduced a scene hierarchy by the And-Or Tree (AOT) and proposed a structure learning method to learn a scene part dictionary and compact HST model. [sent-75, score-0.357]

25 We extend [17] to take raw images with text as input and associate scene attributes to the learned scene part dictionary. [sent-77, score-0.577]

26 Scene attributes Beyond recognizing an individual scene category, visual attributes are demonstrated as valuable semantic cues in various problems such as generating descriptions of unfamiliar objects [6]. [sent-78, score-0.727]

27 Patterson and Hays [7] proposed an attribute based scene representation containing 102 binary attributes to describe the intra-class scene variations (e. [sent-79, score-0.889]

28 These attributes were learned and inferred at the image level without localization. [sent-87, score-0.288]

29 In contrast, we jointly parse the im- ages into spatial configurations and localize the attributes, which allows us to provide more accurate and detailed descriptions. [sent-88, score-0.26]

30 Attribute localization: In learning the relationships between the attributes and specific image regions, we relate to recent work on object detection and localization. [sent-89, score-0.378]

31 scene configurations, and (ii) HST-att which models the appearance types of the scene attributes and the correlations between the scene parts and attributes. [sent-104, score-0.711]

32 The terminal nodes VT form a scene part dictionary Δ = VT. [sent-112, score-0.501]

33 A number of the atomic elements compose the higher-level terminal nodes at different scales, locations and shapes. [sent-114, score-0.347]

34 Beyond HST-geo, we combine the scene attributes to represent both the geometry and semantics of the scenes. [sent-118, score-0.384]

35 Scene attributes come from the text descriptions of training images which contain several noun+adjective phrases. [sent-119, score-0.406]

36 The nouns correspond to the objects in the scenes and the adjectives correspond to the appearance. [sent-120, score-0.244]

37 Each terminal node in HST-geo can link to a noun and further to an adjective attribute by an association matrix. [sent-123, score-1.5]

38 The HST is naturally recursive, starting from a root which is an Or-node, generating the alternating levels of And-nodes and Or-nodes, and stopping at the terminal nodes with a specific appearance type (noun+adjective). [sent-127, score-0.353]

39 The And-Or structure defines a space of possible parse trees and embodies a probabilistic context-free grammar (PCFG) [15]. [sent-128, score-0.253]
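To make the And-Or structure concrete, here is a minimal Python sketch of such a hierarchy and of deriving a parse tree by selecting branches at Or-nodes. This is an illustration only; the class and function names (Terminal, AndNode, OrNode, sample_parse_tree) are ours, not the paper's.

```python
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class Terminal:
    """A scene part in the dictionary, tied to a rectangular tile."""
    name: str          # e.g. "sky" once a noun attribute is associated
    tile: tuple        # (x, y, w, h) on the image lattice

@dataclass
class AndNode:
    """Composition: every child is expanded (a spatial decomposition)."""
    children: List[object] = field(default_factory=list)

@dataclass
class OrNode:
    """Alternative: exactly one branch is selected when deriving a parse tree."""
    branches: List[object] = field(default_factory=list)     # AndNode or Terminal
    branch_probs: List[float] = field(default_factory=list)  # theta(v -> v_i)

def sample_parse_tree(node, rng=random):
    """Derive one parse tree by choosing a branch at each Or-node."""
    if isinstance(node, OrNode):
        i = rng.choices(range(len(node.branches)), weights=node.branch_probs)[0]
        return ("Or", i, sample_parse_tree(node.branches[i], rng))
    if isinstance(node, AndNode):
        return ("And", [sample_parse_tree(c, rng) for c in node.children])
    return node  # Terminal
```

Pruning Or-branches whose probability is driven to (near) zero during learning is what yields the compact HST described later.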

40 By selecting the branches at Or-nodes, a parse tree pt is derived, e. [sent-129, score-0.388]

41 Fig. 2 represents two parse trees as instances of the HST. [sent-132, score-0.218]

42 When parse trees collapse to the image lattice, they produce configurations. [sent-133, score-0.218]

43 In the learning process, we maximize the likelihood subject to a model complexity penalty and prune out the branches with zero or low probability to obtain a compact HST and the scene part dictionary. [sent-135, score-0.211]

44 VT is a set of terminal nodes forming the scene part dictionary Δ = VT. [sent-141, score-0.501]

45 The learning requires us to estimate C = {Θ, Δ}, i.e., the branching probabilities Θ and the scene part dictionary Δ, by maximizing a log-likelihood. [sent-225, score-0.282]

46 (summed over v ∈ V^T_pt), where V^OR_pt and V^T_pt denote the Or-nodes and terminal nodes in pt, and λ is the parameter to balance the two terms (λ = 0.

47 C^k_v denotes the segmented patch covered by the terminal node v. [sent-243, score-0.282]

48 The energy for selecting a branch at an Or-node is E_OR(v_i | v) = −ln θ(v → v_i); the energy for a terminal node is defined as E_T(C^k_v | v) = −ln (1/|C^k_v|) Σ_{i ∈ …} …

49 In the k-th layer, l^k_v is the dominant label of the terminal node v. [sent-252, score-0.282]

50 The first term measures the homogeneity of the terminal nodes in terms of segmentation labels and the second term penalizes large k. [sent-253, score-0.349]
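As a hedged illustration of such a terminal-node energy (the exact formula is only partially recoverable from the extraction above), the sketch below scores a patch by the fraction of its pixels carrying the dominant segmentation label and adds a penalty that grows with the layer index k; the function name and the beta weight are assumptions.

```python
import numpy as np

def terminal_energy(seg_labels: np.ndarray, k: int, beta: float = 0.1) -> float:
    """Energy of a terminal node covering a patch of segmentation labels.

    seg_labels -- integer label map of the pixels covered by the node
    k          -- index of the segmentation layer used (larger k is penalized)
    beta       -- assumed weight of the layer penalty
    """
    _, counts = np.unique(seg_labels, return_counts=True)
    homogeneity = counts.max() / seg_labels.size  # share of the dominant label
    return -np.log(homogeneity + 1e-12) + beta * k
```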

51 2, we adopt an iterative learning-by-parsing strategy including: (i) inferring the optimal parse tree pt by dynamic programming (optimizing Eq. [sent-255, score-0.354]

52 Then we collect the terminal nodes from all the parse trees to form a scene part dictionary Δ. [sent-259, score-0.719]

53 Hence, the terminal nodes are allowed to be locally adjustable to fit the scene region boundaries. [sent-262, score-0.452]

54 Learning for the HST-att: The text descriptions usually contain noun+adjective phrases: the nouns indicate objects/regions inside a scene (e. [sent-267, score-0.397]

55 Let A^n denote the noun attribute set and A^adj the adjective attribute set. [sent-275, score-0.703]

56 We explore the relationship between a noun a ∈ A^n and a scene part v ∈ Δ by an association matrix Φ : A^n × Δ. [sent-276, score-0.538]

57 where the rows of Φ correspond to the noun attributes and the columns to the scene parts, and we normalize each column to sum to one. [sent-281, score-0.792]

58 1, each training image has an optimal parse tree pt. [sent-284, score-0.226]

59 Because the attributes are annotated at the image level rather than at precise image regions, we initialize Φ by counting all the combinations of the nouns and the terminal nodes in pt:

60 Φ(a, v) is obtained by summing, over the training images m, the indicator terms 1 · [v ∈ pt_m] · φ_m(a, v)   (7), where A^n_m ⊆ A^n is the noun attribute set for an image, and φ_m(a, v) denotes its association probability, initialized by φ_m(a, v) = 1. [sent-289, score-0.946]

61 where s = 0.3 is the suppression parameter; (ii) suppress the association between the selected node and the other noun attributes: φ_m(a, v*) = s · φ_m(a, v*), for a ∈ A^n_m \ a*, I ∈ Ĩ, …
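The sketch below shows one way the association matrix Φ could be initialized by co-occurrence counting (in the spirit of Eq. 7) and then sharpened by the suppression step above. The exact update order in the paper is not fully recoverable here, so the loop structure, the n_rounds parameter, and all identifiers are illustrative assumptions.

```python
import numpy as np

def build_association(images, nouns, parts, s=0.3, n_rounds=2):
    """images: list of (noun_set, part_set) pairs, one per training image.
    Returns a column-normalized matrix Phi of shape (len(nouns), len(parts))."""
    n_idx = {a: i for i, a in enumerate(nouns)}
    p_idx = {v: j for j, v in enumerate(parts)}
    # per-image soft association weights, initialized to 1 (cf. Eq. 7)
    phi_m = [np.ones((len(nouns), len(parts))) for _ in images]

    def count_phi():
        Phi = np.zeros((len(nouns), len(parts)))
        for m, (noun_set, part_set) in enumerate(images):
            for a in noun_set:
                for v in part_set:
                    Phi[n_idx[a], p_idx[v]] += phi_m[m][n_idx[a], p_idx[v]]
        return Phi / (Phi.sum(axis=0, keepdims=True) + 1e-12)  # per-column norm

    Phi = count_phi()
    for _ in range(n_rounds):
        # non-maximum suppression: in each image, keep the strongest noun for
        # each part and down-weight the competing nouns by the factor s
        for m, (noun_set, part_set) in enumerate(images):
            for v in part_set:
                j = p_idx[v]
                best = max(noun_set, key=lambda a: Phi[n_idx[a], j])
                for a in noun_set:
                    if a != best:
                        phi_m[m][n_idx[a], j] *= s
        Phi = count_phi()
    return Phi
```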

62 Fig. 4 (left) shows the association of noun attributes and scene parts, where the horizontal axis denotes the nodes in HST-geo and the vertical axis denotes the normalized association probability. [sent-365, score-1.162]

63 For example, “sky” has high probability with the nodes covering the top area of an image and “horse” has high probability with the nodes covering the middle area of an image. [sent-366, score-0.226]

64 To qualitatively evaluate the association, for each noun attribute, we average the image patches assigned to it. [sent-367, score-0.408]

65 As shown in Fig. 4 (right), although learned in a weakly supervised way, our association shows spatial priors of the object categories similar to [4] (see Fig. [sent-369, score-0.232]

66 Fig. 5 shows that the image patches assigned to each noun are then split into multiple clusters according to the given adjectives. [sent-372, score-0.408]

67 We then train a binary SVM classifier for each noun+adjective attribute based on those image patches, using a color histogram feature and a SIFT bag-of-words feature. [sent-373, score-0.375]
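A minimal sketch of this per-attribute training step, assuming scikit-learn and pre-computed patch descriptors (a color histogram concatenated with a SIFT bag-of-words histogram); the LinearSVC choice and the C value are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_attribute_classifiers(patch_features, patch_labels, attribute_pairs):
    """patch_features: (N, D) array of per-patch descriptors.
    patch_labels:    list of N sets of 'noun+adjective' strings per patch.
    attribute_pairs: 'noun+adjective' strings to train classifiers for.
    Returns a dict mapping each attribute pair to a binary classifier."""
    classifiers = {}
    for pair in attribute_pairs:
        y = np.array([1 if pair in labels else 0 for labels in patch_labels])
        if y.sum() == 0 or y.sum() == len(y):
            continue  # need both positive and negative patches
        clf = LinearSVC(C=1.0)  # C is an assumed hyper-parameter
        clf.fit(patch_features, y)
        classifiers[pair] = clf
    return classifiers
```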

68 Joint inference and learning: Taking the learned HST-geo and association matrix Φ as an initialization, we infer pt+ = {pt, A} to simultaneously achieve the optimal scene configuration pt and attribute assignment A = {A^n, A^adj}, then re-estimate HST-geo and Φ. [sent-375, score-0.479]

69 …(Eq. 8); (3) jointly infer pt+ with attribute localization (optimize Eq. [sent-383, score-0.5]

70 The second term measures the noun attribute association: E_n(a^n | v) = −ln Φ(a^n, v)   (10). The third term is designed to model the co-occurrence of a noun and an adjective attribute: E_a(a^adj | a^n) = −ln p(a^adj | a^n), where p(a^adj | a^n) = …

71 …of a noun and an adjective, and can be counted from the given text phrases. [sent-404, score-0.358]
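A small sketch of how these two energy terms could be computed from the association matrix Φ and from noun/adjective co-occurrence counts gathered from the captions; all function names are illustrative.

```python
import numpy as np
from collections import Counter

def cooccurrence_probs(phrases):
    """phrases: list of (noun, adjective) pairs collected from the captions.
    Returns p(adj | noun) as a dict keyed by (noun, adjective)."""
    noun_counts = Counter(n for n, _ in phrases)
    pair_counts = Counter(phrases)
    return {(n, a): c / noun_counts[n] for (n, a), c in pair_counts.items()}

def noun_energy(Phi, noun_index, part_index):
    """E_n(a^n | v) = -ln Phi(a^n, v)."""
    return -np.log(Phi[noun_index, part_index] + 1e-12)

def adjective_energy(p_adj_given_noun, noun, adj):
    """E_a(a^adj | a^n) = -ln p(a^adj | a^n)."""
    return -np.log(p_adj_given_noun.get((noun, adj), 0.0) + 1e-12)
```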

72 Based on Eq. 9, the dynamic programming algorithm can be employed to infer the optimal parse tree with the attributes, (pt+)* = arg min_{pt+} E(pt+, I; Θ, Δ, Φ). [sent-411, score-0.536]
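A hedged sketch of such a bottom-up dynamic program over an And-Or tree: terminal nodes pay a data (appearance + attribute) energy, And-nodes sum their children, and Or-nodes keep only the cheapest branch plus the −ln θ branching cost. It assumes the node layout from the earlier sketch (objects with either a branches/branch_probs pair or a children list); in a full implementation the energies of shared sub-nodes would be memoized.

```python
import math

def infer_parse_tree(node, data_energy, branch_prob):
    """Return (min_energy, chosen_subtree) for the subtree rooted at node.

    data_energy(terminal) -> float    appearance + attribute energy
    branch_prob(or_node, i) -> float  theta(v -> v_i)
    """
    if hasattr(node, "branches"):                  # Or-node: pick best branch
        best = (math.inf, None)
        for i, b in enumerate(node.branches):
            e, t = infer_parse_tree(b, data_energy, branch_prob)
            e += -math.log(branch_prob(node, i) + 1e-12)
            if e < best[0]:
                best = (e, ("Or", i, t))
        return best
    if hasattr(node, "children"):                  # And-node: expand all children
        total, parts = 0.0, []
        for c in node.children:
            e, t = infer_parse_tree(c, data_energy, branch_prob)
            total += e
            parts.append(t)
        return total, ("And", parts)
    return data_energy(node), node                 # terminal node
```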

73 Moreover, some attribute types in [7], such as functions and affordances (e. [sent-561, score-0.375]

74 [6] proposed the CORE dataset including 2,800 images with segmentations and attribute annotations for vehicles and animals. [sent-570, score-0.403]

75 Finally, we obtained the attribute set A^n = {sky, flower, mountain, ibis, horse, …

76 …, cloudy, rocky, …}; in total, 17 noun attributes and 30 noun+adjective attribute pairs. [sent-590, score-0.408]

77 For the testing set, we also ask people to localize the attributes through bounding boxes Bgdth as ground truth for evaluating the part localization accuracy, as shown in the bottom right panel of Fig. [sent-594, score-0.408]

78 Attribute Recognition Baselines: We first compare our method on attribute recognition, which evaluates the accuracy of predicting an attribute's presence in images. [sent-598, score-0.75]

79 (iii) HST-geo: To evaluate the contribution of attribute association, we also compare our method with HST-geo [17]. [sent-604, score-0.375]

80 Specifically, for a given image, we first parse it from its multi-layer segmentation and classify each terminal node in the parse tree by the classifiers trained in (i). [sent-605, score-0.725]

81 Fig. 7 shows the average precision (AP) for classifying each attribute, and the mean average precision (mAP) for the entire attribute set is reported in Table. [sent-607, score-0.812]

82 BoW+SPM shows lower performance because it lacks the color feature, which is a strong cue in scene attribute recognition. [sent-609, score-0.505]

83 Attribute Localization Baselines: For attribute localization, we benchmark our method against a fully supervised sliding window method (SW-FS) [10]. [sent-614, score-0.466]

84 SW-FS trains an attribute classifier using ground truth bounding boxes as positive examples and random rectangles from each negative image for negative data. [sent-615, score-0.403]

85 By treating localization as localized detection, the SW-FS applies attribute classifiers subsequently to sub-images at 1http://www. [sent-616, score-0.475]

86 The detected sub-windows are ordered by classification score and taken as indications of the presence of an attribute in a region after non-maximum suppression with 0. [sent-624, score-0.44]
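For reference, a minimal greedy sketch of this score-ordered non-maximum suppression over detected sub-windows; the overlap threshold is truncated in the extraction above (only "0." survives), so it is left as a parameter, and the (x1, y1, x2, y2) box format is an assumption.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(windows, scores, overlap_thresh):
    """Keep the highest-scoring windows, dropping any window that overlaps an
    already-kept window by more than overlap_thresh."""
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(windows[i], windows[j]) <= overlap_thresh for j in kept):
            kept.append(i)
    return [windows[i] for i in kept]
```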

87 In addition, we also compare with HST-geo for evaluating the attribute association. [sent-626, score-0.375]

88 Without the geometric constraint, (i) Certain attributes will be confused by appearance (e. [sent-629, score-0.285]

89 Fig. 8(a) shows the attributed parse trees and configurations generated from the joint inference, and Fig. [sent-636, score-0.298]

90 We quantitatively evaluate the attribute localization performance by following the procedure adopted in [18]. [sent-638, score-0.475]
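The procedure of [18] is not reproduced here; the sketch below computes a PASCAL-style average precision under an assumed IoU ≥ 0.5 matching criterion, reusing the iou() helper from the earlier sketch, purely to illustrate the kind of evaluation involved.

```python
import numpy as np

def average_precision(detections, ground_truths, iou_thresh=0.5):
    """detections:    list of (image_id, box, score) for one attribute
    ground_truths: dict image_id -> list of ground-truth boxes
    A detection counts as correct if it overlaps an unmatched ground-truth
    box of the same image by at least iou_thresh (uses iou() defined above)."""
    if not detections:
        return 0.0
    detections = sorted(detections, key=lambda d: d[2], reverse=True)
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    tp, fp = [], []
    for img, box, _ in detections:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            o = iou(box, gt)
            if o > best_iou:
                best_iou, best_j = o, j
        hit = best_iou >= iou_thresh and not matched[img][best_j]
        if hit:
            matched[img][best_j] = True
        tp.append(1.0 if hit else 0.0)
        fp.append(0.0 if hit else 1.0)
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(n_gt, 1)
    precision = tp / np.maximum(tp + fp, 1e-12)
    # step-wise area under the precision-recall curve
    return float(np.sum(precision * np.diff(np.concatenate(([0.0], recall)))))
```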

91 The average precisions (AP) for each attribute are shown in Fig. [sent-650, score-0.375]

92 3 shows a surprising improvement in attribute localization by our method. [sent-653, score-0.475]

93 Discussion and future work: This paper presents a weakly supervised method for learning scene configurations with attribute localization. [sent-655, score-0.65]

94 (i) We quantize the space of scene configurations by a Hierarchical Space Tiling (HST) and utilize a learning-by-parsing strategy for parameter estimation; (ii) we discover the relationship between the scene parts and attributes by an association matrix (Table 2 reports the attribute localization performance for nouns and adjectives);

95 (iii) we jointly infer the scene configuration and attribute localization by dynamic programming. [sent-665, score-1.304]

96 The attributes used in this paper are related to local objects and regions, but there are also global attributes (style of the whole parse tree) such as aesthetics, which we are studying in ongoing work by extending our model to an attribute grammar. [sent-667, score-1.086]

97 Automatic attribute discovery and characterization from noisy web images. [sent-685, score-0.375]

98 Nonparametric scene parsing: label transfer via dense scene alignment. [sent-691, score-0.26]

99 Sun attribute database: discovering, annotating, and recognizing scene attributes. [sent-709, score-0.505]

100 (c) More attribute localization results from our method. [sent-827, score-0.475]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('noun', 0.408), ('attribute', 0.375), ('adjective', 0.295), ('hst', 0.295), ('attributes', 0.254), ('terminal', 0.232), ('aadj', 0.19), ('parse', 0.177), ('association', 0.14), ('scene', 0.13), ('pt', 0.128), ('nouns', 0.115), ('sky', 0.114), ('localization', 0.1), ('adjectives', 0.098), ('nodes', 0.09), ('descriptions', 0.089), ('anm', 0.084), ('tiling', 0.078), ('bgdth', 0.063), ('ckernel', 0.063), ('cvk', 0.063), ('vptt', 0.063), ('ck', 0.063), ('text', 0.063), ('ii', 0.058), ('cloudy', 0.056), ('vand', 0.056), ('configurations', 0.053), ('branching', 0.053), ('node', 0.05), ('tree', 0.049), ('dictionary', 0.049), ('eor', 0.047), ('outdoor', 0.044), ('iii', 0.043), ('kulkarni', 0.043), ('axlogp', 0.042), ('canyon', 0.042), ('overcast', 0.042), ('tangram', 0.042), ('vpot', 0.042), ('vpttm', 0.042), ('trees', 0.041), ('weakly', 0.04), ('sliding', 0.04), ('ucla', 0.04), ('mil', 0.038), ('axm', 0.037), ('iv', 0.037), ('bow', 0.036), ('parts', 0.036), ('grammar', 0.035), ('spm', 0.035), ('vor', 0.035), ('suppression', 0.034), ('inferred', 0.034), ('branches', 0.034), ('vi', 0.033), ('ocean', 0.033), ('lnp', 0.033), ('configuration', 0.032), ('ln', 0.032), ('scenes', 0.031), ('lik', 0.031), ('aot', 0.031), ('nonmaximum', 0.031), ('precision', 0.031), ('parsing', 0.031), ('arg', 0.031), ('appearance', 0.031), ('localize', 0.03), ('hierarchical', 0.03), ('patterson', 0.03), ('captioned', 0.03), ('hays', 0.03), ('ea', 0.029), ('ordonez', 0.029), ('acting', 0.029), ('segmentations', 0.028), ('rectangles', 0.028), ('reconfigurable', 0.028), ('supervised', 0.028), ('homogeneity', 0.027), ('joint', 0.027), ('probabilities', 0.026), ('layouts', 0.026), ('excessive', 0.026), ('blue', 0.026), ('studying', 0.026), ('vt', 0.025), ('infer', 0.025), ('got', 0.025), ('compose', 0.025), ('optimize', 0.024), ('learning', 0.024), ('panel', 0.024), ('hierarchy', 0.024), ('probability', 0.023), ('window', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu

Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions, respectively. We assign the nouns to the nodes (parts) in HST using non-maximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. Then we update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.

2 0.33102846 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.

3 0.27170837 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen

Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.

4 0.23938777 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

Author: Jonghyun Choi, Mohammad Rastegari, Ali Farhadi, Larry S. Davis

Abstract: We propose a method to expand the visual coverage of training sets that consist of a small number of labeled examples using learned attributes. Our optimization formulation discovers category specific attributes as well as the images that have high confidence in terms of the attributes. In addition, we propose a method to stably capture example-specific attributes for a small sized training set. Our method adds images to a category from a large unlabeled image pool, and leads to significant improvement in category recognition accuracy evaluated on a large-scale dataset, ImageNet.

5 0.21590045 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy

Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalarvalued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.

6 0.20929083 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

7 0.2042508 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

8 0.20353281 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

9 0.20132656 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

10 0.18562493 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

11 0.17524554 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

12 0.15950923 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

13 0.1548934 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

14 0.15055676 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

15 0.14555568 99 cvpr-2013-Cross-View Image Geolocalization

16 0.13235041 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

17 0.13080314 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

18 0.1272772 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

19 0.11762629 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

20 0.11039451 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.195), (1, -0.137), (2, -0.038), (3, -0.043), (4, 0.136), (5, 0.116), (6, -0.244), (7, 0.143), (8, 0.084), (9, 0.243), (10, -0.046), (11, 0.142), (12, -0.072), (13, 0.014), (14, 0.088), (15, 0.04), (16, -0.027), (17, 0.023), (18, -0.028), (19, 0.06), (20, 0.024), (21, 0.066), (22, 0.041), (23, 0.014), (24, 0.043), (25, -0.002), (26, 0.043), (27, -0.038), (28, -0.061), (29, 0.085), (30, -0.073), (31, 0.042), (32, 0.07), (33, 0.003), (34, -0.035), (35, -0.041), (36, -0.028), (37, -0.003), (38, 0.004), (39, -0.055), (40, 0.024), (41, 0.05), (42, -0.014), (43, -0.091), (44, -0.046), (45, 0.018), (46, -0.046), (47, 0.022), (48, 0.069), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9510631 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu

Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions, respectively. We assign the nouns to the nodes (parts) in HST using non-maximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. Then we update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.

2 0.83904755 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.

3 0.82910162 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

Author: Catherine Wah, Serge Belongie

Abstract: Recent work in computer vision has addressed zero-shot learning or unseen class detection, which involves categorizing objects without observing any training examples. However, these problems assume that attributes or defining characteristics of these unobserved classes are known, leveraging this information at test time to detect an unseen class. We address the more realistic problem of detecting categories that do not appear in the dataset in any form. We denote such a category as an unfamiliar class; it is neither observed at train time, nor do we possess any knowledge regarding its relationships to attributes. This problem is one that has received limited attention within the computer vision community. In this work, we propose a novel approach to the unfamiliar class detection task that builds on attribute-based classification methods, and we empirically demonstrate how classification accuracy is impacted by attribute noise and dataset “difficulty,” as quantified by the separation of classes in the attribute space. We also present a method for incorporating human users to overcome deficiencies in attribute detection. We demonstrate results superior to existing methods on the challenging CUB-200-2011 dataset.

4 0.82262111 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

Author: Babak Saleh, Ali Farhadi, Ahmed Elgammal

Abstract: When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce the abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviations from prototypical properties of categories. Our model can recognize abnormalities and report the main reasons of any recognized abnormality. We also show that abnormality predictions can help image categorization. We introduce the abnormality detection dataset and show interesting results on how to reason about abnormalities.

5 0.81597394 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

Author: Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid

Abstract: Attributes are an intermediate representation, which enables parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function which measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. The label embedding framework offers other advantages such as the ability to leverage alternative sources of information in addition to attributes (e.g. class hierarchies) or to transition smoothly from zero-shot learning to learning with large quantities of data.

6 0.80619293 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

7 0.76651883 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

8 0.73512888 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

9 0.66298193 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

10 0.65976697 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

11 0.6487233 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

12 0.64570111 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

13 0.57384622 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling

14 0.55816239 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

15 0.54226255 99 cvpr-2013-Cross-View Image Geolocalization

16 0.47565293 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

17 0.46272194 463 cvpr-2013-What's in a Name? First Names as Facial Attributes

18 0.45328778 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

19 0.44568667 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

20 0.44546357 80 cvpr-2013-Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.088), (16, 0.021), (21, 0.235), (26, 0.038), (28, 0.01), (33, 0.287), (39, 0.043), (63, 0.011), (67, 0.049), (69, 0.037), (77, 0.011), (80, 0.011), (87, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89048105 454 cvpr-2013-Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction

Author: Mao Ye, Cha Zhang, Ruigang Yang

Abstract: With the wide-spread of consumer 3D-TV technology, stereoscopic videoconferencing systems are emerging. However, the special glasses participants wear to see 3D can create distracting images. This paper presents a computational framework to reduce undesirable artifacts in the eye regions caused by these 3D glasses. More specifically, we add polarized filters to the stereo camera so that partial images of reflection can be captured. A novel Bayesian model is then developed to describe the imaging process of the eye regions including darkening and reflection, and infer the eye regions based on Classification ExpectationMaximization (EM). The recovered eye regions under the glasses are brighter and with little reflections, leading to a more nature videoconferencing experience. Qualitative evaluations and user studies are conducted to demonstrate the substantial improvement our approach can achieve.

2 0.88367009 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes

Author: Rui Li, Pengcheng Shi, Anne R. Haake

Abstract: Eliciting and representing experts' remarkable perceptual capability of locating, identifying and categorizing objects in images specific to their domains of expertise will benefit image understanding in terms of transferring human domain knowledge and perceptual expertise into image-based computational procedures. In this paper, we present a hierarchical probabilistic framework to summarize the stereotypical and idiosyncratic eye movement patterns shared within 11 board-certified dermatologists while they are examining and diagnosing medical images. Each inferred eye movement pattern characterizes the similar temporal and spatial properties of its corresponding segments of the experts' eye movement sequences. We further discover a subset of distinctive eye movement patterns which are commonly exhibited across multiple images. Based on the combinations of the exhibitions of these eye movement patterns, we are able to categorize the images from the perspective of experts' viewing strategies. In each category, images share similar lesion distributions and configurations. The performance of our approach shows that modeling physicians' diagnostic viewing behaviors informs about medical images' understanding to correct diagnosis.

same-paper 3 0.85301304 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu

Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions, respectively. We assign the nouns to the nodes (parts) in HST using non-maximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. Then we update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.

4 0.8440755 96 cvpr-2013-Correlation Filters for Object Alignment

Author: Vishnu Naresh Boddeti, Takeo Kanade, B.V.K. Vijaya Kumar

Abstract: Alignment of 3D objects from 2D images is one of the most important and well studied problems in computer vision. A typical object alignment system consists of a landmark appearance model which is used to obtain an initial shape and a shape model which refines this initial shape by correcting the initialization errors. Since errors in landmark initialization from the appearance model propagate through the shape model, it is critical to have a robust landmark appearance model. While there has been much progress in designing sophisticated and robust shape models, there has been relatively less progress in designing robust landmark detection models. In this paper we present an efficient and robust landmark detection model which is designed specifically to minimize localization errors, thereby leading to state-of-the-art object alignment performance. We demonstrate the efficacy and speed of the proposed approach on the challenging task of multi-view car alignment.

5 0.8382405 195 cvpr-2013-HDR Deghosting: How to Deal with Saturation?

Author: Jun Hu, Orazio Gallo, Kari Pulli, Xiaobai Sun

Abstract: We present a novel method for aligning images in an HDR (high-dynamic-range) image stack to produce a new exposure stack where all the images are aligned and appear as if they were taken simultaneously, even in the case of highly dynamic scenes. Our method produces plausible results even where the image used as a reference is either too dark or bright to allow for an accurate registration.

6 0.81684691 302 cvpr-2013-Multi-task Sparse Learning with Beta Process Prior for Action Recognition

7 0.80339533 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

8 0.8016575 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation

9 0.79705107 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

10 0.79648209 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera

11 0.79621285 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

12 0.79602462 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models

13 0.79456961 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

14 0.79440176 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

15 0.79405564 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

16 0.793917 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

17 0.79372644 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

18 0.79368579 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

19 0.79364842 187 cvpr-2013-Geometric Context from Videos

20 0.79334259 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation