cvpr cvpr2013 cvpr2013-85 knowledge-graph by maker-knowledge-mining

85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes


Source: pdf

Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann

Abstract: Complex events essentially include humans, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. Many works have exploited attributes at image level for various applications. However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. Hence, we propose to leverage attributes at video level (named as video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Complex events essentially include humans, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. [sent-11, score-1.238]

2 Many works have exploited attributes at image level for various applications. [sent-12, score-0.498]

3 However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. [sent-13, score-1.405]

4 Hence, we propose to leverage attributes at video level (named as video attributes in this work), i. [sent-14, score-1.234]

5 , the semantic labels of external videos are used as attributes. [sent-16, score-0.301]

6 Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. [sent-17, score-1.045]

7 Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. [sent-18, score-2.421]

8 Introduction In this paper, we focus on the event detection of large-scale real-world videos [2, 3]. [sent-21, score-0.68]

9 In the past, detection of events that are simple, well-defined and describable by a short video sequence, e. [sent-23, score-0.385]

10 In the real world, however, users are more interested in videos depicting complex events such as celebrating the New Year. [sent-26, score-0.449]

11 Complex event detection is very challenging as these events usually contain many people and/or objects, various human actions, multiple scenes; have significant (Figure 1: complex event parkour detection with video attributes) [sent-27, score-0.895]

12 intra-class variations; and take place in much longer video clips [2, 3, 15, 16]. [sent-28, score-0.139]

13 Despite this difficulty, the practical significance of complex event detection has drawn increasing interest from researchers [23, 10, 15, 16]. [sent-29, score-0.641]

14 The first exploration of ad hoc multimedia event detection with only 10 positive examples for training has been introduced in [15]. [sent-31, score-0.602]

15 As complex events usually contain visual attributes related to people, scenes, objects and human actions (e. [sent-33, score-0.799]

16 , Figure 1 shows that the complex event parkour is related to push-ups, building, etc. [sent-35, score-0.668]

17 ), leveraging these attributes properly could be helpful for the detection. [sent-36, score-0.532]

18 Visual attributes were introduced as describable properties of an object and have been applied to many applications [5, 7, 9]. [sent-37, score-0.55]

19 Visual attributes can be either at local level or global level. [sent-38, score-0.498]

20 For example, “many people” is a local-level attribute for the event flash mob gathering. [sent-39, score-0.786]

21 Yet this kind of attribute is mostly defined manually, which is time-consuming and requires expertise. [sent-40, score-0.475]

22 Instead, we can use attributes at global level which are the semantic labels of images [19]. [sent-41, score-0.582]

23 For instance, an image with its semantic label tennis can be leveraged for understanding the event playing tennis. [sent-42, score-0.572]

24 Given that this type of attributes is associated with images, we regard it as image attributes. [sent-43, score-0.475]

25 Using image attributes for complex event detection is intuitively limited as image attributes usually cannot characterize the dynamic properties of complex event videos (complex event videos refer to the videos depicting complex events). [sent-44, score-3.322]

26 In this paper, we therefore propose an idea of video attributes and particularly apply it for complex event detection. [sent-45, score-1.196]

27 Video attributes, in our work, indicate the semantic labels of other external videos collected by researchers. [sent-46, score-0.301]

28 Note that these external videos are different from complex event videos. [sent-47, score-0.823]

29 Compared to complex event videos, the external videos contain simple contents of people, objects, scenes and actions which are basic elements of complex events. [sent-48, score-1.045]

30 For example, a video with its semantic label mixing batter is useful for understanding the complex event making a cake. [sent-49, score-0.773]

31 As the external videos are used by treating their semantic labels as video attributes, we call these videos attribute videos. [sent-50, score-0.748]

32 To use video attributes, we may refer to a typical approach that involves training attribute classifiers and then using their outputs as intermediate representations for the complex event videos [8, 11]. [sent-51, score-1.107]
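A minimal sketch may make this typical pipeline concrete (this is the baseline style of approach, not the method proposed in this paper): attribute classifiers are trained on the external videos, their scores become the intermediate representation, and the event detector sees only those scores. All data below are random placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Random placeholders: rows are videos, columns are low-level features.
rng = np.random.default_rng(0)
X_attr = rng.normal(size=(200, 64))      # external (attribute) videos
A = rng.integers(0, 2, size=(200, 10))   # 10 binary attribute labels
X_event = rng.normal(size=(100, 64))     # complex event videos
y_event = rng.integers(0, 2, size=100)   # event present / absent

# Step 1: one classifier per attribute, trained on the external videos.
attr_clfs = [LinearSVC().fit(X_attr, A[:, k]) for k in range(A.shape[1])]

# Step 2: re-represent each event video by its attribute scores --
# the "intermediate representation".
Z = np.column_stack([clf.decision_function(X_event) for clf in attr_clfs])

# Step 3: the event detector is trained on the attribute scores alone.
detector = LinearSVC().fit(Z, y_event)
```

The two drawbacks discussed next follow directly from this design: with few attributes, Z has too few columns to be discriminative, and every attribute enters Z whether or not it helps the event.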

33 First, when the number of attributes used is limited, the learned intermediate representation is insufficiently discriminative. [sent-53, score-0.529]

34 Second, given a particular event to detect, only some attributes are discriminative while others are comparatively useless or even noisy [16]. [sent-54, score-0.969]

35 It is difficult to decide what attributes to use for different events. [sent-55, score-0.475]

36 In contrast, we propose to use video attributes as additional information to assist complex event detection. [sent-56, score-1.196]

37 Specifically, our framework learns the attribute classifier and event detector simultaneously. [sent-57, score-0.724]

38 The observation of a particular event affects the attribute classifier, and in return, attributes characterize the event. [sent-58, score-1.19]

39 This kind of mutual influence is explored by a correlation vector, which helps incorporate extra informative cues into the event detector. [sent-59, score-0.54]

40 Our approach has two merits: the learning process of the event detector is not solely dependent on the video attributes; and the joint framework adapts the knowledge from attributes for different events, i. [sent-61, score-1.109]

41 , a particular event obtains dedicated benefits via the joint learning of the attribute classifier and the event detector. [sent-63, score-1.193]
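As a hedged reading of this joint design (a sketch, not the paper's exact objective), the detector score can be viewed as a direct term plus an attribute term routed through the correlation vector; the names w, Q and p mirror the variables that appear in the optimization later (sentences 74-75).

```python
import numpy as np

def event_score(x, w, Q, p):
    """Score one video x for one event.

    w : direct detector weights, shape (d,)
    Q : feature-to-attribute mapping, shape (d, c)
    p : correlation vector tying the c attributes to this event, (c,)
    """
    # Attribute responses Q.T @ x are weighted by p, so attributes that
    # correlate with this event add evidence on top of the direct term.
    return w @ x + p @ (Q.T @ x)

# Toy usage with random weights (illustration only).
d, c = 64, 10
rng = np.random.default_rng(0)
x = rng.normal(size=d)
print(event_score(x, rng.normal(size=d), rng.normal(size=(d, c)),
                  rng.normal(size=c)))
```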

42 Moreover, we propose to integrate multiple features from both complex event videos and attribute videos for learning the detector as combining multiple features has proved to be beneficial for visual analysis [22]. [sent-64, score-1.138]

43 On the other hand, existing video collections have different themes. [sent-65, score-0.152]

44 As we expect the video attributes to be diverse, in our framework video attributes from different collections are utilized. [sent-66, score-1.217]

45 The main contributions ofthis paper are as follows: First, we propose using video attributes for complex event detection. [sent-68, score-1.196]

46 Second, video attributes are used latently as additional information for learning the event detector. [sent-69, score-1.125]

47 Third, multiple attribute video sets with different features are sewn together seamlessly with multiple features from the complex event videos. [sent-70, score-0.975]

48 Related Work Visual attributes were advocated as the describable properties of objects [8]. [sent-72, score-0.55]

49 For example, an object bear can be described by attributes such as furry and four legs. [sent-73, score-0.475]

50 The attributes of an object are treated as latent variables and the correlations among attributes are used to classify object classes. [sent-77, score-0.95]

51 A method to learn visual attributes and object classes together has been presented in [20]. [sent-78, score-0.475]

52 An interactive approach has been presented that discovers local attributes which are both discriminative and semantically meaningful for fine-grained category classification [7]. [sent-80, score-0.475]

53 Exploring the shared features between objects and their attributes has been proposed for animal and scene classification [9]. [sent-82, score-0.475]

54 High-level describable attributes have been leveraged for selecting high aesthetic quality images and interesting ones from large image collections [5]. [sent-84, score-0.636]

55 However, generating local attributes usually requires a manual defining process, which is burdensome. [sent-85, score-0.475]

56 An alternative way is to leverage attributes at a global level, i. [sent-86, score-0.506]

57 In the past, global-level image attributes have been widely used [19, 14]. [sent-91, score-0.475]

58 An object classification method has been presented that casts prior features, obtained from global image attributes of auxiliary images, into a multiple kernel learning framework [14]. [sent-93, score-0.475]

59 For recognition or detection tasks in videos, image attributes probably cannot characterize the dynamic properties well, which could hamper their contributions. [sent-94, score-0.55]

60 Learning an intermediate representation for event detection has been proposed. [sent-96, score-0.548]

61 In that approach, the intermediate representation is the same for the event videos and the attribute videos. [sent-97, score-0.88]

62 However, it could be natural to assume that the events and attributes are different depictions of videos at different levels. [sent-98, score-0.786]

63 In contrast, we leverage video attributes to characterize complex events, which is interpretable. [sent-100, score-0.773]

64 In our framework, we also learn an attribute classifier which can be used to predict the attributes of a given video. [sent-101, score-0.68]

65 Since the attribute classifier is jointly optimized with the event detector, it is more accurate in uncovering the attributes of an event video. [sent-102, score-1.873]

66 For example, by exploiting the videos of “landing a fish”, the concept classifier “fish” can be more accurately trained and vice versa. [sent-103, score-0.175]

67 In addition, as a byproduct of our method, the attribute representation can be further used for other applications such as multimedia event recounting [6]. [sent-104, score-0.789]

68 Video Attributes Assisted Event Detection We first correlate the features of attribute videos from m multiple sources with their semantic labels respectively. [sent-106, score-0.444]

69 Next we illustrate how to learn a detector for the complex event by incorporating the attribute videos. [sent-128, score-0.812]

70 Similarly, we first map the multiple features of the complex event videos {Xi}, i = 1, . . . , m, [sent-129, score-0.606]

71 with Xi ∈ R^{di×n}, where n is the number of complex event videos. [sent-133, score-0.606]

72 Since the attribute videos and the complex event videos are relevant, i. [sent-164, score-1.089]

73 , complex events are usually related to people, scenes, objects and human actions, the two domains would have some shared knowledge. [sent-166, score-0.272]

74 After Qi, wi and fi are obtained, we fix them and optimize pi. [sent-258, score-0.163]

75 Output: Optimized wi ∈ R^{di×1}, Qi ∈ R^{di×ci}, pi ∈ R^{ci×1} and fi ∈ R^{n×1}. [sent-266, score-0.207]
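Sentences 74-75 describe an alternating (block-coordinate) scheme over wi, Qi, pi and fi. The sketch below is a deliberately simplified least-squares surrogate of that loop, not the paper's Eq. (5): here Q is fit once from the attribute videos while w and p alternate, and all update rules are assumptions for illustration.

```python
import numpy as np

def ridge(X, Y, lam=1.0):
    """Closed-form ridge solve: argmin_W ||X @ W - Y||^2 + lam * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def alternating_fit(X_attr, A, X_event, y, n_iters=20):
    """Simplified alternating scheme with the same variable roles:
    Q (d x c) maps features to attributes, w (d,) is the direct event
    detector, p (c,) is the attribute-event correlation vector.
    A video x is scored as x @ (w + Q @ p)."""
    # Simplification: Q is fit once from the attribute videos; the paper
    # instead updates Qi jointly with the event objective.
    Q = ridge(X_attr, A)
    d, c = Q.shape
    w, p = np.zeros(d), np.zeros(c)
    for _ in range(n_iters):
        # Fix Q, p; fit w on what the attribute cue leaves unexplained.
        w = ridge(X_event, y - X_event @ (Q @ p))
        # Fix Q, w; refit the correlation vector p.
        p = ridge(X_event @ Q, y - X_event @ w)
    return w, Q, p

# Toy usage with random placeholder data.
rng = np.random.default_rng(0)
X_attr = rng.normal(size=(200, 32))
A = rng.integers(0, 2, size=(200, 6)).astype(float)
X_event = rng.normal(size=(80, 32))
y = rng.integers(0, 2, size=80).astype(float)
w, Q, p = alternating_fit(X_attr, A, X_event, y)
scores = X_event @ (w + Q @ p)   # event detector outputs
```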

76 Experiments In this section we present the experiments that evaluate the proposed method for complex event detection. [sent-270, score-0.606]

77 Datasets The TRECVID MED 2012 development set (MED12) is used for complex event detection. [sent-273, score-0.606]

78 , the UCF50 dataset [17] and the development set from TRECVID 2012 semantic indexing task are used as attribute videos. [sent-277, score-0.233]

79 The video set for TRECVID 2012 semantic indexing (SIN) task covers 346 concepts. [sent-279, score-0.167]

80 We extract STIP [12] and SIFT [13] descriptors for the videos of MED12, STIP for UCF50 and SIFT for SIN12. [sent-283, score-0.151]
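The paper does not spell out its feature encoding at this point; a common way to turn such local descriptors into fixed-length video features is a bag-of-words histogram over a k-means vocabulary, sketched below (the vocabulary size and normalization are assumptions, not reported choices).

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_histograms(descriptor_sets, vocab_size=1000, seed=0):
    """Quantize local descriptors (e.g., STIP or SIFT) into bag-of-words
    histograms; `descriptor_sets` holds one (n_i, dim) array per video."""
    all_desc = np.vstack(descriptor_sets)
    km = KMeans(n_clusters=vocab_size, n_init=1, random_state=seed)
    km.fit(all_desc)
    hists = []
    for desc in descriptor_sets:
        words = km.predict(desc)   # nearest visual word per descriptor
        h = np.bincount(words, minlength=vocab_size).astype(float)
        hists.append(h / max(h.sum(), 1.0))   # L1-normalize per video
    return np.array(hists)

# Toy usage: three "videos" with random 128-d descriptors.
videos = [np.random.default_rng(i).normal(size=(50, 128)) for i in range(3)]
X = bow_histograms(videos, vocab_size=16)
```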

81 (2) Baseline: We set β in Eq (5) to 0 so that no video attributes are exploited in our approach. [sent-289, score-0.59]

82 (3) SVM: SVM is an effective tool for complex event detection and has been widely used by several research groups for TRECVID MED, e. [sent-291, score-0.641]

83 (4) Attributes Intermediate Representation (AIR): We train attribute classifiers using UCF50 and SIN12. [sent-295, score-0.181]

84 SVM is applied on the new representations afterwards for event detection. [sent-297, score-0.494]

85 For instance, MCR is 10%-75% better than SVM for 16 events in terms of AP. [sent-328, score-0.16]
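AP here is the standard average precision used to score a ranked detection list per event; below is a compact reference implementation of that standard metric, not code from the paper.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one event: rank videos by detector score, then average the
    precision measured at each true positive (TRECVID-style AP)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)                 # true positives so far
    ranks = np.arange(1, len(labels) + 1)
    prec_at_hits = hits[labels == 1] / ranks[labels == 1]
    return prec_at_hits.mean() if labels.any() else 0.0

print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # ~0.8333
```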

86 The promising performance of MCR verifies that leveraging video attributes properly is beneficial for complex event detection. [sent-329, score-1.277]

87 Results using Single Feature and Single Source In this part, we only use UCF50+MED12 with the STIP feature and SIN12+MED12 with the SIFT feature for complex event detection to show the performance change. [sent-332, score-0.641]

88 This experiment validates that exploiting multiple attribute video sets together with different features is beneficial for most cases. [sent-340, score-0.32]

89 Conclusions We have proposed a method for utilizing attributes at the video level for complex event detection. [sent-342, score-1.219]

90 Video attributes are convenient to use for complex event detection as many video collections relevant to people, scenes, objects and actions are available. [sent-343, score-1.32]

91 Meanwhile, video attributes have more potential than image attributes to characterize the dynamic properties of video data. [sent-344, score-1.22]

92 Unlike the traditional approach which maps the video data into attribute space, our method learns a correlation vector which correlates video attributes and a complex event. [sent-345, score-1.052]

93 Built upon this, the extra informative cues learnt from attribute videos are further incorporated into the event detector. [sent-346, score-0.85]

94 We have performed extensive experiments using a real-world large-scale video dataset to evaluate the efficacy of our method on complex event detection. [sent-347, score-0.721]

95 The results are encouraging and have verified the advantage of leveraging video attributes properly. [sent-348, score-0.622]

96 High level describable attributes for predicting aesthetics and interestingness. [sent-381, score-0.573]

97 Recognizing complex events using large margin joint low-level event model. [sent-424, score-0.766]

98 Learning to detect unseen object classes by between-class attribute transfer. [sent-431, score-0.181]

99 Knowledge adaptation for ad hoc multimedia event detection with few exemplars. [sent-456, score-0.634]

100 Informedia E-Lamp @ TRECVID 2012: Multimedia event detection and recounting (MED and MER). [sent-533, score-0.641]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('event', 0.494), ('attributes', 0.475), ('mcr', 0.23), ('attribute', 0.181), ('events', 0.16), ('videos', 0.151), ('rdi', 0.145), ('ix', 0.137), ('itqi', 0.124), ('qipi', 0.117), ('video', 0.115), ('eq', 0.115), ('complex', 0.112), ('itbi', 0.103), ('trecvid', 0.102), ('stip', 0.097), ('wi', 0.096), ('itqipi', 0.093), ('iypit', 0.093), ('minndc', 0.093), ('qitx', 0.093), ('qi', 0.084), ('pipit', 0.083), ('pmd', 0.083), ('describable', 0.075), ('multimedia', 0.073), ('med', 0.071), ('itwi', 0.07), ('fi', 0.067), ('sebe', 0.066), ('external', 0.066), ('parkour', 0.062), ('rci', 0.062), ('intermediate', 0.054), ('semantic', 0.052), ('actions', 0.052), ('iv', 0.05), ('singapore', 0.049), ('itqipipit', 0.047), ('metze', 0.047), ('mqiin', 0.047), ('rawat', 0.047), ('schulam', 0.047), ('sewed', 0.047), ('zhongwen', 0.047), ('pi', 0.044), ('ma', 0.043), ('mob', 0.041), ('recounting', 0.041), ('flash', 0.041), ('latently', 0.041), ('characterize', 0.04), ('appliance', 0.038), ('iai', 0.038), ('psc', 0.038), ('im', 0.038), ('ai', 0.037), ('collections', 0.037), ('svm', 0.036), ('detection', 0.035), ('scenes', 0.035), ('ap', 0.034), ('derivative', 0.034), ('burger', 0.033), ('sift', 0.033), ('leveraging', 0.032), ('labels', 0.032), ('vehicle', 0.032), ('hwang', 0.032), ('dhar', 0.032), ('correlates', 0.032), ('hoc', 0.032), ('leverage', 0.031), ('ci', 0.03), ('pages', 0.03), ('fish', 0.03), ('rni', 0.029), ('iy', 0.029), ('people', 0.029), ('iarpa', 0.028), ('correlate', 0.028), ('nist', 0.027), ('hilbert', 0.027), ('rn', 0.026), ('seamlessly', 0.026), ('depicting', 0.026), ('leveraged', 0.026), ('detector', 0.025), ('properly', 0.025), ('classifier', 0.024), ('official', 0.024), ('clips', 0.024), ('informative', 0.024), ('beneficial', 0.024), ('cai', 0.024), ('duan', 0.024), ('level', 0.023), ('contents', 0.023), ('ding', 0.023), ('correlation', 0.022)]
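The similarity scores in the lists below are typically cosine similarities between sparse tfidf vectors like the (word, weight) pairs above. A small sketch of that computation follows; the weights for this paper are taken from the list above, while the second paper's vector is invented for illustration.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two {word: tfidf weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

this_paper = {"event": 0.494, "attributes": 0.475, "video": 0.115}
other_paper = {"attributes": 0.51, "attribute": 0.33, "event": 0.12}
print(round(cosine_sim(this_paper, other_paper), 4))
```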

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999869 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann

Abstract: Complex events essentially include humans, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. Many works have exploited attributes at image level for various applications. However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. Hence, we propose to leverage attributes at video level (named as video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.

2 0.45530802 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.

3 0.29548645 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

Author: Suha Kwak, Bohyung Han, Joon Hee Han

Abstract: We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario. Specifically, to detect multi-agent events from video, our algorithm identifies agents involved in an event and assigns roles to the participating agents. Instead of iterating through all possible agent-role combinations, we formulate the joint optimization problem as two efficient subproblems—quadratic programming for role assignment followed by linear programming for event localization. Additionally, we reduce the computational complexity significantly by applying role-specific event detectors to each agent independently. We test the performance of our algorithm in natural videos, which contain multiple target events and nonparticipating agents.

4 0.28291011 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding

Author: Jérôme Revaud, Matthijs Douze, Cordelia Schmid, Hervé Jégou

Abstract: This paper presents an approach for large-scale event retrieval. Given a video clip of a specific event, e.g., the wedding of Prince William and Kate Middleton, the goal is to retrieve other videos representing the same event from a dataset of over 100k videos. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to compare the videos in the frequency domain. This offers a significant gain in complexity and accurately localizes the matching parts of videos. Furthermore, we extend product quantization to complex vectors in order to compress our descriptors, and to compare them in the compressed domain. Our method outperforms the state of the art both in search quality and query time on two large-scale video benchmarks for copy detection, TRECVID and CCWEB. Finally, we introduce a challenging dataset for event retrieval, EVVE, and report the performance on this dataset.

5 0.2725834 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen

Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled methodfor choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.

6 0.26881558 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

7 0.24130121 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

8 0.20929083 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

9 0.20315754 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

10 0.20250143 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

11 0.19814441 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

12 0.18568018 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

13 0.16099717 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

14 0.16086416 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities

15 0.1594779 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

16 0.15826614 402 cvpr-2013-Social Role Discovery in Human Events

17 0.15285683 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

18 0.152155 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

19 0.15029818 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources

20 0.14700457 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.186), (1, -0.173), (2, -0.058), (3, -0.114), (4, 0.02), (5, 0.129), (6, -0.381), (7, 0.035), (8, 0.072), (9, 0.279), (10, 0.008), (11, 0.03), (12, -0.016), (13, -0.029), (14, 0.046), (15, 0.016), (16, -0.013), (17, 0.021), (18, -0.023), (19, -0.046), (20, -0.111), (21, 0.005), (22, -0.025), (23, -0.077), (24, -0.034), (25, -0.022), (26, -0.038), (27, -0.054), (28, 0.012), (29, 0.012), (30, 0.118), (31, -0.028), (32, 0.032), (33, -0.129), (34, -0.071), (35, 0.095), (36, 0.048), (37, 0.073), (38, 0.067), (39, 0.082), (40, 0.122), (41, -0.038), (42, -0.059), (43, -0.08), (44, 0.07), (45, 0.05), (46, -0.119), (47, 0.041), (48, 0.028), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97203457 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann

Abstract: Complex events essentially include humans, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. Many works have exploited attributes at image level for various applications. However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. Hence, we propose to leverage attributes at video level (named as video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.

2 0.76323205 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.

3 0.73020405 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen

Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled methodfor choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.

4 0.70950824 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning

Author: Babak Saleh, Ali Farhadi, Ahmed Elgammal

Abstract: When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce the abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviations from prototypical properties of categories. Our model can recognize abnormalities and report the main reasons of any recognized abnormality. We also show that abnormality predictions can help image categorization. We introduce the abnormality detection dataset and show interesting results on how to reason about abnormalities.

5 0.70412135 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

Author: Weixin Li, Qian Yu, Harpreet Sawhney, Nuno Vasconcelos

Abstract: In this work, we propose a novel video representation for activity recognition that models video dynamics with attributes of activities. A video sequence is decomposed into short-term segments, which are characterized by the dynamics of their attributes. These segments are modeled by a dictionary of attribute dynamics templates, which are implemented by a recently introduced generative model, the binary dynamic system (BDS). We propose methods for learning a dictionary of BDSs from a training corpus, and for quantizing attribute sequences extracted from videos into these BDS codewords. This procedure produces a representation of the video as a histogram of BDS codewords, which is denoted the bag-of-words for attribute dynamics (BoWAD). An extensive experimental evaluation reveals that this representation outperforms other state-of-the-art approaches in temporal structure modeling for complex ac- tivity recognition.

6 0.70086747 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop

7 0.66572183 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

8 0.65928179 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

9 0.64683527 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

10 0.59999484 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

11 0.59214103 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

12 0.57326353 402 cvpr-2013-Social Role Discovery in Human Events

13 0.56015259 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding

14 0.53407115 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation

15 0.5224185 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

16 0.4926258 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

17 0.46062016 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

18 0.44598651 28 cvpr-2013-A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching

19 0.42977667 243 cvpr-2013-Large-Scale Video Summarization Using Web-Image Priors

20 0.41819382 463 cvpr-2013-What's in a Name? First Names as Facial Attributes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.09), (16, 0.045), (26, 0.035), (28, 0.015), (33, 0.276), (67, 0.052), (69, 0.047), (72, 0.015), (77, 0.063), (78, 0.218), (80, 0.022), (87, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86074644 44 cvpr-2013-Area Preserving Brain Mapping

Author: Zhengyu Su, Wei Zeng, Rui Shi, Yalin Wang, Jian Sun, Xianfeng Gu

Abstract: Brain mapping transforms the brain cortical surface to canonical planar domains, which plays a fundamental role in morphological study. Most existing brain mapping methods are based on angle preserving maps, which may introduce large area distortions. This work proposes an area preserving brain mapping method based on MongeBrenier theory. The brain mapping is intrinsic to the Riemannian metric, unique, and diffeomorphic. The computation is equivalent to convex energy minimization and power Voronoi diagram construction. Comparing to the existing approaches based on Monge-Kantorovich theory, the proposed one greatly reduces the complexity (from n2 unknowns to n ), and improves the simplicity and efficiency. Experimental results on caudate nucleus surface mapping and cortical surface mapping demonstrate the efficacy and efficiency of the proposed method. Conventional methods for caudate nucleus surface mapping may suffer from numerical instability; in contrast, current method produces diffeomorpic mappings stably. In the study of cortical sur- face classification for recognition of Alzheimer’s Disease, the proposed method outperforms some other morphometry features.

same-paper 2 0.85869086 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann

Abstract: Complex events essentially include humans, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. Many works have exploited attributes at image level for various applications. However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. Hence, we propose to leverage attributes at video level (named as video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.

3 0.82095253 253 cvpr-2013-Learning Multiple Non-linear Sub-spaces Using K-RBMs

Author: Siddhartha Chandra, Shailesh Kumar, C.V. Jawahar

Abstract: Understanding the nature of data is the key to building good representations. In domains such as natural images, the data comes from very complex distributions which are hard to capture. Feature learning intends to discover or best approximate these underlying distributions and use their knowledge to weed out irrelevant information, preserving most of the relevant information. Feature learning can thus be seen as a form of dimensionality reduction. In this paper, we describe a feature learning scheme for natural images. We hypothesize that image patches do not all come from the same distribution, they lie in multiple nonlinear subspaces. We propose a framework that uses K Restricted Boltzmann Machines (K-RBMS) to learn multiple non-linear subspaces in the raw image space. Projections of the image patches into these subspaces gives us features, which we use to build image representations. Our algorithm solves the coupled problem of finding the right non-linear subspaces in the input space and associating image patches with those subspaces in an iterative EM like algorithm to minimize the overall reconstruction error. Extensive empirical results over several popular image classification datasets show that representations based on our framework outperform the traditional feature representations such as the SIFT based Bag-of-Words (BoW) and convolutional deep belief networks.

4 0.80777103 364 cvpr-2013-Robust Object Co-detection

Author: Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang

Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA [28], but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object codetection method and the generic object detection methods without co-detection formulations.

5 0.80710351 402 cvpr-2013-Social Role Discovery in Human Events

Author: Vignesh Ramanathan, Bangpeng Yao, Li Fei-Fei

Abstract: We deal with the problem of recognizing social roles played by people in an event. Social roles are governed by human interactions, and form a fundamental component of human event description. We focus on a weakly supervised setting, where we are provided different videos belonging to an event class, without training role labels. Since social roles are described by the interaction between people in an event, we propose a Conditional Random Field to model the inter-role interactions, along with person specific social descriptors. We develop tractable variational inference to simultaneously infer model weights, as well as role assignment to all people in the videos. We also present a novel YouTube social roles dataset with ground truth role annotations, and introduce annotations on a subset of videos from the TRECVID-MED11 [1] event kits for evaluation purposes. The performance of the model is compared against different baseline methods on these datasets.

6 0.80696303 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

7 0.80646414 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding

8 0.80619586 18 cvpr-2013-A Max-Margin Riffled Independence Model for Image Tag Ranking

9 0.80466294 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences

10 0.79812455 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

11 0.79480052 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

12 0.79450846 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition

13 0.79436928 422 cvpr-2013-Tag Taxonomy Aware Dictionary Learning for Region Tagging

14 0.79400194 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

15 0.79393798 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches

16 0.79371947 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs

17 0.7936132 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

18 0.79324156 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

19 0.79289114 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

20 0.79271102 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs