iccv iccv2013 iccv2013-205 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuanlu Xu, Liang Lin, Wei-Shi Zheng, Xiaobai Liu
Abstract: This paper aims at a newly arising task in visual surveillance: re-identifying people at a distance by matching body information, given several reference examples. Most existing works solve this task by matching a reference template with the target individual, but often suffer from large human appearance variability (e.g. different poses/views, illumination) and high false positives in matching caused by conjunctions, occlusions or surrounding clutter. Addressing these problems, we construct a simple yet expressive template from a few reference images of a certain individual, which represents the body as an articulated assembly of compositional and alternative parts, and propose an effective matching algorithm with cluster sampling. This algorithm is designed within a candidacy graph whose vertices are matching candidates (i.e. a pair of source and target body parts), and iterates in two steps for convergence. (i) It generates possible partial matches based on compatible and competitive relations among body parts. (ii) It confirms the partial matches to generate a new matching solution, which is accepted by the Markov Chain Monte Carlo (MCMC) mechanism. In the experiments, we demonstrate the superior performance of our approach on three public databases compared to existing methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper aims at a newly arising task in visual surveillance: re-identifying people at a distance by matching body information, given several reference examples. [sent-2, score-0.459]
2 Most existing works solve this task by matching a reference template with the target individual, but often suffer from large human appearance variability (e. [sent-3, score-0.661]
3 Addressing these problems, we construct a simple yet expressive template from a few reference images of a certain individual, which represents the body as an articulated assembly of compositional and alternative parts, and propose an effective matching algorithm with cluster sampling. [sent-6, score-1.154]
4 This algorithm is designed within a candidacy graph whose vertices are matching candidates (i. [sent-7, score-0.47]
5 a pair of source and target body parts), and iterates in two steps for convergence. [sent-9, score-0.296]
6 (i) It generates possible partial matches based on compatible and competitive relations among body parts. [sent-10, score-0.459]
7 (ii) It confirms the partial matches to generate a new matching solution, which is accepted by the Markov Chain Monte Carlo (MCMC) mechanism. [sent-11, score-0.24]
8 There are large variations for human body in appearance, (e. [sent-16, score-0.242]
9 An individual is represented as a compositional part-based template, and part proposals are extracted from multiple instances at each part. [sent-27, score-0.585]
10 Human re-identification is thus posed as compositional template matching. [sent-28, score-0.55]
11 tractable to construct a template of the individual to be recognized by extracting only low-level image features. [sent-30, score-0.329]
12 Given the template, re-identifying targets with the global body information often suffers from high matching false positives, as the targets are possibly occluded or conjuncted with others and backgrounds in realistic surveillance applications. [sent-32, score-0.602]
13 Furthermore, it is desired to accurately localize human body parts in general. [sent-33, score-0.354]
14 The objective of human re-identification in this work is to recognize an individual by employing body information to address the above difficulties. [sent-34, score-0.296]
15 We study the problem with the following setting based on the application requirements in surveillance: (1) The clothing of individuals remains unchanged across different scenarios. [sent-35, score-0.138]
16 Our approach builds a compositional part-based template to represent the target individual and matches the template with input images by employing a stochastic cluster sampling algorithm, as illustrated in Fig. [sent-39, score-0.801]
17 We organize the template of a query individual with an expressive tree representation that can be produced in a very simple way. [sent-41, score-0.375]
18 We run human body part detectors [1, 2] on several reference images of the individual, and the images of detected parts are grouped according to their semantics. [sent-42, score-0.563]
19 That is, a human template is decomposed into body parts, e. [sent-43, score-0.519]
20 This expressive template fully exploits information from multiple reference images to capture appearance variability well, partially motivated by the recently proposed hierarchical and part-based models in object recognition [23, 18, 16]. [sent-47, score-0.465]
21 Specifically, several possible instances (namely proposals), extracted from different references, exist at each part in the template, and we regard this representation as the multiple-instance-based compositional template (MICT). [sent-48, score-0.62]
22 As a result, new appearance configurations can be composed by the part proposals in the MICT. [sent-49, score-0.284]
23 We argue that the critical concern is accurately identifying the target in realistic scenarios, e. [sent-51, score-0.144]
24 In the inference stage, the body part detectors are initially utilized to generate possible part locations in the scene shot, and human re-identification is then posed as the task of part-based template matching. [sent-54, score-0.721]
25 Unlike traditional matching problems, the multiple part proposals in the MICT make the search space of matching combinatorially large, as the part proposals need to be activated along with the matching process. [sent-55, score-1.05]
26 Handling the false alarms and misdetections by the part detectors is also a non-trivial issue during matching. [sent-56, score-0.156]
27 Inspired by recent studies in cluster sampling [6, 17, 22], we propose a stochastic algorithm to solve the compositional template matching. [sent-57, score-0.669]
28 The matching algorithm is designed based upon the candidacy graph, where each vertex denotes a pair of matching part proposals, and each edge link represents the contextual interaction (i. [sent-58, score-0.589]
29 the compatible or the competitive relation) between two matching pairs. [sent-60, score-0.24]
30 Compatible relations encourage vertices to activate together, while competitive relations suppress conflicting vertices from being activated at the same time. [sent-61, score-0.534]
31 Specifically, two vertices are encouraged to be activated together, as they are kinematically or symmetrically related, whereas two vertices are constrained that only one of them can be activated, as they belong to the same part type or overlap. [sent-62, score-0.376]
32 The algorithm iterates in two steps for optimal matching solution searching. [sent-63, score-0.142]
33 (ii) It activates clusters to confirm partial matches, leading to a new matching solution that will be accepted by the Markov Chain Monte Carlo (MCMC) mechanism [6]. [sent-65, score-0.234]
34 Note that body parts are allowed to be unmatched to cope with occlusions. [sent-66, score-0.229]
35 First, we propose a novel formulation to solve human re-identification by matching the composite template with cluster sampling. [sent-68, score-0.807]
36 Related Work In literature, previous works of human re-identification mainly focus on constructing and selecting distinctive and stable human representation, and they can be roughly divided into the following two categories. [sent-71, score-0.148]
37 Global-based methods define a global appearance human signature with rich image features and match given reference images with the observations [14, 24, 8]. [sent-72, score-0.258]
38 Recently, advanced learning techniques are employed for more reliable matching metrics [26], more representative features [19], and more expressive multivalued mapping function [3]. [sent-77, score-0.188]
39 They first localize salient body parts, and then search for part-to-part correspondence between reference samples and observations. [sent-80, score-0.363]
40 [12] adopt a decomposable triangulated graph to represent person configuration, and the pictorial structures model for human re-identification is introduced [7]. [sent-84, score-0.398]
41 Besides, modeling contextual correlation between body parts is discussed in [5]. [sent-85, score-0.285]
42 Many works [12, 8, 7] utilize multiple reference instances for individual, i. [sent-86, score-0.18]
43 multi-shot approaches, but they omit occlusions and conjunctions in the target images and re-identify the target by computing a one-to-many distance, while we explicitly handle these problems by exploiting reconfigurable compositions and contextual interactions during inference. [sent-88, score-0.405]
44 Representation In this section, we first introduce the definition of multiple-instance-based compositional template, and then present the problem formulation of human re-identification. [sent-93, score-0.339]
45 Compositional Template In this work, we present a compositional template to model humans with large variations. [sent-96, score-0.58]
46 A human body is decomposed into N = 6 parts: head, torso, upper arms, forearms, thighs and calves, and each limb is further decomposed into two symmetrical parts (i. [sent-97, score-0.443]
47 Each part g is modeled as a rectangle and indicated by a 5-tuple (t, x, y, θ, s), where t denotes the part type, x and y the part center coordinates, θ the part orientation, s the part relative scale, as widely employed in pictorial structures model [9, 1]. [sent-101, score-0.562]
48 The multiple-instance-based compositional template (MICT) T is defined as $T = \{T_i : T_i = \{g\}\}_{i=1}^{N}$, (1) where g denotes a part proposal and $T_i$ the set of proposals for the i-th part in the template. [sent-102, score-1.006]
49 Given reference images of an individual, the MICT is constructed as follows. [sent-103, score-0.144]
50 We first employ body part detectors to scan every reference image and obtain detection scores for all body parts. [sent-104, score-0.596]
51 Given detection scores, we further prune impossible part configurations by several strategies: (i) For all parts, the firing detection is pruned if the overlap rate of foreground mask (done by background subtraction) is less than 75%. [sent-106, score-0.174]
52 (ii) The reference image is segmented into 4 horizontal strips with equal height. [sent-107, score-0.144]
53 Head is detected in the first strip (the first to fourth top to bottom), parts of upper body (i. [sent-108, score-0.229]
54 torso, upper arms and forearms) in the second, and parts of lower body (i. [sent-110, score-0.278]
55 Finally, we apply non-maximum suppression and collect the K proposals with highest responses for each part from all reference images. [sent-113, score-0.428]
56 Given target images (scene shots) to be matched, we can obtain the target proposal set G by a similar process as constructing the MICT, except the firing detection being pruned only by the foreground mask. [sent-114, score-0.422]
57 Considering realistic complexities in surveillance, there probably exist large numbers of detection false alarms in the target proposal set G. [sent-115, score-0.322]
58 Candidacy Graph Given the template T and the target proposal set G, the problem of human re-identification can be posed as the task of part-based template matching and solved by two steps: (i) activating one proposal for each part in T, (ii) finding the match in G. [sent-118, score-1.196]
59 (a) Kinematics (navy blue edges) and symmetry (brown edges) relations within the compositional template. [sent-121, score-0.375]
60 (b) An example to show how target part proposals are coupled together by kinematics and symmetry relations. [sent-122, score-0.488]
61 We define the set of activated part proposals Ψ from T, each of which corresponds to a certain part: $\Psi = \{\Psi_i : l(\Psi_i) = 1, \Psi_i \in T_i\}_{i=1}^{N}$. [sent-123, score-0.442]
62 (2) The binary label $l(\cdot)$ indicates whether the proposal is activated, $l(\cdot) = 1$, or remains inactivated, $l(\cdot) = 0$. [sent-124, score-0.138]
63 The set of matched part proposals from G can be defined as $\Phi = \{\Phi_i : l(\Phi_i) = 1, \Phi_i \in G_i \cup \{\emptyset\}\}_{i=1}^{N}$, (3) where $\Phi_i$ maps the activated proposal of the i-th part in T to a proposal in G. [sent-129, score-0.796]
64 To solve these two steps simultaneously, we propose a candidacy graph representation and further formulate the problem by graph labeling. [sent-136, score-0.345]
65 We define the candidacy graph G = and the current matching state M, we first separate graph edges E into two sets: set of inconsistent edges $\{e \in E^{+} : l(c_i) \neq l(c_j)\} \cup \{e \in E^{-} : l(c_i) = l(c_j)\}$ (eid. [sent-137, score-0.641]
66 Afterwards, we regard candidates connected by positive edges as a cluster Cl and collect clusters connected by negative edges to generate a composite cluster Vcc. [sent-142, score-0.514]
67 In this step, we randomly choose a cluster from the obtained composite cluster Vcc and flip the labels of the selected cluster and its conflicting clusters (i. [sent-144, score-0.582]
68 the clusters connected with the selected cluster), which generates a new state M′. [sent-146, score-0.126]
69 To find a better state and achieve a reversible transition between two states M and M′ [sent-148, score-0.147]
70 , the acceptance rate of the transition from state M to state M′ [sent-149, score-0.217]
71 Following instructions in [6], the state transition probability ratio is computed by $\frac{q(M' \to M)}{q(M \to M')}$ [sent-159, score-0.147]
72 We show an example of one transition in composite cluster sampling in Fig. [sent-176, score-0.303]
73 In state A, Cl1 is activated and the conflicting cluster [sent-179, score-0.3]
74 The transition from state A to state B achieves a fast jump between two kinds of partial coupling matches and coincides with an individual-to-individual comparison in re-identification. [sent-181, score-0.279]
75 For evaluating our method, we extract individuals from the original videos and annotate each of them with ID and location (bounding box). [sent-198, score-0.138]
76 In total, there are 70 reference images for 30 different individuals (normalized to 175 pixels in height), and 80 shots in 360 × 288, which contain 294 targets to be re-identified. [sent-199, score-0.348]
77 There are 370 reference images normalized to 175 pixels in height, for 74 individuals, with IDs and locations provided. [sent-202, score-0.144]
78 We present 214 shots containing 1519 targets for evaluating methods, and the targets often appear with diverse poses/views, conjunctions and occlusions, see Fig. [sent-203, score-0.404]
79 For the EPFL and CAMPUSHuman datasets, we randomly select reference images for each individual, and all target images are tested for matching. [sent-208, score-0.238]
80 Our approach is evaluated under cases of both single reference image (single-shot, SvsS) and multiple reference images (multi-shot, MvsS, M = 2, 3). [sent-210, score-0.288]
81 We construct the MICT for each individual with their selected reference images. [sent-212, score-0.232]
82 In the re-identification, a number K of body part proposals are generated. [sent-213, score-0.452]
83 In practice, we set K to approximately 3 times the number of individuals in the shot. [sent-214, score-0.138]
84 The time cost is related to the complexity of the candidacy graph. [sent-218, score-0.239]
85 The results are evaluated in two ways: (i) re-identifying individuals in segmented images, i. [sent-279, score-0.138]
86 targets already localized, and (ii) re-identifying individuals from scene shots without provided segmentations. [sent-281, score-0.342]
87 The curve reflects the overall ranked matching rates; precisely, a rank r matching rate indicates the percentage of correct matches found in top r ranks. [sent-283, score-0.278]
88 We observe that the performance of re-identification can be improved significantly by fully exploiting reconfigurable compositions and contextual interactions in inference. [sent-287, score-0.337]
89 The second test is stricter, since the algorithms should also localize the target during re-identification. [sent-289, score-0.145]
90 Matching rate of re-identifying targets in scene shots without provided segmentations. [sent-291, score-0.204]
91 We compare our method with PS [1], VPS [2], which can localize the body at the same time as localizing the parts. [sent-293, score-0.219]
92 From the results, existing methods perform poorly when individuals are not well segmented and scaled to uniform size. [sent-297, score-0.138]
93 In contrast, our method can re-identify challenging target individuals by searching and matching their salient parts and thus achieves better performance. [sent-298, score-0.401]
94 8(right) confirms that both kinematics and symmetry constraints help construct better matching solution. [sent-306, score-0.252]
95 Conclusion This paper studies a novel compositional template for human re-identification, in the form of an expressive multiple-instance-based compositional representation of the query individual. [sent-366, score-0.925]
96 By exploiting reconfigurable compositions and contextual interactions during inference, our method handles well challenges in human re-identification. [sent-367, score-0.253]
97 Green bounding boxes denote the ground-truth target locations, while red bounding boxes are generated by the algorithm. [sent-371, score-0.23]
98 Moreover, we will explore more robust and flexible part representations and better inter-part relations in future works. [sent-372, score-0.14]
99 Person re-identification using spatial covariance regions of human body parts. [sent-407, score-0.242]
100 A stochastic graph grammar for compositional object representation and recognition. [sent-503, score-0.367]
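The candidacy-graph relations and the cluster-flipping move described in the sentences above (compatible and competitive edges, clusters joined by positive edges, conflicting clusters joined by negative edges, MCMC acceptance) can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the 0/1 label encoding, the `keep_prob` parameter, and the energy-based acceptance rule with a symmetric proposal are all hypothetical choices made here, whereas the paper follows [6] for the actual proposal ratio.

```python
import math
import random


def consistent_edges(labels, pos_edges, neg_edges):
    """Keep positive edges whose endpoints share a label and
    negative edges whose endpoints have different labels."""
    pos = [(i, j) for i, j in pos_edges if labels[i] == labels[j]]
    neg = [(i, j) for i, j in neg_edges if labels[i] != labels[j]]
    return pos, neg


def connected_components(nodes, edges):
    """Union-find; returns a dict mapping node -> component representative."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return {v: find(v) for v in nodes}


def propose_flip(labels, pos_edges, neg_edges, rng, keep_prob=1.0):
    """One simplified composite-cluster move; returns a new label list."""
    pos, neg = consistent_edges(labels, pos_edges, neg_edges)
    # Assumption: each consistent edge stays "on" with probability keep_prob.
    pos = [e for e in pos if rng.random() < keep_prob]
    neg = [e for e in neg if rng.random() < keep_prob]
    cluster_of = connected_components(range(len(labels)), pos)
    # Pick one cluster uniformly at random.
    chosen = rng.choice(sorted(set(cluster_of.values())))
    # Clusters tied to the chosen one by a negative edge conflict with it.
    to_flip = {chosen}
    for i, j in neg:
        if cluster_of[i] == chosen:
            to_flip.add(cluster_of[j])
        elif cluster_of[j] == chosen:
            to_flip.add(cluster_of[i])
    return [1 - l if cluster_of[v] in to_flip else l
            for v, l in enumerate(labels)]


def metropolis_accept(energy_old, energy_new, rng, temperature=1.0):
    """Accept with probability min(1, exp(-(E_new - E_old) / T)).
    Assumes a symmetric proposal, which is a simplification."""
    if energy_new <= energy_old:
        return True
    return rng.random() < math.exp(-(energy_new - energy_old) / temperature)


# Candidates 0-1 and 2-3 are compatible pairs; 1 and 2 compete.
rng = random.Random(0)
state_a = [1, 1, 0, 0]
state_b = propose_flip(state_a, [(0, 1), (2, 3)], [(1, 2)], rng)
```

With `keep_prob=1.0`, the two compatible pairs form two clusters linked by one competitive edge, so selecting either cluster flips both; this mirrors the jump between two partial coupling matches described for states A and B. In a fuller treatment the acceptance test would also include the proposal ratio rather than the energy difference alone.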
wordName wordTfidf (topN-words)
[('compositional', 0.265), ('template', 0.241), ('candidacy', 0.239), ('mict', 0.239), ('proposals', 0.206), ('epfl', 0.176), ('body', 0.168), ('vps', 0.158), ('reidentification', 0.158), ('activated', 0.158), ('vcc', 0.151), ('reference', 0.144), ('individuals', 0.138), ('proposal', 0.138), ('pictorial', 0.126), ('cluster', 0.114), ('composite', 0.112), ('targets', 0.109), ('matching', 0.108), ('viper', 0.101), ('shots', 0.095), ('target', 0.094), ('compatible', 0.092), ('conjunctions', 0.091), ('ps', 0.086), ('expressive', 0.08), ('cps', 0.079), ('part', 0.078), ('transition', 0.077), ('human', 0.074), ('conflicting', 0.072), ('state', 0.07), ('mm', 0.07), ('vertices', 0.07), ('compositions', 0.07), ('bak', 0.068), ('boundings', 0.068), ('calfs', 0.068), ('campushuman', 0.068), ('porway', 0.068), ('rosenbluth', 0.068), ('thighs', 0.068), ('person', 0.065), ('kinematics', 0.062), ('matches', 0.062), ('relations', 0.062), ('parts', 0.061), ('bremond', 0.061), ('corvee', 0.061), ('gheissari', 0.061), ('edges', 0.059), ('surveillance', 0.058), ('firing', 0.056), ('forearms', 0.056), ('contextual', 0.056), ('clusters', 0.056), ('individual', 0.054), ('ee', 0.053), ('reconfigurable', 0.053), ('bazzani', 0.053), ('graph', 0.053), ('localize', 0.051), ('torso', 0.05), ('realistic', 0.05), ('arms', 0.049), ('stochastic', 0.049), ('symmetry', 0.048), ('cmc', 0.048), ('avss', 0.047), ('ii', 0.046), ('structures', 0.046), ('custom', 0.045), ('posed', 0.044), ('workshops', 0.044), ('iou', 0.044), ('qq', 0.044), ('sebastian', 0.044), ('mcmc', 0.041), ('guangdong', 0.041), ('match', 0.04), ('accumulation', 0.04), ('pruned', 0.04), ('alarms', 0.04), ('competitive', 0.04), ('people', 0.039), ('detectors', 0.038), ('localized', 0.037), ('decomposed', 0.036), ('instances', 0.036), ('accepted', 0.035), ('monte', 0.035), ('height', 0.035), ('partial', 0.035), ('program', 0.035), ('ensemble', 0.034), ('iterates', 0.034), ('carlo', 0.034), ('andriluka', 0.034), ('construct', 0.034), 
('adopt', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
2 0.2018705 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
Author: Andy J. Ma, Pong C. Yuen, Jiawei Li
Abstract: This paper addresses a new person re-identification problem without the label information of persons under non-overlapping target cameras. Given the matched (positive) and unmatched (negative) image pairs from source domain cameras, as well as unmatched (negative) image pairs which can be easily generated from target domain cameras, we propose a Domain Transfer Ranked Support Vector Machines (DTRSVM) method for re-identification under target domain cameras. To overcome the problems introduced due to the absence of matched (positive) image pairs in target domain, we relax the discriminative constraint to a necessary condition only relying on the positive mean in target domain. By estimating the target positive mean using source and target domain data, a new discriminative model with high confidence in target positive mean and low confidence in target negative image pairs is developed. Since the necessary condition may not truly preserve the discriminability, multi-task support vector ranking is proposed to incorporate the training data from source domain with label information. Experimental results show that the proposed DTRSVM outperforms existing methods without using label information in target cameras. And the top 30 rank accuracy can be improved by the proposed method up to 9.40% on publicly available person re-identification datasets.
4 0.12485518 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
Author: Reyes Rios-Cabrera, Tinne Tuytelaars
Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.
5 0.12231886 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
6 0.12142917 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
7 0.11940505 237 iccv-2013-Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes
8 0.10848127 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
9 0.10554127 238 iccv-2013-Learning Graphs to Match
10 0.097538657 143 iccv-2013-Estimating Human Pose with Flowing Puppets
11 0.094859794 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
12 0.09292721 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
13 0.090513304 444 iccv-2013-Viewing Real-World Faces in 3D
15 0.089833632 305 iccv-2013-POP: Person Re-identification Post-rank Optimisation
16 0.087026216 395 iccv-2013-Slice Sampling Particle Belief Propagation
17 0.086420454 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
18 0.085734047 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
19 0.085604429 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
20 0.084970273 313 iccv-2013-Person Re-identification by Salience Matching
topicId topicWeight
[(0, 0.221), (1, -0.005), (2, 0.012), (3, -0.003), (4, 0.076), (5, -0.06), (6, -0.035), (7, 0.07), (8, -0.039), (9, 0.049), (10, -0.015), (11, -0.008), (12, -0.014), (13, 0.002), (14, 0.027), (15, 0.062), (16, 0.06), (17, -0.002), (18, 0.079), (19, -0.015), (20, 0.086), (21, 0.012), (22, 0.042), (23, -0.023), (24, 0.12), (25, -0.048), (26, -0.067), (27, -0.004), (28, -0.054), (29, -0.033), (30, 0.031), (31, 0.021), (32, 0.121), (33, -0.015), (34, 0.102), (35, -0.011), (36, -0.04), (37, -0.024), (38, -0.015), (39, -0.033), (40, 0.041), (41, 0.062), (42, 0.043), (43, -0.002), (44, -0.025), (45, -0.008), (46, -0.078), (47, 0.074), (48, -0.037), (49, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.96146971 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
2 0.70484573 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
3 0.64956856 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang
Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework, consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters, is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.
4 0.63161266 118 iccv-2013-Discovering Object Functionality
Author: Bangpeng Yao, Jiayuan Ma, Li Fei-Fei
Abstract: Object functionality refers to the quality of an object that allows humans to perform some specific actions. It has been shown in psychology that functionality (affordance) is at least as essential as appearance in object recognition by humans. In computer vision, most previous work on functionality either assumes exactly one functionality for each object, or requires detailed annotation of human poses and objects. In this paper, we propose a weakly supervised approach to discover all possible object functionalities. Each object functionality is represented by a specific type of human-object interaction. Our method takes any possible human-object interaction into consideration, and evaluates image similarity in 3D rather than 2D in order to cluster human-object interactions more coherently. Experimental results on a dataset of people interacting with musical instruments show the effectiveness of our approach.
5 0.62577027 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
Author: Reyes Rios-Cabrera, Tinne Tuytelaars
Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.
6 0.62513661 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework
7 0.61379367 313 iccv-2013-Person Re-identification by Salience Matching
8 0.61030561 110 iccv-2013-Detecting Curved Symmetric Parts Using a Deformable Disc Model
9 0.6054855 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
10 0.60498518 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
11 0.60440159 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
12 0.60159421 131 iccv-2013-EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory
13 0.59481436 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
14 0.59470254 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
15 0.58872539 46 iccv-2013-Allocentric Pose Estimation
17 0.57304931 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
18 0.56683296 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking
19 0.56372529 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
20 0.55750775 87 iccv-2013-Conservation Tracking
topicId topicWeight
[(2, 0.071), (7, 0.017), (12, 0.03), (26, 0.07), (31, 0.045), (35, 0.022), (42, 0.112), (48, 0.012), (64, 0.063), (69, 0.207), (73, 0.05), (78, 0.014), (89, 0.203)]
simIndex simValue paperId paperTitle
same-paper 1 0.85492045 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
Author: Yuanlu Xu, Liang Lin, Wei-Shi Zheng, Xiaobai Liu
Abstract: This paper aims at a newly raising task in visual surveillance: re-identifying people at a distance by matching body information, given several reference examples. Most of existing works solve this task by matching a reference template with the target individual, but often suffer from large human appearance variability (e.g. different poses/views, illumination) and high false positives in matching caused by conjunctions, occlusions or surrounding clutters. Addressing these problems, we construct a simple yet expressive template from a few reference images of a certain individual, which represents the body as an articulated assembly of compositional and alternative parts, and propose an effective matching algorithm with cluster sampling. This algorithm is designed within a candidacy graph whose vertices are matching candidates (i.e. a pair of source and target body parts), and iterates in two steps for convergence. (i) It generates possible partial matches based on compatible and competitive relations among body parts. (ii) It confirms the partial matches to generate a new matching solution, which is accepted by the Markov Chain Monte Carlo (MCMC) mechanism. In the experiments, we demonstrate the superior performance of our approach on three public databases compared to existing methods.
2 0.82767081 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of the css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
3 0.79844415 52 iccv-2013-Attribute Adaptation for Personalized Image Search
Author: Adriana Kovashka, Kristen Grauman
Abstract: Current methods learn monolithic attribute predictors, with the assumption that a single model is sufficient to reflect human understanding of a visual attribute. However, in reality, humans vary in how they perceive the association between a named property and image content. For example, two people may have slightly different internal models for what makes a shoe look “formal”, or they may disagree on which of two scenes looks “more cluttered”. Rather than discount these differences as noise, we propose to learn user-specific attribute models. We adapt a generic model trained with annotations from multiple users, tailoring it to satisfy user-specific labels. Furthermore, we propose novel techniques to infer user-specific labels based on transitivity and contradictions in the user’s search history. We demonstrate that adapted attributes improve accuracy over both existing monolithic models as well as models that learn from scratch with user-specific data alone. In addition, we show how adapted attributes are useful to personalize image search, whether with binary or relative attributes.
4 0.78211945 338 iccv-2013-Randomized Ensemble Tracking
Author: Qinxun Bai, Zheng Wu, Stan Sclaroff, Margrit Betke, Camille Monnier
Abstract: We propose a randomized ensemble algorithm to model the time-varying appearance of an object for visual tracking. In contrast with previous online methods for updating classifier ensembles in tracking-by-detection, the weight vector that combines weak classifiers is treated as a random variable and the posterior distribution for the weight vector is estimated in a Bayesian manner. In essence, the weight vector is treated as a distribution that reflects the confidence among the weak classifiers used to construct and adapt the classifier ensemble. The resulting formulation models the time-varying discriminative ability among weak classifiers so that the ensembled strong classifier can adapt to the varying appearance, backgrounds, and occlusions. The formulation is tested in a tracking-by-detection implementation. Experiments on 28 challenging benchmark videos demonstrate that the proposed method can achieve results comparable to and often better than those of state-of-the-art approaches.
5 0.77987683 349 iccv-2013-Regionlets for Generic Object Detection
Author: Xiaoyu Wang, Ming Yang, Shenghuo Zhu, Yuanqing Lin
Abstract: Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands for descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifier which integrates various types of features from competing local regions, named as regionlets. A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e. size and aspect ratio). These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. Their features are aggregated to a one-dimensional feature within one group so as to tolerate deformations. Then we evaluate the object bounding box proposal in selective search from segmentation cues, limiting the evaluation locations to thousands. Our approach significantly outperforms the state-of-the-art on popular multi-class detection benchmark datasets with a single method, without any contexts. It achieves the detection mean average precision of 41.7% on the PASCAL VOC 2007 dataset and 39.7% on the VOC 2010 for 20 object categories. It achieves 14.7% mean average precision on the ImageNet dataset for 200 object categories, outperforming the latest deformable part-based model (DPM) by 4.7%.
6 0.77941185 379 iccv-2013-Semantic Segmentation without Annotating Segments
7 0.77789444 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
8 0.77785122 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
9 0.77696478 150 iccv-2013-Exemplar Cut
10 0.77649581 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
11 0.77619308 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
12 0.77571154 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
13 0.7751807 57 iccv-2013-BOLD Features to Detect Texture-less Objects
14 0.7748754 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
15 0.77441406 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
16 0.77428377 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
17 0.77409858 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
18 0.77408928 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
19 0.77349508 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
20 0.77298152 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection