iccv iccv2013 iccv2013-267 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Cheng Li, Kris M. Kitani
Abstract: Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures and learning about hand-object manipulation. To enable such technology, we believe that the hands must be detected at the pixel level to gain important information about the shape of the hands and fingers. We show that the problem of pixel-wise hand detection can be effectively solved by posing the problem as a model recommendation task. As such, the goal of a recommendation system is to recommend the n-best hand detectors based on the probe set: a small amount of labeled data from the test distribution. This requirement of a probe set is a serious limitation in many applications, such as egocentric hand detection, where the test distribution may be continually changing. To address this limitation, we propose the use of virtual probes which can be automatically extracted from the test distribution. The key idea is that many features, such as the color distribution or relative performance between two detectors, can be used as a proxy for the probe set. In our experiments we show that the recommendation paradigm is well-equipped to handle complex changes in the appearance of the hands in first-person vision. In particular, we show how our system is able to generalize to new scenarios by testing our model across multiple users.
Reference: text
sentIndex sentText sentNum sentScore
1 To enable such technology, we believe that the hands must be detected at the pixel level to gain important information about the shape of the hands and fingers. [sent-4, score-0.256]
2 We show that the problem of pixel-wise hand detection can be effectively solved by posing the problem as a model recommendation task. [sent-5, score-0.677]
3 As such, the goal of a recommendation system is to recommend the n-best hand detectors based on the probe set: a small amount of labeled data from the test distribution. [sent-6, score-1.46]
4 This requirement of a probe set is a serious limitation in many applications, such as egocentric hand detection, where the test distribution may be continually changing. [sent-7, score-0.78]
5 To address this limitation, we propose the use of virtual probes which can be automatically extracted from the test distribution. [sent-8, score-0.422]
6 The key idea is that many features, such as the color distribution or relative performance between two detectors, can be used as a proxy for the probe set. [sent-9, score-0.762]
7 In our experiments we show that the recommendation paradigm is well-equipped to handle complex changes in the appearance of the hands in first-person vision. [sent-10, score-0.702]
8 Virtual probe features are extracted at test time to recommend the best-performing detector. [sent-25, score-0.861]
9 Therefore, we aim to extend the state-of-the-art in egocentric hand detection to provide a more stable pixel-resolution detection of hand regions. [sent-27, score-0.428]
10 In particular, we will show that the problem of pixelwise hand detection can be effectively solved by posing the problem as a model recommendation task. [sent-28, score-0.677]
11 The role of our proposed recommendation system is to suggest the n-best hand detectors based on information extracted from the test image. [sent-29, score-0.734]
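Once some recommender has estimated a score for every detector on the current test image, suggesting the n-best detectors reduces to a ranking step. A minimal sketch, assuming the estimated scores are already available as an array; the function name `recommend_n_best` is hypothetical and not from the paper:

```python
import numpy as np


def recommend_n_best(predicted_scores, n):
    """Return the indices of the n detectors with the highest estimated scores.

    predicted_scores: length-M sequence, one estimated score per detector.
    Ties are broken in favor of the lower index (stable sort).
    """
    scores = np.asarray(predicted_scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # descending by score
    return order[:n].tolist()


print(recommend_n_best([0.2, 0.9, 0.5, 0.7], n=2))  # [1, 3]
```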
12 In a typical recommendation task, information from the test distribution is acquired through a small amount of labeled data from the test distribution called the probe set. [sent-30, score-1.349]
13 In the original context of recommendation systems such as movie recommendation, the probe set can be easily obtained by allowing a specific user to rank a small set of movies, safely assuming that the user's preferences will not change drastically over time. [sent-31, score-1.278]
14 In the case of egocentric hand detection, the probe set would amount to a small number of labeled pixels provided by the user. [sent-32, score-0.961]
15 Based on this information, the recommendation system could return a set of scene-appropriate detectors. [sent-33, score-0.557]
16 , appearance of the hands, imaging conditions) is constantly undergoing change, rendering the initial probe set invalid. [sent-36, score-0.751]
17 It would be impractical to update the probe set dynamically, since this would require the user to label new pixels every time he moves. [sent-37, score-0.669]
18 A major difference between our egocentric hand detection scenario and movie recommendation is that we have access to a large amount of secondary information about the test subject (i. [sent-38, score-1.009]
19 While we do not have direct information about hand regions, such information about the brightness of the scene, objects in the scene and the structure of the scene can give us clues about the imaging conditions and help us infer what the hands might look like. [sent-41, score-0.272]
20 Our claim is that this secondary source of information can be used to generate a virtual probe set to recommend the best detector. [sent-42, score-0.996]
21 Based on this observation, we propose to frame hand region detection for egocentric videos as a model recommendation task, where a dynamic virtual probe set is used to recommend a set of detectors for a dynamically changing test distribution. [sent-43, score-1.932]
22 The contributions of this work are: (1) a novel dynamic classifier selection methodology applied to first-person hand detection and (2) a recommendation system framework that does not require a labeled probe set. [sent-44, score-1.334]
23 In particular, we show that virtual probe features, namely global appearance and detector correlation, can be used to recommend the best detectors for test-time performance. [sent-45, score-1.103]
24 Moreover, we show the effectiveness of our approach by showing improved performance on cross-user experiments for egocentric hand detection. [sent-46, score-0.281]
25 Previous Work. Previously, the extraction of hands for egocentric vision has been posed as a figure-ground segmentation problem using motion cues [15, 5, 13]. [sent-48, score-0.35]
26 One of the major advantages of motion-based hand detection approaches is that they are robust to a wide range of illumination and imaging conditions. [sent-49, score-0.183]
27 Traditional approaches to hand detection based on skin color [7] require that the statistics of the appearance are known in advance but have the benefit of being agnostic to motion. [sent-54, score-0.229]
28 However, a problem arises when the distribution of hand skin color changes over time because a single skin color classifier cannot account for these changes. [sent-55, score-0.244]
29 In the case of an egocentric camera, the camera is mobile and unconstrained (i. [sent-57, score-0.202]
30 the user can walk indoors or outdoors), so it is important that the hands can be detected under a wide range of imaging conditions and also be robust to extreme motion. [sent-59, score-0.23]
31 In a recent work, Li and Kitani [9] have shown that hands can be detected at the pixel-level for egocentric videos under different imaging conditions using only appearance. [sent-60, score-0.414]
32 In their framework, a global color histogram was used as a proxy feature to find a hand region detector trained under similar imaging conditions. [sent-61, score-0.337]
33 However, since a color histogram folds both the appearance and illumination conditions onto a single feature space, it has difficulty generalizing to new scenes with similar imaging conditions but with different appearance (e. [sent-62, score-0.213]
34 Matikainen et al. [10] have shown that the recommendation system paradigm can be very effective for automated visual cognition tasks such as action recognition when only a small amount of training data is available. [sent-66, score-0.617]
35 As we have described above, this is not the case for egocentric hand detection, where the test distribution is undergoing constant change. [sent-68, score-0.404]
36 We present a probe-free recommendation approach over a dynamically changing test distribution. [sent-69, score-0.605]
37 A recommendation system approach differs from a standard supervised detection paradigm in that the detector is given the ability to adaptively change its parameters based on features extracted from the test distribution. [sent-70, score-0.796]
38 Preliminaries. Under our recommendation system paradigm, it is necessary to define (1) a set of models, (2) a set of tasks, (3) a score (or ratings) matrix, (4) a set of probe models and (5) the recommender system. [sent-74, score-1.269]
39 In our scenario, a model is a random forest regressor that predicts a value between 0 and 1, where the regressor has been trained on various subsets of an egocentric hand dataset using a specific set of image features (e. [sent-78, score-0.466]
40 The score matrix $R \in \mathbb{R}^{M \times N}$ consists of the scores $r_{mn}$ of the $m$-th model $f_m$ evaluated on the data $x_n$ of the $n$-th task. [sent-84, score-0.196]
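The score matrix can be filled by evaluating every model on every task. The scoring measure is not recoverable from this excerpt, so the sketch below assumes a pixel-wise F1 score at a 0.5 threshold; `models` are hypothetical callables mapping an image to a per-pixel hand probability, and each task is assumed to be an (image, boolean mask) pair:

```python
import numpy as np


def pixel_f1(prob_map, mask, threshold=0.5):
    """Pixel-wise F1 of a thresholded probability map against a boolean mask (assumed measure)."""
    pred = prob_map >= threshold
    tp = np.sum(pred & mask)
    fp = np.sum(pred & ~mask)
    fn = np.sum(~pred & mask)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 1.0


def build_score_matrix(models, tasks, score_fn=pixel_f1):
    """R[m, n] = score of model m on task n, giving R with shape (M, N)."""
    R = np.zeros((len(models), len(tasks)))
    for m, f_m in enumerate(models):
        for n, (x_n, mask_n) in enumerate(tasks):
            R[m, n] = score_fn(f_m(x_n), mask_n)
    return R
```

Any other bounded per-task measure could be passed as `score_fn`; the loop structure stays the same.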
41 The set of probe models is a small number of models, which are used to evaluate a small group of labeled data from the test distribution (this small group of labeled data is sometimes called the ‘training data’ but we will call it the probe data to avoid confusion). [sent-87, score-1.416]
42 The set of probe models fp(x) is typically a subset of the collection of models. [sent-88, score-0.651]
43 Later we will introduce a disjoint set of models called the virtual probe features as a proxy to this set of probe models. [sent-89, score-1.668]
44 The role of a recommendation system is to use the response of the probe models on the probe data in order to recommend the best model for evaluating the test set. [sent-90, score-2.019]
45 The recommendation system defines a mapping from probe responses to a model. [sent-91, score-1.235]
46 In the following, we explain our use of virtual proxy features, which can be used in place of a probe set, thereby allowing the model to retain the predictive capabilities of a recommendation system without the restriction of a labeled probe data set. [sent-94, score-2.238]
47 Virtual Probe Features. Since we do not have access to labeled probe data, we would like to identify a set of $V$ proxy models or features to help define a mapping from the test image to a list of high-performance detectors. [sent-97, score-0.887]
48 We call this set of proxy features virtual probe features. [sent-98, score-1.017]
49 We propose two types of virtual probe features: (1) global appearance features (extending the work of [9]) and (2) detector cross-correlation features. [sent-99, score-1.054]
50 Global appearance features such as HSV histograms can be used as a proxy for the imaging conditions. [sent-100, score-0.199]
51 A full list of appearance-based virtual probe features is given in Section 6. [sent-102, score-0.929]
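As one possible reading of the global appearance probe, the sketch below computes spatially binned HSV histograms. The grid size and bin count are hypothetical parameters (the paper's exact configuration is not reproduced in this excerpt), and the image is assumed to be converted to HSV with every channel scaled to [0, 1] beforehand:

```python
import numpy as np


def global_hsv_histogram(hsv, bins=8, grid=(2, 2)):
    """Concatenate per-cell, per-channel histograms of an HSV image.

    hsv:  H x W x 3 float array, every channel scaled to [0, 1] (assumed).
    grid: number of spatial cells (rows, cols); bins: histogram bins per channel.
    Returns a vector of length grid[0] * grid[1] * 3 * bins; each channel
    histogram is normalized to sum to 1.
    """
    H, W, _ = hsv.shape
    gy, gx = grid
    feats = []
    for i in range(gy):
        for j in range(gx):
            cell = hsv[i * H // gy:(i + 1) * H // gy, j * W // gx:(j + 1) * W // gx]
            for c in range(3):
                hist, _ = np.histogram(cell[..., c], bins=bins, range=(0.0, 1.0))
                feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)
```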
52 , a mean detector) and a secondary detector on the test image to produce two response maps. [sent-108, score-0.191]
53 A similar representation was used in [10] for the internal representation of the score matrix but we are using it here as the virtual probe feature. [sent-111, score-0.99]
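The exact form of the detector cross-correlation feature is not fully recoverable from this excerpt. A minimal sketch, assuming the feature is the Pearson correlation between the reference (e.g. mean) detector's response map and each secondary detector's response map on the same test image; the function names are hypothetical:

```python
import numpy as np


def response_correlation(ref_map, other_map, eps=1e-12):
    """Pearson correlation between two per-pixel response maps of equal shape."""
    a = np.ravel(ref_map) - np.mean(ref_map)
    b = np.ravel(other_map) - np.mean(other_map)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def cross_correlation_features(ref_map, secondary_maps):
    """One virtual-probe value per secondary detector, each in [-1, 1]."""
    return np.array([response_correlation(ref_map, m) for m in secondary_maps])
```

Because the values are signed, a recommender that needs non-negative inputs (such as NMF) would require them to be rescaled first.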
54 Typically, the recommendation system uses this score matrix to suggest a set of detectors based on the response of the probe models. [sent-116, score-1.392]
55 However, since we do not have access to a probe set and therefore cannot evaluate the probe models, we will use a set of virtual probe features as a proxy for the probe models. [sent-117, score-3.897]
[Figure: columns "Models" and "Virtual Probes"; caption fragment: "...tion of models and virtual probe features on the training images."]
56 This requires that we also store the response of the virtual probe features as part of the score matrix. [sent-119, score-1.037]
57 To incorporate the virtual proxy features, we augment the score matrix with the virtual probe feature responses $\hat{r}_{vn}$ on the training data, collected in the feature matrix $\hat{R} \in \mathbb{R}^{V \times N}$, where $V$ is the number of virtual probes. [sent-123, score-1.686]
58 Concatenating the score matrix with the feature matrix, we obtain an augmented score matrix in $\mathbb{R}^{(M+V) \times N}$. [sent-124, score-0.265]
59 A visualization of the transpose of the augmented score matrix is given in Figure 3, where each row is indexed by training images n and the columns are indexed by models and virtual probe features. [sent-125, score-1.13]
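The concatenation step is a simple row-wise stack, with the score matrix on top and the virtual-probe matrix below. A minimal sketch (the function name is hypothetical):

```python
import numpy as np


def augment_score_matrix(R, R_hat):
    """Stack R (M x N model scores) above R_hat (V x N virtual-probe responses).

    The result has shape (M + V) x N; its transpose has one row per training
    image, as in the visualization of Figure 3.
    """
    R = np.asarray(R, dtype=float)
    R_hat = np.asarray(R_hat, dtype=float)
    if R.shape[1] != R_hat.shape[1]:
        raise ValueError("R and R_hat need one column per training image")
    return np.vstack([R, R_hat])
```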
60 Recommendation System. We would like our recommendation system to tell us the best-performing hand detector for an arbitrary test image. [sent-128, score-0.747]
61 In our scenario, the recommendation system defines a mapping $h(\hat{r}) \to r$ from the set of probe feature values $\hat{r}$ extracted from a test image $x_{\mathrm{test}}$ to the estimated scores $r = f(x_{\mathrm{test}})$ of all models on the test image. [sent-129, score-1.383]
62 Following [10], we describe several strategies we evaluate for learning the recommendation (mapping) function h(ˆ r). [sent-130, score-0.552]
63 We use non-negative matrix factorization [8] to decompose the augmented score matrix into the product $UW$, where $U$ is a non-negative $(M+V) \times K$ matrix and $W$ is a non-negative $K \times N$ matrix. [sent-134, score-0.221]
64 The augmented score matrix, and correspondingly the rows of $U$, can be separated into the $V$ virtual probe responses and the $M$ model responses. [sent-138, score-1.065]
65 At test time, the virtual probe features $\hat{r}$ of the test image are used to solve for the weight vector $\theta$ of the sub-matrix $\hat{U}$ such that $\hat{r} \approx \hat{U}\theta$. [sent-139, score-1.025]
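A minimal numpy sketch of the factorization-based recommender, under several stated assumptions: the factorization uses Lee-Seung multiplicative updates for the squared Frobenius loss, the model rows are stacked above the virtual-probe rows (matching the concatenation order in the text), θ is fitted by a non-negative least-squares multiplicative update, and predicting the model scores as $U_{\text{model}}\theta$ is our reading of the procedure rather than a quote. NMF needs every entry to be non-negative, so signed features such as correlations would have to be rescaled first. Function names and iteration counts are hypothetical:

```python
import numpy as np


def nmf(R_aug, K, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative (M+V) x N matrix as U @ W with U, W >= 0.

    Uses Lee-Seung multiplicative updates for the squared Frobenius loss.
    U has shape (M+V) x K and W has shape K x N.
    """
    R_aug = np.asarray(R_aug, dtype=float)
    if (R_aug < 0).any():
        raise ValueError("NMF needs a non-negative matrix; rescale signed features first")
    rng = np.random.default_rng(seed)
    U = rng.random((R_aug.shape[0], K))
    W = rng.random((K, R_aug.shape[1]))
    for _ in range(n_iter):
        W *= (U.T @ R_aug) / (U.T @ U @ W + eps)
        U *= (R_aug @ W.T) / (U @ W @ W.T + eps)
    return U, W


def recommend_nmf(U, M, r_hat, n_iter=200, eps=1e-9):
    """Estimate all M model scores for a test image from its V probe values.

    Rows 0..M-1 of U belong to the models, rows M..M+V-1 to the virtual probes.
    theta >= 0 is fitted so that U_hat @ theta ~= r_hat; the model rows then
    give the estimated scores r = U_model @ theta.
    """
    U_model, U_hat = U[:M], U[M:]
    r_hat = np.asarray(r_hat, dtype=float)
    theta = np.ones(U.shape[1])
    for _ in range(n_iter):  # multiplicative non-negative least squares
        theta *= (U_hat.T @ r_hat) / (U_hat.T @ U_hat @ theta + eps)
    return U_model @ theta
```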
66 Sparse Coding. A sparsity prior can also be enforced on the matrix $\hat{R}$ via a sparse weight vector $\alpha$, which is used to select a sparse set of virtual probe features to span the imaging conditions. [sent-145, score-1.012]
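67 $\alpha^{*} = \arg\min_{\alpha} \|\hat{r} - \hat{R}\alpha\|_2^2 + \lambda \|\alpha\|_1, \qquad (2)$ where $\hat{r}$ are the responses of the virtual probe features on the test image, $\hat{R}$ are the rows of the augmented score matrix corresponding to the virtual probe features, $\alpha$ is the vector of weights for the sparse reconstruction, and $\lambda$ weights the sparsity penalty. [sent-150, score-2.042]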
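The sparse reconstruction in Eq. (2), as we read it (a squared reconstruction error plus an ℓ1 penalty on α), is a lasso problem. One standard solver is iterative shrinkage-thresholding (ISTA), sketched below. The solver choice, the value of λ, and the final step that predicts model scores as $R\alpha$ (reusing the same sparse weights on the model rows of the augmented matrix) are assumptions for illustration:

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_code(R_hat, r_hat, lam=0.1, n_iter=500):
    """ISTA for  min_alpha ||r_hat - R_hat @ alpha||_2^2 + lam * ||alpha||_1."""
    R_hat = np.asarray(R_hat, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    # Lipschitz constant of the gradient of the squared error term.
    L = max(2.0 * np.linalg.norm(R_hat, 2) ** 2, 1e-12)
    alpha = np.zeros(R_hat.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * R_hat.T @ (R_hat @ alpha - r_hat)
        alpha = soft_threshold(alpha - grad / L, lam / L)
    return alpha


def recommend_sparse(R, R_hat, r_hat, lam=0.1):
    """Assumed prediction step: reuse alpha on the model rows, r = R @ alpha."""
    alpha = sparse_code(R_hat, r_hat, lam)
    return np.asarray(R, dtype=float) @ alpha, alpha
```

Each column of $\hat{R}$ corresponds to one training image, so a non-zero entry of α marks a training image whose imaging conditions help explain the test image.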
68 Nearest Neighbor. Another simple way to map a set of virtual probe features $\hat{r}$ to model scores $r$ is to treat the virtual probe features as a direct index into the augmented score matrix. [sent-155, score-1.984]
69 At test time, we extract the virtual probe features and then find the training image with the most similar virtual probe feature response distribution using a nearest neighbor search. [sent-156, score-1.96]
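A minimal sketch of the nearest-neighbor recommender, assuming Euclidean distance between virtual-probe vectors (the similarity measure is not specified in this excerpt) and returning the full score column of the matched training image; the function name is hypothetical:

```python
import numpy as np


def recommend_nearest_neighbor(R, R_hat, r_hat):
    """Return the score column of the training image whose virtual-probe
    vector is closest (Euclidean) to the test image's, plus its index.

    R: M x N model scores; R_hat: V x N virtual-probe responses; r_hat: length V.
    """
    dists = np.linalg.norm(R_hat - np.asarray(r_hat, dtype=float)[:, None], axis=0)
    n_star = int(np.argmin(dists))
    return R[:, n_star], n_star
```

Ranking the returned column then gives the recommended detectors for the test image.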
70 Non-linear Regression. Since our augmented score matrix is dense (no missing data), we can go a step further and attempt to learn a non-linear mapping between virtual probe features $\hat{r}$ and model scores $r$ with a non-linear regressor $g(\hat{r}) \to r$. [sent-161, score-1.163]
71 As this dataset was created for hands under varying illumination, the hands of one person are recorded under various imaging conditions, but the dataset does not contain a wide range of actions. [sent-193, score-0.328]
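The non-linear recommenders evaluated in the paper include a random forest (RF) and k-nearest neighbors (KNN). To stay dependency-free, the sketch below swaps in a different non-linear regressor, RBF kernel ridge regression, purely to show the shape of the mapping $g(\hat{r}) \to r$: training inputs are the columns of $\hat{R}$ and targets are the columns of $R$. The kernel width γ and ridge weight λ are hypothetical settings, and this is not the paper's regressor:

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def fit_probe_regressor(R_hat, R, gamma=1.0, lam=1e-3):
    """Learn g: virtual-probe vector -> model-score vector.

    R_hat: V x N probe responses, R: M x N model scores (one column per training image).
    """
    X = np.asarray(R_hat, dtype=float).T  # N x V inputs
    Y = np.asarray(R, dtype=float).T      # N x M targets
    K = rbf_kernel(X, X, gamma)
    coef = np.linalg.solve(K + lam * np.eye(X.shape[0]), Y)
    return X, coef, gamma


def predict_scores(model, r_hat):
    """Estimated score of every model (length M) for one test image."""
    X, coef, gamma = model
    k = rbf_kernel(np.asarray(r_hat, dtype=float)[None, :], X, gamma)  # 1 x N
    return (k @ coef)[0]
```

Unlike one regressor per score-matrix entry, this variant predicts all M scores jointly from a single N x N solve, which is one way to keep the mapping cheap when many detectors are involved.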
72 We also used the Georgia Tech egocentric activities (GTEA) dataset [5] to test our segmentation algorithm. [sent-196, score-0.319]
73 Evaluating Probe Features. In this experiment we are interested in the ability of virtual probe features (global appearance features and detector cross-correlation features) to improve the performance of hand detection. [sent-200, score-1.131]
74 We tested 20 different variations of virtual probe combinations over the CMU EDSH dataset and the UCI ADL dataset. [sent-201, score-0.915]
75 This baseline represents a model without any concept of model recommendation and therefore has no virtual probe features. [sent-208, score-1.428]
76 First, we evaluated HSV color histograms and global HOG [3] over a variety of spatial bins as a virtual probe feature. [sent-210, score-0.937]
77 From the distribution of scores in bold, we can see that the HSV-based virtual probes obtain the best performance for the majority of datasets. [sent-213, score-0.393]
78 Comparing Recommendation Strategies. We now compare the four recommendation strategies explained in Section 4. [sent-234, score-0.552]
79 For each recommendation experiment, we use the same parameters as the previous experiment but using the best combination of virtual probe features (i. [sent-236, score-1.46]
80 Table 2 shows that our recommendation approach outperforms the state-of-the-art detection approach of [9]. [sent-239, score-0.565]
81 Non-linear models have the benefit of capturing more complex mappings between the probe features and the unobserved features. [sent-241, score-0.683]
82 First, a large number of virtual features increases the possibility of overfitting to the data in the score matrix. [sent-243, score-0.358]
83 Second, in the case of the RF model, the mapping from virtual probes to model scores is expensive, since a single RF model is trained for each entry of the score matrix. [sent-244, score-0.451]
84 Minimizing Correlation Feature Usage. In the previous experiments, many cross-correlation features were used as virtual probe features. [sent-249, score-0.929]
85 As mentioned previously, a large number of probe features can also cause the non-linear recommendation schemes to overfit to the data. [sent-254, score-1.195]
86 In this section, we examine the tradeoff between computation time and performance, by varying the number of virtual cross-correlation probe features. [sent-255, score-0.897]
87 We plot the change in performance on the EDSH dataset as the number of cross-correlation probe features increases. [sent-256, score-0.666]
88 The number of global appearance probe features (combination of HSV and HOG features) remains constant throughout. [sent-257, score-0.711]
89 When the number of probes is 0, only the global appearance features are being used. [sent-258, score-0.168]
90 Figure 6 shows the results for the top-performing non-linear recommendation strategies using the random forest (RF) and k-nearest neighbors (KNN). [sent-259, score-0.571]
91 Although we expected the RF recommendation approach to overfit to the data, we observed that the RF is relatively stable. [sent-261, score-0.531]
92 Leave-one-out style training, where the probe includes global appearance and detector cross-correlation features. [sent-279, score-0.776]
93 We use the same no-probe, single-detector baseline to show how our recommendation approach can be used to adapt to new users in various lighting conditions. [sent-299, score-1.25]
94 This shows the challenging nature of detecting hands in real-life scenarios, especially in very dimly lit scenes where it is hard to detect skin texture. [sent-302, score-0.188]
95 Conclusion. In this work, our aim was to extend the state of the art in egocentric hand detection to provide a more stable pixel-resolution detection of hand regions. [sent-306, score-0.428]
96 In particular, we showed that the problem of pixel-wise hand detection can be effectively solved by posing the problem as a model recommendation task. [sent-307, score-0.697]
97 Through quantitative analysis we showed that our proposed approach is able to retrieve the best hand detectors based on global appearance features and cross-correlation features extracted from the test image. [sent-308, score-0.276]
98 In our experiments we showed robust hand detection by testing our model across multiple users and showed that our proposed approach attains state-of-the-art performance. [sent-310, score-0.177]
99 Acknowledgements. We thank Pyry Matikainen for discussions regarding model recommendation and the initial inspiration for using detector cross-correlation. [sent-311, score-0.594]
100 Figure-ground segmentation improves handled object recognition in egocentric video. [sent-382, score-0.233]
wordName wordTfidf (topN-words)
[('probe', 0.632), ('recommendation', 0.531), ('virtual', 0.265), ('egocentric', 0.202), ('edsh', 0.172), ('hands', 0.117), ('adl', 0.113), ('probes', 0.089), ('proxy', 0.088), ('hsv', 0.082), ('hand', 0.079), ('rf', 0.069), ('slash', 0.069), ('recommend', 0.066), ('detector', 0.063), ('gtea', 0.061), ('score', 0.061), ('regressor', 0.058), ('uci', 0.057), ('skin', 0.051), ('imaging', 0.051), ('test', 0.048), ('augmented', 0.047), ('response', 0.047), ('rmn', 0.042), ('matikainen', 0.04), ('cmu', 0.039), ('user', 0.037), ('chest', 0.034), ('factorization', 0.034), ('detection', 0.034), ('secondary', 0.033), ('posing', 0.033), ('matrix', 0.032), ('features', 0.032), ('labeled', 0.032), ('segmentation', 0.031), ('detectors', 0.03), ('indexed', 0.028), ('skills', 0.028), ('kitani', 0.028), ('appearance', 0.028), ('responses', 0.028), ('potentials', 0.027), ('dynamically', 0.026), ('temporal', 0.026), ('paradigm', 0.026), ('system', 0.026), ('movie', 0.025), ('nmf', 0.025), ('conditions', 0.025), ('users', 0.024), ('motor', 0.023), ('tsinghua', 0.023), ('scenario', 0.023), ('believe', 0.022), ('distribution', 0.021), ('gestures', 0.021), ('strategies', 0.021), ('color', 0.021), ('activities', 0.02), ('symbol', 0.02), ('constantly', 0.02), ('lit', 0.02), ('showed', 0.02), ('extracted', 0.02), ('transductive', 0.02), ('undergoing', 0.02), ('illumination', 0.019), ('living', 0.019), ('models', 0.019), ('global', 0.019), ('hog', 0.019), ('forest', 0.019), ('videos', 0.019), ('daily', 0.018), ('scores', 0.018), ('evaluating', 0.018), ('training', 0.018), ('mapping', 0.018), ('access', 0.018), ('dataset', 0.018), ('indexes', 0.016), ('knn', 0.016), ('likelihood', 0.016), ('amount', 0.016), ('advance', 0.016), ('histogram', 0.016), ('style', 0.016), ('change', 0.016), ('potential', 0.016), ('spat', 0.015), ('ava', 0.015), ('overfits', 0.015), ('ani', 0.015), ('mka', 0.015), ('mtivaetri', 0.015), ('crosscorrelation', 0.015), ('nko', 0.015), ('takeuchi', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 267 iccv-2013-Model Recommendation with Virtual Probes for Egocentric Hand Detection
2 0.31111553 356 iccv-2013-Robust Feature Set Matching for Partial Face Recognition
Author: Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
Abstract: Over the past two decades, a number of face recognition methods have been proposed in the literature. Most of them use holistic face images to recognize people. However, human faces are easily occluded by other objects in many real-world scenarios and we have to recognize the person of interest from his/her partial faces. In this paper, we propose a new partial face recognition approach by using feature set matching, which is able to align partial face patches to holistic gallery faces automatically and is robust to occlusions and illumination changes. Given each gallery image and probe face patch, we first detect keypoints and extract their local features. Then, we propose a Metric Learned Extended Robust Point Matching (MLERPM) method to discriminatively match local feature sets of a pair of gallery and probe samples. Lastly, the similarity of two faces is converted as the distance between two feature sets. Experimental results on three public face databases are presented to show the effectiveness of the proposed approach.
3 0.20150399 305 iccv-2013-POP: Person Re-identification Post-rank Optimisation
Author: Chunxiao Liu, Chen Change Loy, Shaogang Gong, Guijin Wang
Abstract: Owing to visual ambiguities and disparities, person re-identification methods inevitably produce suboptimal rank lists, which still require exhaustive human eyeballing to identify the correct target from hundreds of different likely candidates. Existing re-identification studies focus on improving the ranking performance, but rarely look into the critical problem of optimising the time-consuming and error-prone post-rank visual search at the user end. In this study, we present a novel one-shot Post-rank OPtimisation (POP) method, which allows a user to quickly refine their search by either “one-shot” or a couple of sparse negative selections during a re-identification process. We conduct systematic behavioural studies to understand users’ searching behaviour and show that the proposed method allows correct re-identification to converge 2.6 times faster than the conventional exhaustive search. Importantly, through extensive evaluations we demonstrate that the method is capable of achieving significant improvement over the state-of-the-art distance metric learning based ranking models, even with just “one shot” feedback optimisation, by as much as over 30% performance improvement for rank-1 re-identification on the VIPeR and i-LIDS datasets.
4 0.1060475 247 iccv-2013-Learning to Predict Gaze in Egocentric Video
Author: Yin Li, Alireza Fathi, James M. Rehg
Abstract: We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in camera wearer’s behaviors. Specifically, we compute the camera wearer’s head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging in our gaze predictions into state-of-the-art methods.
5 0.092975356 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang
Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework consisting of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.
6 0.081176192 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
7 0.076842293 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
8 0.076172911 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
9 0.075622581 106 iccv-2013-Deep Learning Identity-Preserving Face Space
10 0.060382497 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
11 0.054132458 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
12 0.04940505 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
13 0.048805386 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
14 0.047345206 348 iccv-2013-Refractive Structure-from-Motion on Underwater Images
15 0.04530213 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
16 0.045162536 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
17 0.043324213 82 iccv-2013-Compensating for Motion during Direct-Global Separation
18 0.043115638 282 iccv-2013-Multi-view Object Segmentation in Space and Time
19 0.042841408 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
20 0.042815063 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
topicId topicWeight
[(0, 0.129), (1, 0.009), (2, -0.007), (3, -0.021), (4, 0.023), (5, -0.034), (6, 0.055), (7, 0.048), (8, 0.003), (9, 0.001), (10, 0.013), (11, 0.018), (12, 0.005), (13, -0.02), (14, -0.026), (15, 0.025), (16, -0.07), (17, 0.002), (18, 0.022), (19, 0.016), (20, -0.018), (21, -0.095), (22, -0.039), (23, -0.086), (24, 0.061), (25, 0.099), (26, -0.107), (27, 0.147), (28, -0.012), (29, -0.101), (30, 0.042), (31, -0.076), (32, -0.022), (33, 0.047), (34, 0.047), (35, 0.026), (36, -0.071), (37, -0.002), (38, -0.023), (39, 0.042), (40, 0.007), (41, 0.028), (42, 0.032), (43, -0.041), (44, -0.064), (45, 0.046), (46, -0.023), (47, -0.012), (48, 0.078), (49, -0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.83030772 267 iccv-2013-Model Recommendation with Virtual Probes for Egocentric Hand Detection
2 0.75830162 305 iccv-2013-POP: Person Re-identification Post-rank Optimisation
3 0.69673204 356 iccv-2013-Robust Feature Set Matching for Partial Face Recognition
Author: Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
Abstract: Over the past two decades, a number of face recognition methods have been proposed in the literature. Most of them use holistic face images to recognize people. However, human faces are easily occluded by other objects in many real-world scenarios, and we have to recognize the person of interest from his/her partial face. In this paper, we propose a new partial face recognition approach using feature set matching, which is able to align partial face patches to holistic gallery faces automatically and is robust to occlusions and illumination changes. Given each gallery image and probe face patch, we first detect keypoints and extract their local features. Then, we propose a Metric Learned Extended Robust Point Matching (MLERPM) method to discriminatively match the local feature sets of a pair of gallery and probe samples. Lastly, the similarity of two faces is converted into the distance between two feature sets. Experimental results on three public face databases are presented to show the effectiveness of the proposed approach.
4 0.64567733 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
Author: Zhiwu Huang, Xiaowei Zhao, Shiguang Shan, Ruiping Wang, Xilin Chen
Abstract: Still-to-Video (S2V) face recognition systems typically need to match faces in low-quality videos captured under unconstrained conditions against high-quality still face images, which is very challenging because of noise, image blur, low face resolution, varying head pose, complex lighting, and alignment difficulty. To address the problem, one solution is to select the frames of ‘best quality’ from videos (hereinafter called quality alignment in this paper). Meanwhile, the faces in the selected frames should also be geometrically aligned to the still faces that are well aligned offline in the gallery. In this paper, we discover that the three tasks (quality alignment, geometric alignment, and face recognition) can benefit from each other and thus should be performed jointly. With this in mind, we propose a Coupling Alignments with Recognition (CAR) method to tightly couple these tasks via low-rank regularized sparse representation in a unified framework. Our method makes the three tasks promote each other through a joint optimization in an Augmented Lagrange Multiplier routine. Extensive experiments on two challenging S2V datasets demonstrate that our method outperforms the state-of-the-art methods impressively.
5 0.56403482 261 iccv-2013-Markov Network-Based Unified Classifier for Face Identification
Author: Wonjun Hwang, Kyungshik Roh, Junmo Kim
Abstract: We propose a novel unifying framework using a Markov network to learn the relationship between multiple classifiers in face recognition. We assume that we have several complementary classifiers and assign observation nodes to the features of a query image and hidden nodes to the features of gallery images. We connect each hidden node to its corresponding observation node and to the hidden nodes of other neighboring classifiers. For each observation-hidden node pair, we collect a set of gallery candidates that are most similar to the observation instance, and the relationship between the hidden nodes is captured in terms of the similarity matrix between the collected gallery images. Posterior probabilities in the hidden nodes are computed by the belief-propagation algorithm. The novelty of the proposed framework is the method that takes into account the classifier dependency using the results of each neighboring classifier. We present extensive results on two different evaluation protocols, known and unknown image variation tests, using three different databases, which show that the proposed framework always leads to good accuracy in face recognition.
6 0.49520129 313 iccv-2013-Person Re-identification by Salience Matching
7 0.48343918 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
8 0.43056405 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
9 0.42813119 106 iccv-2013-Deep Learning Identity-Preserving Face Space
10 0.41564542 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
11 0.41403875 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
12 0.39031711 154 iccv-2013-Face Recognition via Archetype Hull Ranking
13 0.38261065 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
14 0.37314785 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework
15 0.36578733 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
16 0.36119023 213 iccv-2013-Implied Feedback: Learning Nuances of User Behavior in Image Search
17 0.3580752 422 iccv-2013-Toward Guaranteed Illumination Models for Non-convex Objects
18 0.35454431 157 iccv-2013-Fast Face Detector Training Using Tailored Views
19 0.35028827 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
20 0.34567854 180 iccv-2013-From Where and How to What We See
topicId topicWeight
[(2, 0.066), (7, 0.012), (12, 0.012), (26, 0.103), (31, 0.033), (38, 0.25), (40, 0.012), (42, 0.086), (64, 0.055), (73, 0.024), (78, 0.012), (84, 0.01), (89, 0.163), (95, 0.011), (98, 0.034)]
simIndex simValue paperId paperTitle
1 0.81380618 84 iccv-2013-Complex 3D General Object Reconstruction from Line Drawings
Author: Linjie Yang, Jianzhuang Liu, Xiaoou Tang
Abstract: An important topic in computer vision is 3D object reconstruction from line drawings. Previous algorithms either deal with simple general objects or are limited to manifolds (a subset of solids). In this paper, we propose a novel approach to 3D reconstruction of complex general objects, including manifolds, non-manifold solids, and non-solids. By developing some 3D object properties, we use the degree of freedom of objects to decompose a complex line drawing into multiple simpler line drawings that represent meaningful building blocks of a complex object. After 3D objects are reconstructed from the decomposed line drawings, they are merged along their touching faces, edges, and vertices to form a complex object. Our experiments show a number of reconstruction examples from both complex line drawings and images with superimposed line drawings. Comparisons also indicate that our algorithm can deal with much more complex line drawings of general objects than previous algorithms.
1. Introduction and Related Work
A 2D line drawing is the most straightforward way of illustrating a 3D object. Given a line drawing representing a 3D object, our visual system can understand its 3D structure easily. For example, we can interpret without difficulty the line drawing shown in Fig. 1(a) as a castle with four walls and one door. Imitating this ability when a line drawing is as complex as this example has been a long-standing and challenging topic in computer vision. The applications of this work include 3D object design in CAD and for 3D printers, 3D query generation for 3D object retrieval, and 3D modeling from images. In this paper, as in the majority of related work, a line drawing is defined as the orthogonal projection of the edges and vertices of a 3D object in a generic view, and objects with planar surfaces are considered.
Fig. 1. (a) A line drawing representing a castle. (b) The 3D model of the line drawing.
A line drawing is represented by an edge-vertex graph. It can be obtained from a user or designer who draws on the screen with a tablet pen, a mouse, or a finger (on a touch-sensitive screen), with all, some, or none of the hidden edges and vertices. Line labeling is the earliest work on interpreting line drawings [1], [17]. It searches for a set of consistent labels, such as convex, concave, and occluding, for a line drawing to test its correctness and/or realizability. Line labeling by itself cannot recover 3D shape from a line drawing. Later, 3D reconstruction from the contours (line drawings) of objects in images was studied [19], [14], [13], but these methods handle only simple objects. Model-based 3D reconstruction [2], [3], [20] can deal with more complex objects, but these methods require a set of parametric models to be predefined. Recent popular methods of 3D reconstruction from line drawings are optimization based; they are the most related to our work and are reviewed next. These methods can be classified into two categories: one dealing with manifolds and the other dealing with general objects. A general object can be a manifold, non-manifold solid, or non-solid. Manifolds are a subset of solids, defined as follows: a manifold, or more rigorously a 2D manifold, is a solid where every point on its surface has a neighborhood topologically equivalent to an open disk in the 2D Euclidean space. In this paper, a solid is a portion of 3D space bounded by planar faces, and a manifold is also bounded by planar faces. Each edge of such a manifold is shared by exactly two faces [4]. Most methods of 3D reconstruction from a line drawing assume that the face topology of the line drawing is known in advance. This information can greatly reduce the reconstruction complexity. Algorithms to find faces from a line drawing have been developed in [16], [10], and [9], where [16] and [10] are for general objects and [9] is for manifolds.
Optimization-based 3D reconstruction depends on criteria (also called image regularities) that simulate our visual perception. Marill proposes a very simple but effective criterion to reconstruct a simple object: minimizing the standard deviation of the angles (MSDA) in the object [11]. Later, other regularities such as face planarity, line parallelism, isometry, and corner orthogonality were proposed to deal with more complex objects [5], [6], [15], [18]. In these methods, an objective function

$$\Phi(z_1, z_2, \ldots, z_{N_v}) = \sum_{i=1}^{C} \omega_i \, \phi_i(z_1, z_2, \ldots, z_{N_v}) \qquad (1)$$

is minimized to derive the depths $z_1, z_2, \ldots, z_{N_v}$, where $N_v$ is the number of vertices in the line drawing, $\phi_i$, $i = 1, 2, \ldots, C$, are the regularities, and $\omega_i$, $i = 1, 2, \ldots, C$, are the weights. The main problem with this approach is that these algorithms easily get trapped in local minima (producing failed results) when a line drawing is complex with many vertices, because the non-convex objective function is searched in a high-dimensional space ($N_v$ dimensions). For example, the search space has 56 dimensions for the object in Fig. 1(a). To alleviate this problem, Liu et al. formulate 3D reconstruction in a lower-dimensional space so that the optimization procedure has a better chance of finding the desired result [7]. For the complex object in Fig. 1(a), however, the search in an 18-dimensional space is still too difficult for this method to obtain a satisfactory 3D object (see Section 3). The methods in [5], [6], [15], [18], and [7] reconstruct general objects, and the one in [7] can deal with more complex objects than the other four. But these algorithms cannot avoid the local minimum problem in a high-dimensional search space when a line drawing is complex. In [8], a divide-and-conquer (D&C) strategy is used to tackle this problem.
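To make Eq. (1) concrete, here is a minimal Python sketch with a single regularity. It assumes an MSDA-style term defined as the population standard deviation of every angle formed by two edges meeting at a vertex; the other regularities (face planarity, line parallelism, isometry, corner orthogonality) and any real weight values are omitted. The helpers `angle_at`, `msda`, `objective`, and `rotate`, the viewing angles, and the cube data are hypothetical illustrations, not the authors' code. The drawing is made by rotating a unit cube and projecting it orthogonally, so the true depths should score near zero while a flat guess should not.

```python
import math
from statistics import pstdev


def angle_at(q, p, r):
    """Angle (radians) at 3D point q between the segments q->p and q->r."""
    u = [p[k] - q[k] for k in range(3)]
    v = [r[k] - q[k] for k in range(3)]
    cos_t = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def msda(xy, z, edges):
    """MSDA-style regularity: population standard deviation of all angles
    formed by pairs of edges meeting at a vertex (x, y fixed by the drawing)."""
    pts = [(x, y, zi) for (x, y), zi in zip(xy, z)]
    nbrs = {i: [] for i in range(len(pts))}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    angles = [angle_at(pts[v], pts[ns[i]], pts[ns[j]])
              for v, ns in nbrs.items()
              for i in range(len(ns)) for j in range(i + 1, len(ns))]
    return pstdev(angles)


def objective(z, xy, regularities, weights):
    """Eq. (1): Phi(z_1, ..., z_Nv) = sum_i w_i * phi_i(z_1, ..., z_Nv)."""
    return sum(w * phi(xy, z) for phi, w in zip(regularities, weights))


def rotate(p, ax=0.6, az=0.8):
    """Hypothetical generic view: rotate about z by az, then about x by ax."""
    x, y, z = p
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    return (x, y, z)


# Unit cube v1..v8 (indices 0..7), with the faces listed for Fig. 4(a).
CUBE = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
        (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6),
              (6, 7), (7, 4), (0, 4), (1, 5), (2, 6), (3, 7)]
xyz = [rotate(p) for p in CUBE]
xy = [(x, y) for x, y, _ in xyz]        # orthogonal projection = the drawing
z_true = [z for _, _, z in xyz]         # depths an optimizer should recover

phis = [lambda xy_, z_: msda(xy_, z_, CUBE_EDGES)]
print(objective(z_true, xy, phis, [1.0]))     # close to 0: every angle is 90 degrees
print(objective([0.0] * 8, xy, phis, [1.0]))  # clearly positive for the flat guess
```

A real reconstruction would minimize this sum over the depths with a numerical optimizer; the sketch only evaluates Φ, which is enough to see why the true depths sit at a minimum of the MSDA term.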
It first separates a complex line drawing into multiple simpler ones, then independently recovers the 3D objects from these line drawings, and finally merges them to form a complete object. Since the separated line drawings are much simpler than the original one, the 3D reconstruction from each of them is an easy task. This D&C approach handles manifolds only. Based on the known faces found by the face identification algorithm in [9], it uses manifold properties to find internal faces in a line drawing and then separates the line drawing along the internal faces. An internal face is defined as an imaginary face lying inside a manifold with only its edges visible on the surface [8]. It is not a real face but can be considered as two coincident real faces of identical shape belonging to two manifolds. For example, Fig. 2(a) shows a manifold with nine faces. The D&C approach first finds the internal face (a, b, c, d) and then decomposes the line drawing along this internal face (Fig. 2(b)). However, handling only manifolds limits the applications of [8]. In many applications in computer vision and graphics, such as 3D object matching, retrieval, and rendering, objects need not be represented as manifolds, which facilitates data processing and reduces data storage. For example, a flat ground can be represented by a sheet (one face), but if it is represented by a manifold, a thin box with six faces has to be used. Fig. 1(a), Fig. 3(a), and Fig. 3(c) are line drawings of three non-manifolds. In this paper, we propose a novel approach to 3D reconstruction of complex general objects based on visual perception, object properties, and a new line drawing decomposition.
Fig. 2. (a) A simple manifold with an internal face (a, b, c, d). (b) Decomposition result from the internal face.
Fig. 3. (a) A non-manifold solid. (b) Expected decomposition of (a). (c) A sheet object. (d) Expected decomposition of (c).
Compared with previous methods, ours can deal with much more complex line drawings of general objects. It can handle not only manifolds but also non-manifold solids and non-solids, and it is insensitive to sketching errors.
2. General Object Reconstruction
We also use the D&C strategy to deal with 3D reconstruction from a line drawing representing a complex general object. The key is how to decompose a complex line drawing of any object into multiple simpler line drawings. These decomposed line drawings should represent objects that are in accordance with our visual perception, which makes the 3D reconstruction from them easier and better, because the regularities used to build an objective function for reconstruction follow human perception of common objects [11], [5], [6], [15], [18]. Before decomposing a line drawing, we assume that all the real and internal faces of the object have been obtained from the line drawing using a face identification algorithm. For example, the algorithm in [10] finds 10 faces from the line drawing in Fig. 2(a) (including the internal face), and 12 and 7 faces from the line drawings in Figs. 3(a) and (c), respectively.
2.1. Decomposing line drawings of solids
In this subsection, we first consider line drawings of solids. The decomposition method will be extended to line drawings of general objects in the next subsection. It is not difficult to see that, in general, a complex object, especially a man-made one, can be considered as a combination of multiple smaller objects. The most common combination is two faces of two different objects touching each other, as in Fig. 2. Other combinations are touches among lines, faces, and vertices. Our target is to decompose a complex solid into multiple primitive solids. Before defining a primitive solid, we introduce a term called the degree of freedom of a solid.
Definition 1.
The degree of freedom (DoF) of a 3D solid represented by a line drawing is the minimal number of z-coordinates that can uniquely determine this 3D solid. This is the first time that the concept of DoF is used to decompose line drawings. Now let us consider the simple object in Fig. 4(a). The cube has six faces: (v1, v2, v3, v4), (v1, v2, v6, v5), (v1, v4, v8, v5), (v2, v3, v7, v6), (v4, v3, v7, v8), and (v6, v7, v8, v5). We can show that the cube is determined if the z-coordinates of four of its non-coplanar vertices are known. Without loss of generality, suppose z1, z2, z4, and z5 are known. Since the 3D coordinates of v1, v2, and v4 are fixed (recall that the x- and y-coordinates of all the vertices are known under orthogonal projection), the 3D plane passing through the face (v1, v2, v3, v4) is determined, and thus z3 can be calculated. Similarly, z6 and z8 can be obtained. Finally, z7 can be computed because the known 3D coordinates of v3, v4, and v8 determine the plane passing through the face (v4, v3, v7, v8). So the 3D cube is determined by the four known z-coordinates z1, z2, z4, and z5. Further, it can be verified that three 3D vertices cannot determine this object uniquely because they can only define one face in 3D space. Therefore, the DoF of the cube is 4. A similar analysis shows that the solids in Fig. 2(b), Fig. 3(b), and Fig. 4(b) all have DoF 4, while the two solids in Fig. 2(a) and Fig. 3(a) have DoFs 5 and 6, respectively. From this analysis, we have the intuition that solids with DoF 4 serve as the building blocks of more complex solids whose DoFs are greater than 4. Besides, we have the following property:
Property 1. There is no solid with DoF less than 4.
Fig. 4. (a) A cube whose DoF is 4. (b) Another solid whose DoF is also 4. (c) Part of a line drawing with each vertex of degree 3.
This property is easy to verify. A solid with the fewest faces is a tetrahedron.
No two of its four faces are coplanar, and three 3D vertices of a tetrahedron can determine only one 3D face. Next, we define primitive solids.
Definition 2. A 3D solid represented by a line drawing is called a primitive solid if its DoF is 4.
Property 2. If every vertex of a 3D solid represented by a line drawing has degree 3¹, then it is a primitive solid.
Proof. Let part of such a line drawing be the one shown in Fig. 4(c). At each vertex, every two of the three edges form a face, because a solid is bounded by faces without dangling faces and edges. Let the three paths fA, fB, and fC in Fig. 4(c) denote the three faces at vertex a. Without loss of generality, suppose that the four z-coordinates (and thus the four 3D coordinates) of vertices a, b, c, and d are known. Then the three planes passing through fA, fB, and fC are determined in 3D space. With the two known 3D planes passing through fA and fB at vertex b, the 3D coordinates of vertices g and h connected to b can be computed. Similarly, the 3D coordinates of vertices e and f connected to d, and of vertices i and j connected to c, can be obtained. Furthermore, the 3D coordinates of all the other vertices connected to e, f, g, h, i, and j can be derived in the same way. This derivation propagates to all the vertices of the solid. Therefore, the DoF of this solid is 4 and it is a primitive solid.
Property 3. The DoF of a solid obtained by gluing two primitive solids along a pair of faces is 5.
Proof. Let the two primitive solids be PS1 and PS2 and their corresponding gluing faces be f1 and f2, respectively. The DoFs of PS1 and PS2 are both 4. Suppose that PS1 is determined in 3D space, which requires four z-coordinates. Then f1 and f2 are also determined in 3D space. Since f2 fixes the z-coordinates of three vertices of PS2, one more z-coordinate, of a vertex of PS2 not coplanar with f2, determines PS2 in 3D space. Therefore, the DoF of the combined solid is 5.
Fig. 2 is a typical example of two primitive solids glued together along faces. Fig. 3(a) is an example of two primitive solids glued together along edges. Two primitive solids may also connect at one vertex. The following property is easy to verify.
Property 4. The DoF of a solid obtained by gluing two primitive solids along a pair of edges is 6. The DoF of a solid obtained by gluing two primitive solids at a pair of vertices is 7.
From the above properties, we can see that primitive solids are indeed the “smallest” solids in terms of DoF, and they can serve as the building blocks to construct more complex solids. Therefore, our next target is to decompose a line drawing representing a complex solid into multiple line drawings representing primitive solids. Before giving Definition 3, we define some terms.
Vertex set of a face. The vertex set Ver(f) of a face f is the set of all the vertices of f.
Fixed vertex. A fixed vertex is one whose z-coordinate (and thus its 3D coordinates) is known.
Unfixed vertex. An unfixed vertex is one whose z-coordinate is unknown.
Fixed face. A fixed face is one whose 3D position is determined by three of its fixed vertices.
Unfixed face. An unfixed face is one whose 3D position is undetermined.
Definition 3. Let the vertex set and the face set of a line drawing be V = {v1, v2, ..., vn} and F = {f1, f2, ..., fm}, respectively, where n and m are the numbers of vertices and faces, respectively. Also let Vfixed, Ffixed, Vunfixed, and Funfixed be the sets of fixed vertices, fixed faces, unfixed vertices, and unfixed faces, respectively. Suppose that an initial set of two fixed neighboring faces sharing an edge is Finitial, with all their fixed vertices in Vinitial. The final Ffixed in Algorithm 1 is called the maximum extended face set (MEFS) from Finitial.
¹ The degree of a vertex is the number of edges connected to this vertex.
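The cube example and the proofs of Properties 2 and 3 all rest on one operation: once three vertices of a face that are non-collinear in the drawing have known depths, the plane of that face is fixed, and every other vertex of the face gets its z from that plane. A minimal Python sketch of this propagation follows, assuming a generic orthogonal view in which no face is seen edge-on, so each face plane can be written z = ax + by + c. The object is a hypothetical stand-in for Fig. 2(a) (a cube with a pyramid glued on its top face) with made-up coordinates and viewing angles; `plane_through`, `propagate`, and `rotate` are illustrative names, not the authors' code.

```python
import math
from itertools import combinations


def plane_through(p, q, r):
    """Plane z = a*x + b*y + c through three 3D points (Cramer's rule).
    Returns None if the points are collinear in the x-y drawing."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p, q, r
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:
        return None
    a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    return a, b, z1 - a * x1 - b * y1


def propagate(xy, faces, known_z):
    """Fix depths face by face until nothing changes; returns every depth
    that the known z-coordinates determine."""
    z = dict(known_z)
    changed = True
    while changed:
        changed = False
        for face in faces:
            fixed = [v for v in face if v in z]
            if len(fixed) == len(face):
                continue
            for trio in combinations(fixed, 3):
                plane = plane_through(*[(*xy[v], z[v]) for v in trio])
                if plane is None:
                    continue                  # collinear trio: try another
                a, b, c = plane
                for v in face:
                    if v not in z:
                        z[v] = a * xy[v][0] + b * xy[v][1] + c
                changed = True
                break
    return z


def rotate(p, ax=0.6, az=0.8):
    """Hypothetical generic view: rotate about z by az, then about x by ax."""
    x, y, z = p
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    return (x, y, z)


# Fig. 2(a)-style object: cube efgh/abcd with a pyramid (apex i) on face abcd.
P3 = {"e": (0, 0, 0), "f": (1, 0, 0), "g": (1, 1, 0), "h": (0, 1, 0),
      "a": (0, 0, 1), "b": (1, 0, 1), "c": (1, 1, 1), "d": (0, 1, 1),
      "i": (0.5, 0.5, 1.7)}
FACES = ["abcd", "efgh", "efba", "fgcb", "ghdc", "head",
         "bia", "bic", "cid", "dia"]
XYZ = {v: rotate(p) for v, p in P3.items()}
XY = {v: (x, y) for v, (x, y, _) in XYZ.items()}   # orthogonal projection
Z = {v: z for v, (_, _, z) in XYZ.items()}          # ground-truth depths

for known in ("efh", "efha", "efhai"):
    fixed = propagate(XY, FACES, {v: Z[v] for v in known})
    print(known, "->", "".join(sorted(fixed)))
# efh -> efgh
# efha -> abcdefgh
# efhai -> abcdefghi
```

Three depths fix only the bottom face, four non-coplanar depths fix the whole cube (DoF 4) but leave the apex i free, and a fifth depth fixes the glued object, which matches Property 3.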
In Algorithm 1, a face f that satisfies the condition in step 3 is a face whose position in 3D space has been determined by the current fixed vertices in Vfixed. When such a face is found, it becomes a fixed face and all its vertices become fixed vertices. The combined DoF of the two initial fixed faces is 4. It is not difficult to see that the algorithm does not increase the initial DoF, and thus the final object represented by the MEFS also has DoF 4. Next, let us consider the simple example shown in Fig. 2(a) with the following three cases:
Case 1. Suppose that Finitial = {(e, f, g, h), (e, f, b, a)} and Vinitial = {e, f, g, h, b, a}. The algorithm then adds faces into Ffixed in this order: (f, g, c, b) → (a, b, c, d) → (e, h, d, a) → (g, h, d, c). The final object found by the algorithm is the cube. Note that the algorithm does not add any triangular faces into Ffixed because they do not satisfy the condition in step 3.
Case 2. If Finitial = {(b, i, a), (b, i, c)}, then the final object found is the pyramid, and the algorithm does not add any rectangular faces except (a, b, c, d) into Ffixed.
Case 3. If Finitial = {(b, a, i), (e, f, b, a)}, the algorithm cannot find any other faces to add to Ffixed. Thus, it fails to find the cube or the pyramid.
Algorithm 1 Face extending procedure
Input: F, Finitial, Vinitial.
Initialization: Funfixed = F \ Finitial, Ffixed = Finitial, Vfixed = Vinitial, Vunfixed = V \ Vinitial.
1. Repeat the following steps until no face satisfies the condition in step 3:
2. Find a face f ∈ Funfixed that satisfies
3. the number of non-collinear vertices in Ver(f) ∩ Vfixed is more than 2;
4. Add face f into Ffixed and delete it from Funfixed;
5. For each vertex v ∈ Ver(f), if v ∈ Vunfixed, add v into Vfixed and delete it from Vunfixed;
Return the final Ffixed.
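The face extending procedure can be sketched in Python as below, reproducing the three cases for Fig. 2(a). Assumptions: faces are strings of single-letter vertex names, the step-3 test is read as "some three fixed vertices of f are non-collinear in the 2D drawing", Vunfixed is kept implicit (every vertex not in `v_fixed`), and the 2D coordinates are a made-up drawing of the cube-plus-pyramid object, not the paper's figure. The function names `extend_faces` and `has_noncollinear_triple` are hypothetical.

```python
from itertools import combinations


def has_noncollinear_triple(points, eps=1e-9):
    """True if some three of the given 2D points are not collinear."""
    for (x0, y0), (x1, y1), (x2, y2) in combinations(points, 3):
        if abs((x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)) > eps:
            return True
    return False


def extend_faces(faces, xy, f_initial):
    """Algorithm 1: grow Ffixed from two neighboring initial faces.
    faces: list of faces, each a sequence of vertex names;
    xy: vertex name -> 2D position in the drawing."""
    f_fixed = list(f_initial)
    f_unfixed = [f for f in faces if f not in f_initial]
    v_fixed = {v for f in f_initial for v in f}       # Vinitial
    grown = True
    while grown:                                       # step 1: repeat until stable
        grown = False
        for f in list(f_unfixed):                      # step 2: scan unfixed faces
            shared = [xy[v] for v in f if v in v_fixed]
            if has_noncollinear_triple(shared):        # step 3
                f_fixed.append(f)                      # step 4
                f_unfixed.remove(f)
                v_fixed.update(f)                      # step 5 (Vunfixed is implicit)
                grown = True
    return f_fixed


# Hypothetical 2D drawing of Fig. 2(a): cube efgh/abcd, pyramid apex i.
XY = {"e": (0, 0), "f": (2, 0.3), "g": (2.6, 1), "h": (0.6, 0.7),
      "a": (0, 2), "b": (2, 2.3), "c": (2.6, 3), "d": (0.6, 2.7),
      "i": (1.3, 4)}
FACES = ["abcd", "efgh", "efba", "fgcb", "ghdc", "head",
         "bia", "bic", "cid", "dia"]

print(extend_faces(FACES, XY, ["efgh", "efba"]))  # Case 1: the six cube faces
print(extend_faces(FACES, XY, ["bia", "bic"]))    # Case 2: the five pyramid faces
print(extend_faces(FACES, XY, ["bia", "efba"]))   # Case 3: stuck at the two initial faces
```

The order in which faces are added depends on the scan order, so it can differ from the order listed in Case 1, but the final face sets are the same.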
Fig. 5. (a) A complex line drawing of a non-manifold solid. (b) The decomposition result by our algorithm.
In Case 3, the object represented by the MEFS has only the two initial faces, and this object is discarded. In order not to miss a primitive solid, we run Algorithm 1 multiple times, each time with a different pair of neighboring faces in Finitial. Then, for every primitive solid, some Finitial always has both of its faces on that solid. For the object in Fig. 2(a), we can always find the cube and the pyramid. Note that the same primitive solid may be found multiple times from different Finitial; we keep only one copy of each distinct object (the cube and the pyramid in this example). When a complex solid is formed by more than two primitive solids, Algorithm 1 can still be used to find the primitive solids, which form the decomposition of the complex line drawing. More complex examples are given in Section 3. Besides, Algorithm 1 can also deal with complex solids formed by gluing primitive solids along edges and at vertices. Fig. 5(a) is a solid constructed by gluing eight primitive solids along faces, edges, and vertices. Running Algorithm 1 multiple times with different pairs of neighboring faces in Finitial generates the primitive solids shown in Fig. 5(b).
Fig. 6. Illustration of our decomposition method. (a) A line drawing. (b) The set of MEFSs from (a). (c) The weighted object-coexistence graph, where the maximum weight clique is shown in bold. (d) The decomposition of (a).
2.2. Decomposing line drawings of general objects
A general object can be a manifold, non-manifold solid, or non-solid. Given a line drawing representing a general object, it is unknown whether this object consists only of primitive solids. However, we can always apply Algorithm 1 to the line drawing multiple times, each with a
different pair of neighboring faces in Finitial, generating a set SMEFS of MEFSs (recall that an MEFS with only two initial neighboring faces is discarded). In what follows, we also call an MEFS an object, namely the object represented by the MEFS. Note that an MEFS generated from a general line drawing may not be a primitive solid, but its DoF must be 4. Objects of DoF 4 have relatively simple structures and are easy to reconstruct. A number of decomposition examples of complex general line drawings can be seen in the experimental section. One issue with this decomposition method is that two different MEFSs may share many faces. For example, for the line drawing in Fig. 6(a), all the different MEFSs found by running Algorithm 1 multiple times are shown in Fig. 6(b), where Obj 1 and Obj 5 share four faces, and so do Obj 2 and Obj 6. Obviously, Obj 5 and Obj 6 are not necessary. Next we define object coexistence and a rule to choose objects.
Definition 4. Two objects are called coexistent if they share no face or share only coplanar faces.
Rule 1. Choose a subset of SMEFS such that all the objects in the subset are coexistent and the total number of faces is maximized.
From Definition 4, Obj 1 and Obj 5 in Fig. 6 are not coexistent, and neither are Obj 2 and Obj 6. If Obj 5 and Obj 6 are kept and Obj 1 and Obj 2 are discarded, many faces of the original object will be missing. Rule 1 guarantees that Obj 1 and Obj 2 are kept but not Obj 5 and Obj 6.
Algorithm 2 Decomposition of a general line drawing
Input: A line drawing G = (V, E, F).
Initialization: SMEFS = ∅, SMWC = ∅.
1. for each pair of neighboring faces {fa, fb} in F do
2. Call Algorithm 1 with Finitial = {fa, fb} and Vinitial = Ver(fa) ∪ Ver(fb);
3. if the Ffixed returned by Algorithm 1 contains more than two faces then
4. SMEFS ← SMEFS ∪ {Ffixed};
5. Construct the object-coexistence graph Gobj from SMEFS;
6. SMWC ← the maximum weight clique found in Gobj;
7. for each face f not contained in SMWC do
8. Attach f to the object in SMWC that contains the maximum number of the vertices of f;
Return SMWC.
Fig. 7. (a) A sheet object with 23 faces. (b) Decomposition result by Algorithm 2 with the modification in Algorithm 1.
We formulate Rule 1 as a maximum weight clique problem (MWCP), which is to find a clique² of maximum weight in a weighted graph. First, we construct a weighted graph, called the object-coexistence graph, in which each vertex denotes an object in SMEFS and an edge connects two vertices if the two objects they represent are coexistent. Each vertex is assigned a weight equal to the number of faces of the corresponding object. The MWCP is a well-known NP-hard problem. In our application, however, it can be solved fast enough because an object-coexistence graph usually has fewer than 20 objects (vertices). We use the algorithm in [12] to solve this problem. Fig. 6(c) is the object-coexistence graph constructed from the six objects in Fig. 6(b), where the weights of the vertices are given by the numbers in parentheses. The maximum weight clique is shown in bold. From Fig. 6, we see that the face (14, 13, 26, 25) is not contained in SMWC, which stores the objects in the maximum weight clique. This face is finally attached to Obj 3. In general, each face not in SMWC is attached to the object that contains the maximum number of the vertices of this face. If two or more objects contain the same number of vertices of this face, the face is assigned to any one of them. Algorithm 2 shows the complete algorithm to decompose a general line drawing. Steps 7 and 8 attach the faces not in SMWC to some objects in SMWC.
² A clique is a subgraph of a graph such that every two vertices in the subgraph are connected by an edge.
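Steps 5 to 8 of Algorithm 2 can be sketched in Python as below, taking SMEFS as already produced by steps 1 to 4. This is a minimal sketch under stated assumptions: objects are sets of face labels; Definition 4 needs coplanarity, which is supplied here through a hypothetical `plane_of` map from each face to a plane label (the text does not say how coplanarity is tested), and "share only coplanar faces" is read as "all shared faces carry the same plane label". The maximum weight clique is found by Bron-Kerbosch enumeration of maximal cliques, a swapped-in technique rather than the algorithm of [12]; it is exact because, with positive weights, some maximal clique is always a heaviest one. The toy faces and objects are made up.

```python
def max_weight_clique(adj, weight):
    """Heaviest clique via Bron-Kerbosch enumeration of maximal cliques.
    adj: vertex -> set of neighbors; weight: vertex -> positive weight."""
    best_w, best = 0, []

    def expand(r, p, x):
        nonlocal best_w, best
        if not p and not x:                  # r is a maximal clique
            w = sum(weight[v] for v in r)
            if w > best_w:
                best_w, best = w, r
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def decompose(smefs, plane_of, face_vertices):
    """Steps 5-8 of Algorithm 2 on precomputed MEFSs (sets of face labels)."""
    objs = [set(o) for o in smefs]

    def coexistent(o1, o2):                  # Definition 4 (plane-label reading)
        return len({plane_of[f] for f in o1 & o2}) <= 1

    def vertices_of(obj):
        return {v for g in obj for v in face_vertices[g]}

    adj = {i: {j for j in range(len(objs)) if j != i and coexistent(objs[i], objs[j])}
           for i in range(len(objs))}                      # step 5
    weight = {i: len(o) for i, o in enumerate(objs)}       # weight = face count
    chosen = [objs[i] for i in max_weight_clique(adj, weight)]  # step 6
    covered = set().union(*chosen)
    for f, verts in face_vertices.items():                 # steps 7-8
        if f not in covered:
            # attach to the object holding most of f's vertices (first wins ties)
            target = max(chosen, key=lambda o: len(set(verts) & vertices_of(o)))
            target.add(f)
    return chosen


# Hypothetical toy input: every face lies on its own plane.
FACE_VERTS = {"f1": (1, 2, 3), "f2": (2, 3, 4), "f3": (1, 3, 4),
              "f4": (4, 5, 6), "f5": (5, 6, 7), "f6": (6, 7, 8),
              "f7": (7, 8, 9)}
PLANE_OF = {f: "plane_" + f for f in FACE_VERTS}
SMEFS = [{"f1", "f2", "f3", "f4"},   # O1
         {"f4", "f5", "f6"},         # O2: shares one face with O1, so coexistent
         {"f2", "f3", "f5"}]         # O3: shares two non-coplanar faces with O1
for obj in decompose(SMEFS, PLANE_OF, FACE_VERTS):
    print(sorted(obj))
# ['f1', 'f2', 'f3', 'f4']
# ['f4', 'f5', 'f6', 'f7']
```

Here {O1, O2} (7 faces) beats {O2, O3} (6 faces), and the leftover face f7 goes to O2, which holds two of its three vertices.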
A common complex object usually consists of primitive solids and sheets, and Algorithm 2 works well for decomposing most complex line drawings. However, there are still some line drawings that the algorithm cannot deal with. One example is shown in Fig. 7(a), a sheet object with 23 faces. In Algorithm 1, for any pair of initial neighboring faces, no other face satisfies the condition in step 3, so no object of DoF 4 is found. The following scheme solves this problem. Given a line drawing, steps 1–6 of Algorithm 2 are used to decompose it into multiple objects of DoF 4. If there remain separate groups of faces not in SMWC, where the faces in each group are connected, then each group with fewer than four faces is attached to some object in SMWC³ (the attachment method is similar to steps 7 and 8 in Algorithm 2). For a group with four or more connected faces, Algorithm 2 is applied to it with a minor modification in Algorithm 1: Finitial is set to contain three connected faces whose combined DoF is 5. This modification allows the search for objects of DoF 5. Suppose the object in Fig. 7(a) is such a group. Applying Algorithm 2 to it with this modification generates the decomposition result shown in Fig. 7(b).
2.3. 3D Reconstruction
A complex line drawing can be decomposed into several simpler ones using the method proposed in Sections 2.1 and 2.2. The next step is to reconstruct a 3D object from each of them, which is easy because the decomposed line drawings are simple. The method in [6] or [7] can carry out this task very well. We use the one in [6], with the objective function Φ(z1, z2, ..., zNv) constructed from five image regularities: MSDA, face planarity, line parallelism, isometry, and corner orthogonality. The details of the regularities can be found in [6].
After obtaining the 3D objects from all the decomposed line drawings, we merge them to form one complex object. When merging two 3D objects, their gluing parts (faces or edges) usually differ in size because the objects are reconstructed separately. One object is therefore automatically rescaled according to the sizes of the two gluing parts, and the vertices of its gluing part are also adjusted so that the two parts coincide. After all the 3D objects are merged, the whole object is fine-tuned by minimizing the objective function Φ on it. We can also apply our method to reconstruct 3D shapes of objects in images. First, the user draws a line drawing along the visible edges of an object and may also guess (draw) the hidden edges. Then, from this line drawing, the approach described above reconstructs the 3D geometry of the object in the image.
³ The reason for attaching a group with fewer than four faces to an object in SMWC is that such a group is small and need not be reconstructed as an independent object.
3. Experimental Results
In this section, we show a number of complex 3D reconstruction examples from both line drawings and images to demonstrate the performance of our approach. The first set of experiments, in Fig. 8, has nine complex line drawings. Fig. 8(a) is a manifold, and the others are non-manifold solids or non-solids. The decompositions of the line drawings are also given in the figure, and they agree very well with our visual perception. All the primitive solids are found by our algorithm. It is these successful decompositions that make 3D reconstruction from such complex line drawings possible. The satisfactory reconstruction results are also shown in Fig. 8, each in two views. Fig. 9 shows another set of 3D reconstructions from objects in images with line drawings drawn on the objects.
The decomposition results are omitted due to space limitations. Each reconstruction result obtained by our algorithm is shown in two views, with the texture from the image mapped onto the surface. The results are very good. The details of the objects and the line drawings can be seen by enlarging the figures on the screen.

Among all the previous algorithms for general object reconstruction, the one in [7] can deal with the most complex objects. However, because of the local minimum problem in a high-dimensional search space, it cannot handle line drawings as complex as those in Figs. 8 and 9. For example, Fig. 10(a) shows its reconstruction from the line drawing in Fig. 8(c), which is a failure. The reader may wonder what happens if the 3D reconstruction is based on an arbitrary decomposition of a complex line drawing instead of the proposed one. Fig. 10(b) shows such a decomposition of Fig. 8(c). Fig. 10(c) gives the 3D reconstruction obtained from this decomposition by the scheme described in Section 2.3, and it is also a failure. The failure has two causes:

(i) An arbitrary decomposition usually does not generate common objects, which makes the image regularities less meaningful for the 3D reconstruction.
(ii) Gluing the 3D objects from the decomposition in Fig. 10(b) is difficult because of the irregular contacts between the objects.

The fine-tuning process (see Section 2.3) cannot reduce such large distortion to an acceptable result.

Since our algorithm is not limited to manifolds, it can deal with line drawings that include only some of the hidden lines or none of them. The third line drawing in Fig. 9 is an example where some hidden lines are not drawn. Most of the line drawings in this paper look tidy, but this is only for easy observation of the objects. In fact, our algorithm is not sensitive to sketching errors. Take Fig. 8(a) as an example and assume it is an accurate projection of the 3D object. Random variations following the Gaussian distribution N(0, σ^2) are then added to the 2D locations of the vertices. Fig. 11(a) is a resulting noisy line drawing with σ = W/200, where W is the width of the line drawing in Fig. 8(a). Fig. 11 shows that even for this line drawing with strong sketching errors, our algorithm still obtains a good reconstruction result.

Our algorithm is implemented in C++. The computation consists of two parts, line drawing decomposition and 3D reconstruction, and most of the time is consumed by the second part. On average, a common PC takes about one minute to obtain the reconstruction from each of the line drawings in Figs. 8 and 9.

Fig. 8. Nine complex line drawings, their decompositions, and 3D reconstruction results in two views, where different colors are used to denote the faces (better viewed on the screen).

Fig. 9. Four images, the corresponding line drawings, and the reconstructed 3D objects with texture mapped, each shown in two views. The details can be seen by enlarging the figures on the screen.

Fig. 10. (a) A failed reconstruction by the algorithm in [7]. (b) An arbitrary decomposition of the line drawing in Fig. 8(c) without using our decomposition method. (c) Failed 3D reconstruction based on the decomposition in (b).

Fig. 11. (a) A line drawing with strong sketching errors. (b)(c) Two views of the successful reconstruction result by our algorithm.

4. Conclusion

Previous algorithms for 3D object reconstruction from line drawings either deal with simple general objects or are limited to manifolds (a subset of solids). In this paper, we have proposed a novel approach that can handle complex general objects, including manifolds, nonmanifold solids, and non-solids. It decomposes a complex line drawing into simpler ones according to the degrees of freedom of objects, using the developed 3D object properties. After 3D objects are reconstructed from the decomposed line drawings, they are merged to form a complex object.
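The sketching-error test from the experiments (Fig. 11) can be reproduced in outline with the snippet below. It is a minimal sketch under stated assumptions:

- W is taken as the horizontal extent of the vertex coordinates.
- Each vertex coordinate receives independent noise drawn from N(0, σ^2) with σ = W/200.
- The function name `perturb_vertices` is hypothetical.

The snippet uses NumPy's `default_rng` generator, so a fixed seed reproduces the same noisy drawing.

```python
import numpy as np


def perturb_vertices(xy, divisor=200.0, seed=None):
    """Add independent Gaussian noise N(0, sigma^2) to 2D vertex locations.

    xy:      (N, 2) array of vertex coordinates of an accurate line drawing.
    divisor: sigma = W / divisor, with W the drawing width (200 in Fig. 11(a)).
    seed:    optional seed so the same noisy drawing can be regenerated.
    Returns (noisy_xy, sigma).
    """
    xy = np.asarray(xy, dtype=float)
    # Assumption: W is the horizontal extent of the vertex coordinates.
    width = xy[:, 0].max() - xy[:, 0].min()
    sigma = width / divisor
    rng = np.random.default_rng(seed)
    noisy = xy + rng.normal(0.0, sigma, size=xy.shape)
    return noisy, sigma
```

With the default divisor, a drawing 400 units wide gets σ = 2.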
We have shown a number of reconstruction examples, with comparisons to the best previous algorithm. The results indicate that our algorithm can tackle much more complex line drawings of general objects and is insensitive to sketching errors. Future work includes (i) correcting the distortions, caused by perspective projection, in 3D objects reconstructed from images, and (ii) extending this work to objects with curved faces.

Acknowledgements

This work was supported by grants from the Natural Science Foundation of China (No. 61070148), the Science, Industry, Trade, and Information Technology Commission of Shenzhen Municipality, China (No. JC201005270378A), and the Guangdong Innovative Research Team Program (No. 201001D0104648280). Jianzhuang Liu is the corresponding author.

References

[1] M. Clowes. On seeing things. Artificial Intelligence, 2:79–116, 1971.
[2] P. Debevec, C. Taylor, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. Proc. ACM SIGGRAPH, pages 11–20, 1996.
[3] D. Jelinek and C. Taylor. Reconstruction of linearly parameterized models from single images with a camera of unknown focal length. IEEE T-PAMI, 23(7):767–773, 2001.
[4] D. E. LaCourse. Handbook of Solid Modeling. McGraw-Hill, 1995.
[5] Y. Leclerc and M. Fischler. An optimization-based approach to the interpretation of single line drawings as 3D wire frames. IJCV, 9(2):113–136, 1992.
[6] H. Lipson and M. Shpitalni. Optimization-based reconstruction of a 3D object from a single freehand line drawing. Computer-Aided Design, 28(7):651–663, 1996.
[7] J. Liu, L. Cao, Z. Li, and X. Tang. Plane-based optimization for 3D object reconstruction from single line drawings. IEEE T-PAMI, 30(2):315–327, 2008.
[8] J. Liu, Y. Chen, and X. Tang. Decomposition of complex line drawings with hidden lines for 3D planar-faced manifold object reconstruction. IEEE T-PAMI, 33(1):3–15, 2011.
[9] J. Liu, Y. Lee, and W. Cham. Identifying faces in a 2D line drawing representing a manifold object. IEEE T-PAMI, 24(12):1579–1593, 2002.
[10] J. Liu and X. Tang. Evolutionary search for faces from line drawings. IEEE T-PAMI, 27(6):861–872, 2005.
[11] T. Marill. Emulating the human interpretation of line-drawings as three-dimensional objects. IJCV, 6(2):147–161, 1991.
[12] P. R. J. Östergård. A new algorithm for the maximum-weight clique problem. Nordic J. of Computing, 8(4):424–436, Dec. 2001.
[13] H. Shimodaira. A shape-from-shading method of polyhedral objects using prior information. IEEE T-PAMI, 28(4):612–624, 2006.
[14] I. Shimshoni and J. Ponce. Recovering the shape of polyhedra using line-drawing analysis and complex reflectance models. Computer Vision and Image Understanding, 65(2):296–310, 1997.
[15] K. Shoji, K. Kato, and F. Toyama. 3-D interpretation of single line drawings based on entropy minimization principle. CVPR, 2001.
[16] M. Shpitalni and H. Lipson. Identification of faces in a 2D line drawing projection of a wireframe object. IEEE T-PAMI, 18(10), 1996.
[17] K. Sugihara. Machine interpretation of line drawings. MIT Press, 1986.
[18] A. Turner, D. Chapman, and A. Penn. Sketching space. Computer and Graphics, 24:869–879, 2000.
[19] F. Ulupinar and R. Nevatia. Shape from contour: straight homogeneous generalized cylinders and constant cross-section generalized cylinders. IEEE T-PAMI, 17(2):120–135, 1995.
[20] T. Xue, J. Liu, and X. Tang. Example-based 3D object reconstruction from line drawings. CVPR, 2012.
same-paper 2 0.74345601 267 iccv-2013-Model Recommendation with Virtual Probes for Egocentric Hand Detection
Author: Cheng Li, Kris M. Kitani
Abstract: Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures and learning about hand-object manipulation. To enable such technology, we believe that the hands must be detected at the pixel level to gain important information about the shape of the hands and fingers. We show that the problem of pixel-wise hand detection can be effectively solved by posing it as a model recommendation task. As such, the goal of a recommendation system is to recommend the n-best hand detectors based on the probe set, a small amount of labeled data from the test distribution. This requirement of a probe set is a serious limitation in many applications, such as egocentric hand detection, where the test distribution may be continually changing. To address this limitation, we propose the use of virtual probes, which can be automatically extracted from the test distribution. The key idea is that many features, such as the color distribution or the relative performance between two detectors, can be used as a proxy for the probe set. In our experiments we show that the recommendation paradigm is well-equipped to handle complex changes in the appearance of the hands in first-person vision. In particular, we show how our system is able to generalize to new scenarios by testing our model across multiple users.
3 0.73335755 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
Author: Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
Abstract: Cosegmentation refers to the problem of segmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align common objects between these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call “cosketch”. The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. These shape templates capture distinct image patterns, and each template is matched to similar image patches in different images. Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation. We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. Experiments show that our method outperforms state-of-the-art methods for cosegmentation on the challenging MSRC and iCoseg datasets. We also illustrate our method on a new dataset called Coseg-Rep, where cosegmentation can be performed within a single image with repetitive patterns.
4 0.71885705 437 iccv-2013-Unsupervised Random Forest Manifold Alignment for Lipreading
Author: Yuru Pei, Tae-Kyun Kim, Hongbin Zha
Abstract: Lipreading from visual channels remains a challenging topic, given the variety of speaking characteristics. In this paper, we propose an efficient lipreading approach based on unsupervised random forest manifold alignment (RFMA). A density random forest is employed to estimate the affinity of patch trajectories in speaking facial videos. We propose novel criteria for node splitting to avoid rank deficiency in learning density forests. By virtue of the hierarchical structure of random forests, the trajectory affinities are measured efficiently. They are then used to find embeddings of the speaking video clips by a graph-based algorithm. Lipreading is formulated as matching between manifolds of query and reference video clips. We employ the manifold alignment technique for matching, where an L∞-norm-based manifold-to-manifold distance is proposed to find the matching pairs. We apply this random forest manifold alignment technique to various video data sets captured by consumer cameras. The experiments demonstrate that lipreading can be performed effectively and that the approach outperforms state-of-the-art methods.
5 0.71047604 10 iccv-2013-A Framework for Shape Analysis via Hilbert Space Embedding
Author: Sadeep Jayasumana, Mathieu Salzmann, Hongdong Li, Mehrtash Harandi
Abstract: We propose a framework for 2D shape analysis using positive definite kernels defined on Kendall’s shape manifold. Different representations of 2D shapes are known to generate different nonlinear spaces. Due to the nonlinearity of these spaces, most existing shape classification algorithms resort to nearest neighbor methods and to learning distances on shape spaces. Here, we propose to map shapes on Kendall’s shape manifold to a high dimensional Hilbert space where Euclidean geometry applies. To this end, we introduce a kernel on this manifold that permits such a mapping, and prove its positive definiteness. This kernel lets us extend kernel-based algorithms developed for Euclidean spaces, such as SVM, MKL and kernel PCA, to the shape manifold. We demonstrate the benefits of our approach over the state-of-the-art methods on shape classification, clustering and retrieval.
6 0.70790023 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
7 0.67100894 414 iccv-2013-Temporally Consistent Superpixels
8 0.66967195 150 iccv-2013-Exemplar Cut
9 0.66582447 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
10 0.66148335 396 iccv-2013-Space-Time Robust Representation for Action Recognition
11 0.66044146 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
12 0.66003054 379 iccv-2013-Semantic Segmentation without Annotating Segments
13 0.65934598 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
14 0.65932202 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
15 0.65900654 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
16 0.65830886 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
17 0.65812039 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
18 0.65807641 6 iccv-2013-A Convex Optimization Framework for Active Learning
19 0.65725201 19 iccv-2013-A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting
20 0.6567508 258 iccv-2013-Low-Rank Sparse Coding for Image Classification