nips nips2010 nips2010-272 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen
Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which maximizes the joint likelihood of the sub-tasks while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers with information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. [sent-7, score-0.94]
2 Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. [sent-8, score-0.131]
3 It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. [sent-9, score-0.154]
4 We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. [sent-11, score-0.509]
5 Our training method involves a feedback step that allows later classifiers to provide earlier classifiers with information about what error modes to focus on. [sent-12, score-0.281]
6 In the domain of scene understanding, for example, several independent efforts have resulted in good classifiers for tasks such as scene categorization, depth estimation, object detection, etc. [sent-16, score-1.079]
7 In practice, we see that these sub-tasks are coupled—for example, if we know that the scene is indoors, it would help us estimate depth more accurately from that single image. [sent-17, score-0.517]
8 In another example in the robotic grasping domain, if we know what object it is, then it is easier for a robot to figure out how to pick it up. [sent-18, score-0.549]
9 Recently, several approaches have tried to combine these different classifiers for related tasks in vision [19, 25, 35]; however, most of them tend to be ad-hoc (i.e., a hard-coded rule is used) and often intimate knowledge of the inner workings of the individual classifiers is required. [sent-21, score-0.204] [sent-23, score-0.154]
11 Heitz et al. [17] recently developed a framework for scene understanding called Cascaded Classification Models (CCM), treating each classifier as a ‘black-box’. [sent-28, score-0.325]
12 Each classifier is repeatedly instantiated, with the next layer using the outputs of the previous classifiers as inputs. [sent-29, score-0.278]
13 This feedback can help the CCM achieve a better solution. [sent-31, score-0.161]
14 In our work, we propose Feedback Enabled Cascaded Classification Models (FE-CCM), which provides feedback from the later classifiers to the earlier ones during the training phase. [sent-32, score-0.225]
15 This feedback provides the earlier stages with information about which error modes should be focused on, and which can be ignored without hurting the performance of the later classifiers. [sent-33, score-0.193]
16 For example, misclassifying a street scene as a highway would not hurt as much as misclassifying a street scene as open country. [sent-34, score-0.72]
17 Therefore, we prefer the first-layer classifier to focus on fixing the latter error rather than simply optimizing its own training accuracy. [sent-35, score-0.229]
18 In another example, allowing depth estimation to focus on specific regions can lead to better scene categorization. [sent-36, score-0.517]
19 For instance, an open-country scene is characterized by a wide sky area in its upper part. [sent-37, score-0.333]
20 Therefore, estimating the depth well in that region, even at the cost of accuracy in the bottom regions, may help the image be assigned to the correct category. [sent-38, score-0.325]
21 In detail, we do so by jointly maximizing the likelihood of all the tasks; the outputs of the first layer are treated as latent variables, and training is done using an iterative algorithm. [sent-39, score-0.219]
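A hedged reconstruction of the joint objective this describes, using the symbols Ψi (task-specific features), θi (first-layer parameters) and ωi (second-layer parameters) that appear in the training equations below; the exact form in the paper may differ:

```latex
\max_{\{\theta_i\},\,\{\omega_i\}} \;
\sum_{X \in \Gamma} \log \sum_{Z_1,\ldots,Z_n}
\prod_{i=1}^{n} P\bigl(Y_i \mid \Psi_i(X), Z_1,\ldots,Z_n;\ \omega_i\bigr)\,
\prod_{i=1}^{n} P\bigl(Z_i \mid \Psi_i(X);\ \theta_i\bigr)
```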
22 Therefore, our method is applicable to many tasks that have different but correlated outputs. [sent-47, score-0.131]
23 2 Related Work The idea of using information from related tasks to improve the performance of the task in question has been studied in various fields of machine learning and vision. [sent-54, score-0.131]
24 The idea of cascading layers of classifiers to aid the final task was first introduced with neural networks as multi-layer perceptrons, where the output of the first layer of perceptrons is passed on as input to the next hidden layer [16, 12, 6]. [sent-55, score-0.492]
25 However, in our scenario, we consider multiple tasks where each classifier is tackling a different problem. [sent-60, score-0.131]
26 The idea of improving classification performance by combining outputs of many classifiers is used in methods such as Boosting [13], where many weak learners are combined to obtain a more accurate classifier; this has been applied to tasks such as face detection [4, 40]. [sent-63, score-0.418]
27 Tu [39] used pixel-level label maps to learn a contextual model for pixel-level labeling, through a cascaded classifier approach, but such works considered only the interactions between labels of the same type. [sent-65, score-0.221]
28 Kumar and Hebert [23] developed a large MRF-based probabilistic model to link multi-class segmentation and object detection. [sent-67, score-0.183]
29 The model optimizes the output of each Classifier_j on the second stage independently; (b) the proposed feedback-enabled cascaded classification model (FE-CCM), where feedback from the later stages helps achieve a model that optimizes all the tasks considered jointly. [sent-74, score-0.626]
30 Prior work manually designed the terms in an MRF to combine depth estimation with object detection [34] and stereo cues [33]. [sent-84, score-0.526]
31 [35] used object recognition to help 3D structure estimation. [sent-86, score-0.188]
32 [19] proposed an innovative but ad-hoc system that combined boundary detection and surface labeling by sharing some low-level information between the classifiers. [sent-89, score-0.228]
33 However, these methods required considerable attention to each classifier, considerable insight into the inner workings of each task, and knowledge of the connections between tasks. [sent-93, score-0.154]
34 Our cascade is composed of two layers, where the outputs from classifiers on the first layer go as input into the classifiers in the second layer. [sent-112, score-0.331]
35 We do this by appending all the outputs from the first layer to the features for that task. [sent-113, score-0.278]
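As a rough illustration of this stacking step (the function and array names below are hypothetical, not from the paper):

```python
import numpy as np

def second_layer_features(task_features, first_layer_outputs):
    """Append the outputs of all first-layer classifiers to a task's own features.

    task_features       : (d,) feature vector Psi_i(X) for one data point
    first_layer_outputs : list of per-task output vectors Z_1, ..., Z_n
    """
    return np.concatenate([task_features] + list(first_layer_outputs))

# Example: an 8-dim scene score vector and a 1-dim saliency confidence appended
# to a 512-dim task feature vector before it enters the second-layer classifier.
phi = np.random.randn(512)
z_scene, z_saliency = np.random.randn(8), np.random.randn(1)
assert second_layer_features(phi, [z_scene, z_saliency]).shape == (521,)
```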
36 Z1, . . . , Zn (the outputs of layer 1 and inputs to layer 2) are hidden, and this makes training each classifier as a black-box hard. [sent-137, score-0.425]
37 [17] assume that each layer is independent and that each layer produces the best output independently (without consideration for other layers), and therefore use the ground-truth labels for Z1, Z2, . . . , Zn. [sent-139, score-0.497]
38 i.e., the first-layer classifiers need not perform their best (w.r.t. the ground-truth labels of their individual sub-tasks). [sent-145, score-0.196]
39 Training decomposes into: maximize over θi of Σ_{X∈Γ} log P(Zi | Ψi(X); θi) (5), and maximize over ωi of Σ_{X∈Γ} log P(Yi | Ψi(X), Z1, . . . , Zn; ωi) (6). Note that the optimization problem nicely breaks down into the sub-problems of training the individual classifier for the respective sub-tasks. [sent-196, score-0.152]
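A minimal structural sketch of this alternating training, assuming sklearn-style fit/predict_proba black boxes (the paper's actual per-task classifiers and its latent-variable feedback step are more involved; the feedback update below is left as a simplified stub):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fe_ccm(features, labels, n_iters=5):
    """features[i]: (m, d_i) task features Psi_i(X); labels[i]: (m,) targets Y_i."""
    n = len(features)
    layer1 = [LogisticRegression(max_iter=1000) for _ in range(n)]
    layer2 = [LogisticRegression(max_iter=1000) for _ in range(n)]
    # CCM-style initialization: first-layer pseudo-targets start at ground truth.
    targets = [y.copy() for y in labels]
    for _ in range(n_iters):
        # Sub-problem (5): fit each first-layer classifier to its pseudo-targets.
        for i in range(n):
            layer1[i].fit(features[i], targets[i])
        Z = [clf.predict_proba(X) for clf, X in zip(layer1, features)]
        # Sub-problem (6): fit each second-layer classifier on [own features | all Z].
        for i in range(n):
            layer2[i].fit(np.hstack([features[i]] + Z), labels[i])
        # Feedback step (stub): re-estimate the latent Z_i so that earlier
        # classifiers focus on the error modes that matter for the later ones.
        # Here we simply keep the ground-truth targets; the paper instead
        # infers the Z_i as latent variables of the joint model.
    return layer1, layer2
```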
40 Therefore, we consider the original classifiers as black-boxes, and we do not need any low-level information about the particular tasks or knowledge of the inner workings of the classifiers. [sent-199, score-0.285]
41 All depth maps in depth estimation are at the same scale (black means near and white means far); salient regions in saliency detection are indicated in cyan; geometric labeling: green = support, blue = sky, red = vertical (best viewed in color). [sent-246, score-0.783]
42 Inference for P(Y1, . . . , Yn | X) corresponds to performing inference over the first layer (using the same inference techniques as the respective black-box classifiers), followed by inference on the second layer. [sent-252, score-0.251]
43 With a large number of sub-tasks, the number of weights in the second layer increases, and our sparsity term leaves only a few non-zero connections between sub-tasks active. [sent-255, score-0.227]
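One standard way to realize such a sparsity term is an L1 penalty on the second-layer weights; a minimal sketch with scikit-learn (the solver choice and the synthetic data are assumptions, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical stacked input: [own features | outputs of all first-layer tasks].
X_stacked = rng.normal(size=(200, 30))
y = rng.integers(0, 2, size=200)

# The L1 penalty drives most cross-task weights to exactly zero, so only a
# few connections between sub-tasks remain active.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_stacked, y)
print("active connections:", np.count_nonzero(clf.coef_))
```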
44 4 Scene Understanding: Implementation Here we briefly describe the implementation details for our instantiation of FE-CCMs for scene understanding. [sent-274, score-0.3]
45 In our preliminary work [22], where we optimized for each target task independently, we considered four vision tasks: scene categorization, depth estimation, event categorization and saliency detection. [sent-278, score-0.858]
46 In this work, we add object detection and geometric labeling, and jointly optimize all six tasks. [sent-280, score-0.447]
47 For scene categorization, we classify an image into one of the 8 categories defined by Torralba et al. [sent-282, score-0.338]
48 We define the output of a scene classifier to be an 8-dimensional vector with each element representing the score for each category. [sent-285, score-0.366]
49 We evaluate the performance by measuring the accuracy of assigning the correct scene label to an image on the MIT outdoor scene dataset [28]. [sent-286, score-0.714]
50 For the single image depth estimation task, we want to estimate the depth d ∈ R+ of every pixel in an image (Figure 2a). [sent-288, score-0.605]
51 We evaluate the performance of the estimation by computing the root mean square error of the estimated depth with respect to ground truth laser scan depth using the Make3D Range Image dataset [30, 31]. [sent-289, score-0.457]
52 For event categorization, we classify an image into one of the 8 sports events as defined by Li et al. [sent-293, score-0.153]
53 We define the output of an event classifier to be an 8-dimensional vector with each element representing the log-odds score for each category. [sent-296, score-0.181]
54 For evaluation, we compute the accuracy of assigning the correct event label to an image. [sent-297, score-0.156]
55 Here, we want to classify each pixel in the image as either salient or non-salient (Figure 2c). [sent-299, score-0.179]
56 We define the output of the classifier as a scalar indicating the saliency confidence score of each pixel. [sent-300, score-0.234]
57 We threshold this saliency score to determine whether the point is salient (+1) or not (−1). [sent-301, score-0.239]
58 For evaluation, we compute the accuracy of assigning a pixel as a salient point. [sent-302, score-0.182]
59 We consider the following object categories: car, person, horse and cow. [sent-304, score-0.18]
60 A sample image with the object detections is shown in Figure 2b. [sent-305, score-0.22]
61 Our object detection module builds on the part-based detector of Felzenszwalb et al. [sent-307, score-0.278]
62 We then extract HOG features [7] on every candidate window and learn an RBF-kernel SVM model as the first-layer classifier. [sent-311, score-0.196]
63 The classifier assigns each window a +1 or −1 label indicating whether the window belongs to the object or not. [sent-312, score-0.147]
64 For the second-layer classifier, we learn a logistic model over the feature vector constituted by the outputs of all first-level tasks and the original HOG features. [sent-313, score-0.248]
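A schematic version of this two-stage window classifier, assuming skimage's HOG implementation and scikit-learn models as stand-ins for the detectors the paper actually uses (the window data and the other-task outputs below are synthetic placeholders):

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical 64x64 grayscale candidate windows with +1/-1 object labels.
windows = rng.random((40, 64, 64))
labels = rng.choice([-1, 1], size=40)

# First layer: RBF-kernel SVM over HOG features of each candidate window.
H = np.stack([hog(w, pixels_per_cell=(8, 8), cells_per_block=(2, 2)) for w in windows])
svm = SVC(kernel="rbf").fit(H, labels)

# Second layer: logistic model over [outputs of all first-level tasks | HOG].
other_task_outputs = rng.random((40, 10))   # stand-in for Z_1, ..., Z_n
second = LogisticRegression(max_iter=1000).fit(np.hstack([other_task_outputs, H]), labels)
```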
65 The geometric labeling task refers to assigning each pixel to one of three geometric classes: support, vertical and sky (Figure 2d), as defined by Hoiem et al. [sent-316, score-0.443]
66 We use the dataset and the algorithm by [20] as the first-layer geometric labeling module. [sent-319, score-0.23]
67 For the second layer, we learn a logistic model over a feature vector constituted by the outputs of all first-level tasks and the features used in the first layer. [sent-321, score-0.248]
68 For evaluation, we compute the accuracy of assigning the correct geometric label to a pixel. [sent-322, score-0.174]
69 We evaluate our proposed method on two different domains: scene understanding and robotic grasping. [sent-325, score-0.444]
70 We do not do cross-validation for object detection, as the train/test split is standard on the PASCAL 2006 [9] dataset (1277 training and 2686 test images). [sent-441, score-0.313]
71 Results and discussion: To quantitatively evaluate our method for each of the sub-tasks, we consider the metrics appropriate to each of the six tasks in Section 4. [sent-442, score-0.169]
72 Table 1 shows that FE-CCM not only beats the state of the art in all the tasks but also does so jointly as one single unified model. [sent-443, score-0.164]
73 The state-of-the-art classifiers improve on the base model by explicitly hand-designing task-specific probabilistic models [24, 31] or by using ad-hoc methods to implicitly use information from other tasks [20]. [sent-445, score-0.181]
74 Furthermore, Table 1 shows the results for CCM (which is a cascade without feedback information) and all-features-direct (which uses features from all the tasks). [sent-448, score-0.173]
75 In comparison to CCM, FE-CCM leads to better depth estimation of the sky and the ground, better coverage and more accurate labeling of the salient region in the image, and better geometric labeling and object detection. [sent-451, score-0.787]
76 FE-CCM allows each classifier in the second layer to learn, in the form of weights, which information from the other first-layer sub-tasks is useful (in contrast to manually encoding the information shared across sub-tasks, as in some prior works). [sent-453, score-0.227]
77 We provide a visualization of the weights for the 6 vision tasks in Figure 3-left. [sent-454, score-0.198]
78 Figure 3-right provides a closer look at the positive weights given to the various outputs for a second-level geometric classifier. [sent-458, score-0.211]
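A visualization of this kind can be reproduced with a simple heatmap of the absolute weight values; a sketch with a random stand-in matrix (the task list and colormap are assumptions matching the figure's description):

```python
import numpy as np
import matplotlib.pyplot as plt

tasks = ["scene", "depth", "event", "saliency", "object", "geometric"]
W = np.abs(np.random.randn(len(tasks), len(tasks)))  # stand-in for learned weights

fig, ax = plt.subplots()
im = ax.imshow(W, cmap="jet")  # blue = low, red = high, as in Figure 3
ax.set_xticks(range(len(tasks)))
ax.set_xticklabels(tasks, rotation=45)
ax.set_yticks(range(len(tasks)))
ax.set_yticklabels(tasks)
fig.colorbar(im)
plt.tight_layout()
plt.show()
```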
79 2 Robotic Grasping In order to show the applicability of our FE-CCM to problems across different machine learning domains, we also considered the problem of a robot autonomously grasping objects. [sent-463, score-0.283]
80 Given an image and a depthmap, the goal of the learning algorithm is to select a point at which to grasp the object (this location is called the grasp point [32]). [sent-464, score-0.168] [sent-469, score-0.242]
81 Figure 3: (Left) The absolute values of the weight vectors for the second-level classifiers, i.e., each column shows the contribution of the various tasks towards a certain task (Note: blue is low and red is high). [sent-467, score-0.165]
83 It turns out that different categories of objects could have different strategies for grasping, and therefore in this work, we use our FE-CCM to combine object classification and grasping point detection. [sent-470, score-0.411]
84 We use the dataset of [32], which spans 6 object categories and also includes an aligned pixel-level depth map for each image. [sent-473, score-0.395]
85 For grasp point detection, we use a regression over features computed from the image [32]. [sent-474, score-0.168]
86 The output of the regression is a score for each point giving the confidence of the point being a good grasping point. [sent-475, score-0.328]
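The decision rule this implies is an argmax over the per-point confidence map; a minimal sketch (the function name and the score map below are hypothetical):

```python
import numpy as np

def select_grasp_point(score_map):
    """Return the (row, col) of the pixel with the highest grasp confidence."""
    return np.unravel_index(np.argmax(score_map), score_map.shape)

scores = np.random.rand(240, 320)  # stand-in for the regression output per point
print(select_grasp_point(scores))
```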
87 For object classification, we use a logistic classifier. [sent-476, score-0.147]
88 Table 2 shows the results for our algorithm’s ability to predict the grasping point, given an image and the depths observed by the robot using its sensors. [sent-479, score-0.356]
89 Figure 4 shows our robot grasping an object using our algorithm. [sent-481, score-0.227]
90 Table 2: Summary of results for the robotic grasping experiment. [sent-482, score-0.346]
91 Figure 4: Our robot grasping an object using our algorithm. [sent-495, score-0.227]
92 We only consider the individual classifiers as “black-boxes” (thus not needing to know the inner workings of each classifier) and propose learning techniques for combining them (thus not needing to know how to combine the tasks). [sent-497, score-0.341]
93 Our method introduces feedback in the training process from the later stage to the earlier one, so that a later classifier can provide the earlier classifiers with information about what error modes to focus on, or what can be ignored without hurting the joint performance. [sent-498, score-0.463]
94 We consider two domains: scene understanding and robotic grasping. [sent-499, score-0.444]
95 We believe that this is a small step towards holistic scene understanding. [sent-503, score-0.373]
96 We thank Anish Nahar, Matthew Cong and Colin Ponce for help with the robotic experiments. [sent-505, score-0.16]
97 Cascaded classification models: Combining models for holistic scene understanding. [sent-624, score-0.339]
98 A generic model to compose vision modules for holistic scene understanding. [sent-661, score-0.375]
99 Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. [sent-679, score-0.301]
100 Make3d: Learning 3d scene structure from a single still image. [sent-741, score-0.265]
wordName wordTfidf (topN-words)
[('ers', 0.268), ('classi', 0.265), ('scene', 0.265), ('ccm', 0.26), ('grasping', 0.227), ('depth', 0.211), ('layer', 0.196), ('saxena', 0.196), ('cascaded', 0.18), ('er', 0.149), ('object', 0.147), ('saliency', 0.133), ('categorization', 0.133), ('detection', 0.131), ('tasks', 0.131), ('feedback', 0.12), ('robotic', 0.119), ('workings', 0.114), ('geometric', 0.098), ('zn', 0.097), ('labeling', 0.097), ('grasp', 0.095), ('enabled', 0.095), ('zi', 0.093), ('classif', 0.087), ('eri', 0.087), ('outputs', 0.082), ('event', 0.08), ('uni', 0.079), ('heitz', 0.077), ('hoiem', 0.077), ('holistic', 0.074), ('image', 0.073), ('cvpr', 0.069), ('salient', 0.069), ('sky', 0.068), ('hurting', 0.065), ('ieri', 0.065), ('output', 0.064), ('understanding', 0.06), ('yn', 0.058), ('needing', 0.057), ('modes', 0.056), ('robot', 0.056), ('ijcv', 0.055), ('respective', 0.055), ('cascade', 0.053), ('domains', 0.052), ('base', 0.05), ('deep', 0.05), ('monocular', 0.049), ('stage', 0.045), ('assigning', 0.045), ('pami', 0.045), ('achanta', 0.043), ('misclassifying', 0.043), ('tall', 0.043), ('tsuhan', 0.043), ('help', 0.041), ('labels', 0.041), ('inner', 0.04), ('yi', 0.039), ('li', 0.038), ('six', 0.038), ('kowdle', 0.038), ('highway', 0.038), ('coast', 0.038), ('face', 0.038), ('pixel', 0.037), ('score', 0.037), ('combine', 0.037), ('earlier', 0.037), ('layers', 0.036), ('vision', 0.036), ('combining', 0.036), ('segmentation', 0.036), ('initialization', 0.036), ('felzenszwalb', 0.036), ('latent', 0.035), ('implementation', 0.035), ('dataset', 0.035), ('seamlessly', 0.035), ('datapoint', 0.035), ('testset', 0.035), ('constituted', 0.035), ('optimizes', 0.035), ('later', 0.035), ('towards', 0.034), ('jointly', 0.033), ('street', 0.033), ('training', 0.033), ('horse', 0.033), ('intricate', 0.033), ('sailing', 0.033), ('hard', 0.032), ('maximize', 0.032), ('torralba', 0.032), ('weights', 0.031), ('girshick', 0.031), ('accuracy', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen
Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which maximizes the joint likelihood of the sub-tasks while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers with information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification.
2 0.2661767 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing
Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.
3 0.2254048 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor, the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
5 0.2146039 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines.
6 0.15563497 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
7 0.15073998 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering
8 0.15027778 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
9 0.14390692 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
10 0.13788667 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
11 0.13421389 228 nips-2010-Reverse Multi-Label Learning
12 0.13169043 192 nips-2010-Online Classification with Specificity Constraints
13 0.12044776 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels
14 0.1168144 151 nips-2010-Learning from Candidate Labeling Sets
15 0.1100354 15 nips-2010-A Theory of Multiclass Boosting
16 0.110029 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
17 0.10986864 149 nips-2010-Learning To Count Objects in Images
18 0.10729864 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
19 0.10572756 59 nips-2010-Deep Coding Network
20 0.10397749 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
topicId topicWeight
[(0, 0.264), (1, 0.135), (2, -0.157), (3, -0.353), (4, 0.02), (5, 0.011), (6, -0.159), (7, 0.012), (8, -0.022), (9, 0.023), (10, 0.022), (11, 0.068), (12, 0.01), (13, 0.045), (14, 0.022), (15, 0.043), (16, 0.096), (17, 0.062), (18, -0.042), (19, -0.032), (20, -0.042), (21, -0.054), (22, -0.12), (23, 0.076), (24, 0.0), (25, 0.04), (26, -0.02), (27, 0.107), (28, 0.002), (29, 0.004), (30, -0.067), (31, -0.05), (32, 0.029), (33, -0.03), (34, -0.066), (35, 0.002), (36, -0.007), (37, 0.16), (38, -0.108), (39, 0.012), (40, 0.003), (41, -0.002), (42, 0.072), (43, -0.019), (44, 0.132), (45, -0.095), (46, -0.034), (47, -0.039), (48, -0.025), (49, 0.007)]
simIndex simValue paperId paperTitle
same-paper 1 0.96415412 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen
Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which maximizes the joint likelihood of the sub-tasks while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers with information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification.
2 0.81669742 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing
Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
4 0.70204866 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
Author: Abhinav Gupta, Martial Hebert, Takeo Kanade, David M. Blei
Abstract: There has been a recent push in extraction of 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state-of-the-art.
5 0.70172668 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor, the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.
6 0.68469453 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
7 0.65700006 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
8 0.61186445 228 nips-2010-Reverse Multi-Label Learning
9 0.58454913 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
10 0.56896293 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
11 0.56481761 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
12 0.55555052 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
13 0.54622167 99 nips-2010-Gated Softmax Classification
14 0.53564233 149 nips-2010-Learning To Count Objects in Images
15 0.51617813 15 nips-2010-A Theory of Multiclass Boosting
16 0.50601143 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
17 0.50496566 271 nips-2010-Tiled convolutional neural networks
18 0.49534631 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
19 0.48686165 192 nips-2010-Online Classification with Specificity Constraints
20 0.48514548 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
topicId topicWeight
[(13, 0.033), (17, 0.014), (27, 0.046), (30, 0.037), (35, 0.011), (45, 0.198), (50, 0.06), (52, 0.024), (60, 0.014), (77, 0.032), (78, 0.019), (90, 0.453)]
simIndex simValue paperId paperTitle
1 0.87761319 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
Author: Malik Magdon-Ismail
Abstract: We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The permutation complexity is based (like the maximum discrepancy) on dependent sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated.
same-paper 2 0.87306821 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen
Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which maximizes the joint likelihood of the sub-tasks while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers with information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification.
3 0.83752739 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing
Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.
4 0.83574992 250 nips-2010-Spectral Regularization for Support Estimation
Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo
Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.
5 0.74146849 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
Author: Han Liu, Xi Chen
Abstract: We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of (α, C)-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable for large-scale learning problems, we propose a greedy heuristic. The superior performance of MDRTs is demonstrated on both synthetic and real datasets.
6 0.71844196 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics
7 0.65772301 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
9 0.62729394 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
10 0.59697658 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
11 0.59569055 282 nips-2010-Variable margin losses for classifier design
12 0.59513009 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
13 0.58289778 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
14 0.58027136 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
15 0.57698035 117 nips-2010-Identifying graph-structured activation patterns in networks
16 0.57397413 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
17 0.57244295 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
18 0.56805342 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
19 0.56199503 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
20 0.55900323 243 nips-2010-Smoothness, Low Noise and Fast Rates