nips2010-241 knowledge-graph by maker-knowledge-mining

241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata


Source: pdf

Author: Mario Fritz, Kate Saenko, Trevor Darrell

Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor, the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information (object and feature absolute size) can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we show how a crucial aspect of 3-D information (object and feature absolute size) can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. [sent-2, score-0.313]

2 Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. [sent-5, score-0.58]

3 We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. [sent-6, score-0.681]

4 We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. [sent-7, score-0.556]

5 Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata. [sent-8, score-0.628]

6 1 Introduction: Two themes dominate recent progress towards situated visual object recognition. [sent-9, score-0.455]

7 Most significantly, the availability of large scale image databases and machine learning methods has driven performance: accuracy on many category detection tasks is a function of the quantity and quality of the available training data. [sent-10, score-0.448]

8 In this paper, we propose a method to bridge this gap and extract features from typical 2D data sources that can enhance recognition performance when 3D information is available at test time. [sent-20, score-0.314]

9 (Figure 1: Recovery of object size from known camera intrinsics.) The paradigm of recognition-by-local-features has been well established in the computer vision literature in recent years. [sent-21, score-0.654]

10 Existing recognition schemes are generally designed to be invariant to scale and size. [sent-22, score-0.253]

11 Local 3-D shape descriptors have been proposed (e.g., 3-D shape context and SIFT [4, 3]), but we are somewhat skeptical of the ability of even the most recent 3-D sensor systems to extract the detailed local geometry required to reliably detect and describe local 3-D shapes on real-world objects. [sent-27, score-0.305]

12 We propose a “2.1D” local feature model which augments a traditional 2D local feature (SIFT, GLOH, SURF, etc.). [sent-29, score-0.336]

13 The augmentation adds an estimate of the depth and 3-D size of an observed patch. [sent-30, score-0.359]

14 ... on a full-size computer keyboard; while the keys might look locally similar, the absolute patch size would be highly distinctive. [sent-32, score-0.26]

15 We focus on the recognition of real-world objects when additional sensors are available at test time. [sent-33, score-0.349]

16 We show how 2.1D information can be extracted from monocular metadata already present in many online images. [sent-34, score-0.517]

17 Our model includes representations both of the absolute size of local features and of the overall dimensions of categories. [sent-35, score-0.229]

18 We recover the depth and size of the local features, and thus of the bounding box of a detected object in 3-D. [sent-36, score-1.048]

19 Efficient search is an important goal, and we show a novel extension to multi-class branch-and-bound search using explicit metric 3-D constraints. [sent-37, score-0.407]

20 2 “2.1D” features: The crux of our method is the inference and exploitation of size information; we show that we can obtain such measurements from non-traditional sources that do not presume a 3-D scanner at training time, nor rely on multi-view reconstruction / structure-from-motion methods. [sent-39, score-0.287]

21 We instead exploit cues that are readily available in many monocular camera images. [sent-40, score-0.486]

22 We are not interested in reconstructing the object surface; we only estimate the absolute size of local patches and the statistics of the bounding box of instances in the category, from which we can infer the category size. [sent-41, score-0.972]

23 We adopt a local-feature based recognition model and augment it with metric size information. [sent-42, score-0.356]

24 While there are several possible recognition schemes based on sets of such local features, we focus on the Naive Bayes nearest-neighbor model of [1] because of its simplicity and good empirical results. [sent-43, score-0.419]

25 We assume one or more common local feature descriptors (and associated detectors or dense sampling grids): SIFT, SURF, GLOH, MSER. [sent-44, score-0.244]

26 Our emphasis in this paper is on improving recognition accuracy (continued in entry 31). Footnote 1: There are a number of general paradigms by which estimates of object size can be extracted from a 2D image data source, e.g. ... [sent-45, score-0.411]

27 We exploit depth-from-focus cues (e.g., [8]), present as camera intrinsics stored as metadata in the JPEG EXIF file format. [sent-53, score-0.602]

28 Images collected by many modern consumer-grade digital SLR cameras automatically store absolute distance-to-subject as metadata in the JPEG image. [sent-54, score-0.41]

29 (Figure 2: Illustration of metric object size derived from image metadata stored in EXIF fields on an image downloaded from Flickr.) [sent-55, score-1.01]

30 Absolute size is estimated by projecting the bounding box of local features on the object into 3-D, using EXIF camera intrinsics stored in the image file format. [sent-57, score-1.277]

31 ... improving the accuracy of recognizing categories that are at least approximately well modeled with such local-feature schemes; size information alone cannot help recognize a category that does not repeatably and reliably produce such features. [sent-58, score-0.323]

32 2.1 Metric object size from monocular metadata: Absolute pixel size can be inferred using a planar object approximation and depth-from-focus cues. [sent-60, score-1.459]

33 EXIF stores a wide range of intrinsic camera parameters, which often include the focus distance as an explicit parameter (in some cameras it is not provided directly, but can be estimated from other provided parameters). [sent-62, score-0.266]
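As a concrete illustration, here is a minimal sketch of reading those fields from a JPEG; it assumes Pillow is available, and, as the entry above notes, not every camera writes SubjectDistance:

```python
from PIL import Image, ExifTags  # assumes Pillow is installed

def read_focus_params(path):
    """Read focal length and subject (focus) distance from JPEG EXIF.

    Returns (FocalLength, SubjectDistance); either may be None, since some
    cameras omit SubjectDistance, in which case it must be estimated from
    other provided parameters, as the text notes.
    """
    exif = Image.open(path).getexif()
    capture = exif.get_ifd(0x8769)  # Exif sub-IFD holding capture parameters
    tags = {ExifTags.TAGS.get(t, t): v for t, v in capture.items()}
    return tags.get("FocalLength"), tags.get("SubjectDistance")
```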

34 This gives us a workable approximation of the depth of the object, assuming it is in focus in the scene: with a pinhole camera model, we can derive the metric size of a pixel in the scene given these assumptions. [sent-63, score-0.869]

35 Using simple trigonometry, the metric pixel size is $\rho = \frac{s\,d}{f\,r}$, where s is the sensor width, d is the focus distance, f is the focal length, and r is the horizontal resolution of the sensor. [sent-64, score-0.376]
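A minimal sketch of this formula, with hypothetical values for a consumer dSLR (the numbers are assumptions, not the paper's data):

```python
def metric_pixel_size(sensor_width_mm, focus_distance_mm, focal_length_mm, h_resolution_px):
    """Metric size of one pixel at the focus plane: rho = s*d / (f*r).

    Pinhole-camera approximation; valid only while the object sits near the
    focus plane, per the planar-object assumption above.
    """
    return (sensor_width_mm * focus_distance_mm) / (focal_length_mm * h_resolution_px)

# Hypothetical EXIF-derived values: 50 mm lens, subject at 1.2 m,
# 23.6 mm APS-C sensor, 4288 px horizontal resolution.
rho = metric_pixel_size(23.6, 1200.0, 50.0, 4288)
print(f"pixel size at focus plane: {rho:.3f} mm")  # ~0.132 mm
```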

36 As shown in Figure 2, this method provides a size estimate reference for the visual observation based on images commonly available on the internet, e.g., on Flickr. [sent-65, score-0.275]

37 A bounding box can either be estimated from the feature locations, given an uncluttered background, or provided by manual labeling or by an object discovery technique which clusters local features to discover the segmentation of the training data. [sent-69, score-0.9]
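A sketch of the first option, assuming an uncluttered background so the feature locations alone delimit the object; with the pixel size rho from above, the box extent becomes metric (names are illustrative):

```python
import numpy as np

def metric_object_extent(feature_xy, rho_mm):
    """Axis-aligned bounding box of feature locations, in metric units.

    feature_xy: (N, 2) array of pixel coordinates of features on the object;
    rho_mm: metric pixel size at the focus plane from the EXIF estimate.
    """
    lo, hi = feature_xy.min(axis=0), feature_xy.max(axis=0)
    return (hi - lo) * rho_mm  # (width, height) in millimetres
```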

38 2.2 Naive Bayes estimation of discriminative feature weights: Our object model is based on a bag-of-words model where an object is encoded by a set of visual features x_i ∈ X within the circumscribing bounding box. [sent-71, score-1.023]

39 We denote object appearance with p(X|C); following [1], this density can be captured and modeled using Parzen window density estimates. (Figure 3: Metric object size for ten different categories derived from camera metadata.) [sent-73, score-0.896]

40 We compute the detection score for a given bounding box from the log-likelihood ratio based on the kernel density estimate above. [sent-78, score-0.458]
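A minimal sketch of such a score, approximating each Parzen density by its nearest-neighbor kernel term with a Gaussian kernel; the bandwidth sigma and all names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def feature_weights(descriptors, pos_feats, neg_feats, sigma=0.2):
    """Per-feature log-likelihood-ratio weights w_j.

    Each Parzen estimate p(x|C) is approximated by its single nearest-neighbor
    kernel term, NBNN-style; for Gaussian kernels the log ratio reduces to a
    difference of squared nearest-neighbor distances.
    """
    d_pos, _ = cKDTree(pos_feats).query(descriptors)
    d_neg, _ = cKDTree(neg_feats).query(descriptors)
    return (d_neg ** 2 - d_pos ** 2) / (2.0 * sigma ** 2)

def box_score(weights, in_box_mask):
    """Detection score of a box: summed weights of the features inside it."""
    return weights[in_box_mask].sum()
```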

41 The core idea is to structure the search space using a search tree. [sent-84, score-0.228]

42 Bounds can be easily computed for bag-of-words representations, which have been previously used in this context for object detection. [sent-89, score-0.237]

43 Each feature has a learned weight w_j, so the score function reads $f(b) = \sum_{j \in T(b)} w_j$ (5), where T(b) is the set of all features contained in the bounding box b. [sent-90, score-0.525]

44 Our bounding box hypotheses b = (x_1, y_1, z_1, x_2, y_2, z_2) are defined explicitly in 3-D and indicate the actual spatial relation of objects in the scene. [sent-93, score-0.43]

45 We apply a constraint factor S(b) to the objective that indicates whether a bounding box has a valid size for a particular class: $f(b) = S(b) \sum_{j \in T(b)} w_j$ (6), where S(b) is a basic rectangle function that takes the value 1 for valid bounding boxes and 0 otherwise. [sent-94, score-0.772]

46 Most importantly, bounds over bounding box sets can still be efficiently computed. [sent-95, score-0.364]

47 As long as the bounding box set at a given node in the search tree contains at least one bounding box of valid size, the score is unaffected. [sent-96, score-0.881]
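A sketch of one way such a bound could look for 3-D box sets, in the style of efficient subwindow search; the (lo_min, lo_max, hi_min, hi_max) encoding of a box set and the pruning rule are assumptions of this sketch:

```python
import numpy as np

def valid_size_possible(box_set, s_min, s_max):
    """True if the set still contains a box whose per-axis extent fits [s_min, s_max]."""
    lo_min, lo_max, hi_min, hi_max = box_set  # per-axis intervals for the two corners
    ext_min = np.maximum(hi_min - lo_max, 0.0)  # smallest achievable extent
    ext_max = hi_max - lo_min                   # largest achievable extent
    return bool(((ext_min <= s_max) & (ext_max >= s_min)).all())

def score_upper_bound(weights, pts3d, box_set, s_min, s_max):
    """Upper bound on f over every box in the set, with the S(b) constraint.

    Positive weights are counted inside the largest box of the set and negative
    weights inside the smallest; a set containing no valid-sized box is pruned
    outright, which is what keeps the bound admissible under the constraint.
    """
    if not valid_size_possible(box_set, s_min, s_max):
        return -np.inf
    lo_min, lo_max, hi_min, hi_max = box_set
    in_largest = ((pts3d >= lo_min) & (pts3d <= hi_max)).all(axis=1)
    in_smallest = ((pts3d >= lo_max) & (pts3d <= hi_min)).all(axis=1)
    return weights[in_largest & (weights > 0)].sum() \
         + weights[in_smallest & (weights < 0)].sum()
```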

48 Given these measurements, we constrain the search to leverage the metric information acquired at training time. [sent-99, score-0.343]

49 The depth of each feature in the image at test time allows us to infer its 3D location in the test scene. [sent-100, score-0.591]
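The inference step is standard pinhole back-projection; a minimal sketch, where the intrinsics (fx, fy, cx, cy) are assumed to come from the stereo rig's calibration:

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into camera coordinates.

    fx, fy: focal lengths in pixels; (cx, cy): principal point.
    """
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```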

50 We can thus extend efficient multi-class branchand-bound search to operate in metric 3D space under the constraints imposed by our knowledge of metric patch size and metric object size. [sent-101, score-1.075]

51 We not only split bounding box sets along dimensions, but also split the set of object classes. [sent-103, score-0.601]
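A best-first sketch of the joint search loop; score_bound, split, and is_leaf are hypothetical callables standing in for the bound above, the interval/class splitting, and the termination test:

```python
import heapq

def multiclass_branch_and_bound(score_bound, split, is_leaf, root):
    """Best-first search over joint (box-set, class-set) states.

    score_bound(state) must upper-bound the score of every concrete
    (box, class) pair in the state; split(state) refines the widest box
    interval or the class set; is_leaf(state) tests for a single box and
    class, where the bound is exact.
    """
    heap = [(-score_bound(root), 0, root)]
    counter = 1  # tie-breaker so heapq never compares states directly
    while heap:
        neg_bound, _, state = heapq.heappop(heap)
        if is_leaf(state):
            return state, -neg_bound  # first leaf popped is the global optimum
        for child in split(state):
            heapq.heappush(heap, (-score_bound(child), counter, child))
            counter += 1
    return None, float("-inf")
```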

52 The authors of [9] proposed to up/down-sample an image at multiple scales and identify the characteristic scale for each image patch. [sent-108, score-0.383]

53 A histogram of edge orientations is computed for each patch scaled to its characteristic scale in order to obtain a scale-invariant visual descriptor. [sent-109, score-0.298]

54 With both methods, a feature in one image can be mapped to the same characteristic scale as a feature in another image. [sent-112, score-0.318]

55 Our method, instead, determines the metric size of any image patch and uses it to compare two features directly. [sent-115, score-0.553]

56 There have been several works on estimating depth from single images. [sent-116, score-0.291]

57 Some very early work estimated depth from the degree of defocus of edges [8]. [sent-117, score-0.291]

58 [6] describes a method to infer scene depth from structure based on global and local histograms of Gabor filter responses for indoor and outdoor scenes. [sent-118, score-0.589]

59 [11] describes a supervised Markov Random Field method to predict the depth from local and global features for outdoor images. [sent-119, score-0.501]

60 Hardware-based methods for obtaining 3D information from monocular images include modifying the structure of a conventional camera to enable it to capture 3D geometry. [sent-121, score-0.486]

61 For example, [12] introduces the coded aperture technique by inserting a patterned occluder within the aperture of the camera lens. [sent-122, score-0.391]

62 Images captured by such a camera exhibit depth-dependent patterns from which a layered depth map can be extracted. [sent-123, score-0.536]

63 Most methods based on visual feature quantization learn their codebooks using invariant features. [sent-124, score-0.245]

64 However, the scale of each code word is lost after each image patch is normalized to its invariant region. [sent-125, score-0.35]

65 For example, an eye of a dinosaur may be confused with an eye of a fish, because their size difference is lost once they are embedded into the visual codebook. [sent-127, score-0.265]

66 For example, [13] records the relative position of the object center in the codebook, and at test time each codebook word votes for the possible object center at multiple scales. [sent-129, score-0.612]

67 Moreover, [14] explicitly puts the orientation and scale of each feature in the codebook, so that the object center location can be inferred directly. [sent-130, score-0.369]

68 However, these works treat orientation and scale as independent of the feature descriptor and use them to post-verify whether a feature found to be consistent in terms of the appearance descriptor would also be consistent in terms of scale. [sent-131, score-0.252]

69 A visual word would be matched only if its size is right. [sent-133, score-0.23]

70 In other words, the visual appearance and the scale are matched simultaneously in our codebook. [sent-134, score-0.239]

71 Depth information has been used to improve the performance of various image processing tasks, such as video retrieval, object instance detection, 3D scene recognition, and vehicle navigation. [sent-135, score-0.565]

72 For example, [15] used depth features for video retrieval, extracting depth from monocular video sequences by exploiting the motion parallax of the objects in the video. [sent-136, score-1.14]

73 [16] developed an integrated probabilistic model for appearance and 3D geometry of object categories. [sent-137, score-0.297]

74 However, their method does not explicitly assign physical size to each image patch and needs to provide scale-invariance by explicitly calculating the perspective projection of objects in different 3D poses. [sent-138, score-0.415]

75 In contrast, our method can infer the real-world sizes of features and can establish feature correspondences at their true physical scale. [sent-139, score-0.253]

76 [17] proposed a way to use depth estimation for real-time obstacle detection from a monocular video stream in a vehicle navigation scenario. [sent-140, score-0.824]

77 Their method estimates scene depth from the scaling of supervised image regions and generates obstacle hypotheses from these depth estimates. [sent-141, score-0.841]

78 5 Experiments: In the experiments we show how to improve the performance of visual object classifiers by leveraging richer sensor modalities deployed at test time. [sent-142, score-0.546]

79 We analyze how the different proposed means of putting visual recognition in metric context improve detection performance. [sent-143, score-0.509]

80 5.1 Data: For training we explore the camera-based metadata scheme described above, where we derive the metric pixel size from EXIF data. [sent-145, score-0.653]

81 We downloaded 38 images of 10 object categories taken with a consumer-grade dSLR that stores the relevant EXIF fields (e.g., focus distance). [sent-146, score-0.346]

82 For test data we have collected 34 scenes of varying complexity in our laboratory, containing 120 object instances in offices and a kitchen. [sent-149, score-0.326]

83 Stereo depth observations using a calibrated camera rig are obtained with test imagery, providing an estimate of the 3-D depth of each feature point at test time. [sent-151, score-0.998]

84 (Table 1 residue: per-category numbers were lost in extraction; the categories are bike helmet, body wash, juice, kleenex, mug, pasta, phone, pringles, toothpaste, and vitamins, plus an average row; baseline average precision 89.59.) [sent-153, score-0.286]

85 Table 1: Average precision for several categories for baseline 2-D branch-and-bound search and our 2.1D method. [sent-176, score-0.318]

86 5.2 Evaluation: We start with a baseline, which uses the plain branch-and-bound detection scheme and 2D features. [sent-179, score-0.226]

87 We then evaluate 2.1D, adding 3D location to the interest points, as well as employing the metric size constraint. [sent-181, score-0.247]

88 Table 1 shows the average precision for each category for baseline 2-D branch-and-bound search and our 2.1D method. [sent-182, score-0.355]

89 Adding the metric object constraints (second column) improves the results significantly. [sent-184, score-0.416]

90 For the training data available for these categories the local evidence was apparently not strong enough to support this detection scheme, but with size constraints performance improved significantly. [sent-188, score-0.409]

91 6 Conclusion: Progress on large scale systems for visual categorization has been driven by the abundance of training data available from the web. [sent-189, score-0.269]

92 3D measurements from stereo or lidar are typically found on contemporary robotic platforms, but there is rarely sufficient training data to learn robust models using these sensors. [sent-192, score-0.219]

93 In order to reconcile these two trends, we developed a method for appearance-based visual recognition in metric context, exploiting camera-based metadata to obtain size information regarding a category and local feature models that can be exploited using 3-D sensors at test time. [sent-193, score-1.205]

94 We augmented local feature-based visual models with a “2.1D” object representation. [sent-195, score-0.215]

95 The “2.1D” representation introduces the notion of a metric patch size. [sent-196, score-0.535]

96 We presented a fast, multi-class detection scheme based on a metric branch-and-bound formulation. [sent-198, score-0.319]

97 While our method was demonstrated only on simple 2-D SURF features, we believe these methods will be applicable as well to multi-kernel schemes with additional feature modalities, as well as object-level descriptors (e.g., ...). [sent-199, score-0.371]

98 Wohn, Pyramid based depth from focus, In Proceedings of Computer Vision and Pattern Recognition, 1988. [sent-241, score-0.291]

99 Freeman, Image and depth from a conventional camera with a coded aperture, ACM Transactions on Graphics, 2007. [sent-260, score-0.524]

100 Freisleben, Using depth features to retrieve monocular video shots, In Proceedings of the 6th ACM international conference on image and video retrieval, 2007. [sent-265, score-0.89]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('depth', 0.291), ('metadata', 0.269), ('monocular', 0.248), ('exif', 0.239), ('object', 0.237), ('bounding', 0.209), ('camera', 0.198), ('metric', 0.179), ('box', 0.155), ('visual', 0.127), ('patch', 0.119), ('search', 0.114), ('recognition', 0.109), ('category', 0.106), ('image', 0.106), ('sensors', 0.095), ('detection', 0.094), ('scene', 0.092), ('intrinsics', 0.09), ('surf', 0.09), ('stereo', 0.089), ('local', 0.088), ('sensor', 0.088), ('branch', 0.086), ('video', 0.082), ('features', 0.081), ('feature', 0.08), ('aperture', 0.079), ('descriptors', 0.076), ('absolute', 0.073), ('icsi', 0.072), ('sensing', 0.07), ('categories', 0.069), ('cameras', 0.068), ('darrell', 0.068), ('size', 0.068), ('objects', 0.066), ('codebook', 0.064), ('sift', 0.062), ('obstacle', 0.061), ('vision', 0.061), ('apperance', 0.06), ('gloh', 0.06), ('jpeg', 0.06), ('junsong', 0.06), ('lidar', 0.06), ('rig', 0.06), ('subvolume', 0.06), ('vip', 0.06), ('zicheng', 0.06), ('nc', 0.059), ('physical', 0.056), ('modalities', 0.055), ('schemes', 0.054), ('rectangle', 0.053), ('eecs', 0.053), ('keyboard', 0.052), ('situated', 0.052), ('discriminative', 0.052), ('scale', 0.052), ('uc', 0.05), ('training', 0.05), ('scenes', 0.05), ('baseline', 0.049), ('vehicle', 0.048), ('imagery', 0.048), ('captured', 0.047), ('pami', 0.047), ('scheme', 0.046), ('sources', 0.045), ('reconcile', 0.045), ('scans', 0.045), ('stored', 0.045), ('measurements', 0.043), ('lowe', 0.043), ('nearest', 0.042), ('shape', 0.041), ('pixel', 0.041), ('occlusion', 0.041), ('outdoor', 0.041), ('indoor', 0.041), ('appearance', 0.04), ('images', 0.04), ('available', 0.04), ('test', 0.039), ('valid', 0.039), ('ying', 0.039), ('graphics', 0.039), ('mobile', 0.039), ('progress', 0.039), ('invariant', 0.038), ('wu', 0.037), ('trevor', 0.037), ('robotic', 0.037), ('retrieval', 0.036), ('infer', 0.036), ('eye', 0.035), ('matters', 0.035), ('coded', 0.035), ('word', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

Author: Mario Fritz, Kate Saenko, Trevor Darrell

Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor, the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information (object and feature absolute size) can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.

2 0.25362012 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman

Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results.

3 0.2254048 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen

Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many sub-tasks. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which maximizes the joint likelihood of the sub-tasks, while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification.

4 0.21126458 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing

Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high-level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performances on high-level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.

5 0.17824736 149 nips-2010-Learning To Count Objects in Images

Author: Victor Lempitsky, Andrew Zisserman

Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such a density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amounts of visual data.

6 0.17504241 133 nips-2010-Kernel Descriptors for Visual Recognition

7 0.15014175 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

8 0.14872916 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

9 0.14789587 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification

10 0.1439812 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

11 0.1312404 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake

12 0.11811727 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

13 0.11399239 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process

14 0.11155825 281 nips-2010-Using body-anchored priors for identifying actions in single images

15 0.11067326 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

16 0.10953461 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression

17 0.10805805 81 nips-2010-Evaluating neuronal codes for inference using Fisher information

18 0.10055427 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition

19 0.097764678 1 nips-2010-(RF)^2 -- Random Forest Random Field

20 0.094723649 235 nips-2010-Self-Paced Learning for Latent Variable Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.244), (1, 0.122), (2, -0.207), (3, -0.297), (4, 0.018), (5, -0.044), (6, -0.045), (7, 0.038), (8, 0.017), (9, 0.077), (10, 0.065), (11, 0.042), (12, -0.153), (13, 0.003), (14, 0.065), (15, 0.012), (16, 0.093), (17, -0.13), (18, 0.081), (19, 0.062), (20, 0.007), (21, -0.038), (22, -0.022), (23, 0.018), (24, 0.005), (25, -0.054), (26, 0.037), (27, 0.049), (28, 0.005), (29, -0.054), (30, -0.076), (31, 0.012), (32, -0.029), (33, -0.075), (34, 0.05), (35, -0.029), (36, -0.013), (37, 0.012), (38, 0.019), (39, -0.012), (40, -0.009), (41, -0.009), (42, 0.054), (43, 0.013), (44, 0.04), (45, -0.06), (46, 0.035), (47, 0.042), (48, -0.011), (49, -0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9658848 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

Author: Mario Fritz, Kate Saenko, Trevor Darrell

Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor, the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information (object and feature absolute size) can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.

2 0.89408773 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing

Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high-level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performances on high-level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.

3 0.87868029 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

Author: Abhinav Gupta, Martial Hebert, Takeo Kanade, David M. Blei

Abstract: There has been a recent push in extraction of 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state-of-the-art. 1

4 0.79603022 149 nips-2010-Learning To Count Objects in Images

Author: Victor Lempitsky, Andrew Zisserman

Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such a density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amounts of visual data.

5 0.79511088 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing

Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.

6 0.73564827 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake

7 0.72609717 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

8 0.72051829 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process

9 0.70379454 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

10 0.68432921 256 nips-2010-Structural epitome: a way to summarize one’s visual experience

11 0.65271217 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

12 0.64982802 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

13 0.62118477 17 nips-2010-A biologically plausible network for the computation of orientation dominance

14 0.61382264 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike

15 0.60648286 1 nips-2010-(RF)^2 -- Random Forest Random Field

16 0.57063973 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression

17 0.5396679 281 nips-2010-Using body-anchored priors for identifying actions in single images

18 0.53200418 234 nips-2010-Segmentation as Maximum-Weight Independent Set

19 0.52687746 133 nips-2010-Kernel Descriptors for Visual Recognition

20 0.48525286 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(12, 0.071), (13, 0.023), (17, 0.018), (27, 0.065), (30, 0.049), (35, 0.038), (45, 0.267), (50, 0.218), (52, 0.011), (60, 0.047), (77, 0.022), (78, 0.02), (90, 0.074)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96422416 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model and minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems.

2 0.95448768 101 nips-2010-Gaussian sampling by local perturbations

Author: George Papandreou, Alan L. Yuille

Abstract: We present a technique for exact simulation of Gaussian Markov random fields (GMRFs), which can be interpreted as locally injecting noise to each Gaussian factor independently, followed by computing the mean/mode of the perturbed GMRF. Coupled with standard iterative techniques for the solution of symmetric positive definite systems, this yields a very efficient sampling algorithm with essentially linear complexity in terms of speed and memory requirements, well suited to extremely large scale probabilistic models. Apart from synthesizing data under a Gaussian model, the proposed technique directly leads to an efficient unbiased estimator of marginal variances. Beyond Gaussian models, the proposed algorithm is also very useful for handling highly non-Gaussian continuously-valued MRFs such as those arising in statistical image modeling or in the first layer of deep belief networks describing real-valued data, where the non-quadratic potentials coupling different sites can be represented as finite or infinite mixtures of Gaussians with the help of local or distributed latent mixture assignment variables. The Bayesian treatment of such models most naturally involves a block Gibbs sampler which alternately draws samples of the conditionally independent latent mixture assignments and the conditionally multivariate Gaussian continuous vector and we show that it can directly benefit from the proposed methods. 1

3 0.95148259 120 nips-2010-Improvements to the Sequence Memoizer

Author: Jan Gasthaus, Yee W. Teh

Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements. 1

4 0.94975394 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

5 0.94764298 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Author: Yi Zhang, Jeff G. Schneider

Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.

6 0.9432677 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

same-paper 7 0.92516243 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

8 0.89801919 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

9 0.88860297 54 nips-2010-Copula Processes

10 0.88654286 257 nips-2010-Structured Determinantal Point Processes

11 0.88284135 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

12 0.88175553 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

13 0.88007241 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

14 0.87858719 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

15 0.87594867 217 nips-2010-Probabilistic Multi-Task Feature Selection

16 0.87593544 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

17 0.87033355 158 nips-2010-Learning via Gaussian Herding

18 0.86738884 23 nips-2010-Active Instance Sampling via Matrix Partition

19 0.86536545 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

20 0.86514533 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades