nips nips2010 nips2010-1 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nadia Payet, Sinisa Todorovic
Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2. Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by Metropolis-Hastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2. (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
Reference: text
sentIndex sentText sentNum sentScore
1 A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. [sent-7, score-0.316]
2 RF collects in leaf nodes of each decision tree the class histograms of training examples. [sent-10, score-0.421]
3 (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. [sent-13, score-0.269]
4 In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. [sent-14, score-0.122]
5 We derive theoretical performance bounds of (RF)2 , and demonstrate its utility on a challenging task of conjoint object recognition and segmentation. [sent-19, score-0.171]
6 Identifying subimage ownership among occurrences of distinct object classes in an image is a fundamental problem, and one of the most actively pursued, in computer vision, machine learning, and artificial intelligence [1–11]. [sent-20, score-0.203]
7 Thus, we derive a unified framework for combined object recognition and segmentation, as a graph-structured prediction of all random variables in a single, consistent model of the scene. [sent-26, score-0.143]
8 The graphical model we use is Conditional Random Field (CRF) [12]—one of the most popular models for structured inference over pixels [2, 3], patches [4, 5], or image regions [6–8], for object recognition and segmentation. [sent-27, score-0.305]
9 The jumps between the states are reversible, and governed by a proposal density $q(Y^{(t)} \to Y^{(t+1)})$. [sent-41, score-0.168]
10 The proposal is accepted if the acceptance rate, $\alpha$, drawn from $U(0,1)$, satisfies $\alpha < \min\left\{1, \frac{q(Y^{(t+1)} \to Y^{(t)})}{q(Y^{(t)} \to Y^{(t+1)})} \cdot \frac{p(Y^{(t+1)}|X)}{p(Y^{(t)}|X)}\right\}$. [sent-42, score-0.151]
11 As can be seen, the entire inference process is regulated by ratios of the proposal and posterior distributions. [sent-44, score-0.324]
12 Our key idea is to directly estimate the ratios of the proposal and posterior distributions, instead of computing each individual distribution for conducting MH jumps. [sent-46, score-0.284]
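As a concrete illustration of this step, here is a minimal Python sketch of the MH acceptance test driven directly by the two ratios; the function and argument names are ours, not from the paper:

```python
import random

def mh_accept(ratio_proposal, ratio_posterior):
    """One Metropolis-Hastings test for a candidate jump A -> B.

    ratio_proposal  ~ q(B -> A) / q(A -> B)
    ratio_posterior ~ p(Y = B | X) / p(Y = A | X)
    Both ratios are assumed to come straight from the RF estimates,
    so neither distribution is ever evaluated on its own.
    """
    alpha = random.random()  # acceptance rate drawn from U(0, 1)
    return alpha < min(1.0, ratio_proposal * ratio_posterior)
```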
13 We view the trees as a way of discriminatively structuring evidence about the class distributions in the training set. [sent-51, score-0.187]
14 In particular, each leaf of each tree in RF stores a histogram of the number of training examples from each class that reached that leaf. [sent-52, score-0.343]
15 When a new example is encountered, it is “dropped” down each of the trees in the forest, until it reaches a leaf in every tree. [sent-53, score-0.237]
16 The class histograms stored in all these leaves can then be used as a robust estimate of the ratio of that example’s posterior distributions. [sent-54, score-0.213]
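A sketch of that estimate, under the assumption that each tree exposes a hypothetical `leaf_histogram(x)` returning the per-class counts stored in the leaf that x lands in:

```python
def posterior_ratio(x, forest, class_b, class_a):
    """Nonparametric estimate of p(y = class_b | x) / p(y = class_a | x),
    obtained by pooling the class histograms of the leaves x reaches in
    every tree of the forest. The interface is illustrative."""
    count_b = count_a = 0
    for tree in forest:
        hist = tree.leaf_histogram(x)    # drop x down this tree
        count_b += hist[class_b]
        count_a += hist[class_a]
    return count_b / max(count_a, 1)     # guard against a zero count
```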
17 This is related to recent work on Hough forests for object detection and localization [14], where leaves collect information on locations and sizes of bounding boxes of objects in training images. [sent-55, score-0.386]
18 However, they use this evidence to predict a spatial distribution of bounding boxes in a test image, whereas we use the evidence stored in tree leaves to predict the distribution ratios. [sent-56, score-0.321]
19 Evidence trees are also used in [15], but only as a first stage of a stacked-classifier architecture which replaces the standard majority voting of RF. [sent-57, score-0.15]
20 We are not aware of any theoretical analysis of RF as an estimator of ratios of posterior distributions. [sent-61, score-0.21]
21 Learning is efficiently conducted by RF which collects the class histograms of training examples in leaf nodes of each decision tree. [sent-63, score-0.373]
22 This evidence is then used for the non-parametric estimation of the ratios of the proposal and posterior distributions, required by MH-based inference of (RF)2 . [sent-64, score-0.374]
23 We derive the theoretical error bounds of estimating distribution ratios by a two-class RF, which are then used to derive the theoretical performance bounds of a two-class (RF)2. [sent-65, score-0.168]
24 2 CRF Model. We formulate multiclass object recognition and segmentation as the MAP inference of a CRF, defined over a set of multiscale image regions. [sent-68, score-0.352]
25 Regions are used as image features, because they are dimensionally matched with 2D object occurrences in the image, and thus facilitate modeling of various perceptual-organization and contextual cues (e.g., …). [sent-69, score-0.2]
26 Access to regions is provided by the state-of-the-art, multiscale segmentation algorithm of [17], which detects and closes object (and object-part) boundaries using domain knowledge. [sent-72, score-0.312]
27 Since the right scale at which objects occur is unknown, we use all regions from all scales. [sent-73, score-0.116]
28 The extracted regions are organized in a graph, G = (V, E), where V and E are the sets of nodes and edges. [sent-74, score-0.149]
29 Each node i is characterized by a descriptor vector, xi, whose elements describe photometric and geometric properties of the corresponding region (e.g., …). [sent-79, score-0.144]
30 Since the segmentation algorithm of [17] is strictly hierarchical, region i is a descendent of region j if i is fully embedded as a subregion within ancestor j. [sent-83, score-0.206]
31 Each edge (i, j) is associated with a tag, eij , indicating the relationship type between i and j. [sent-87, score-0.123]
32 Let $p_i = p(y_i|x_i)$ and $p_{ij} = p(y_i, y_j|x_i, x_j, e_{ij})$ denote the posterior distributions over nodes and pairs of nodes. [sent-93, score-0.416]
33 Then, we define CRF as $p(Y|G) = \prod_{i\in V} p(y_i|x_i) \prod_{(i,j)\in E} p(y_i, y_j|x_i, x_j, e_{ij}) = \prod_{i\in V} p_i \prod_{(i,j)\in E} p_{ij}$. (1) [sent-94, score-0.249]
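To make the factorization concrete, here is a small sketch of scoring a labeling Y under Eq. (1) in log-space; the container names are ours, not the paper's:

```python
import math

def log_crf_score(labels, node_post, edge_post, edges):
    """log p(Y|G) under Eq. (1): sum of log node posteriors plus
    log pairwise posteriors over the graph edges.

    labels[i]                  -- label y_i assigned to node i
    node_post[i][y]            -- estimate of p(y_i = y | x_i)
    edge_post[(i, j)][(y, y2)] -- estimate of p(y_i = y, y_j = y2 | ...)
    """
    score = sum(math.log(node_post[i][y]) for i, y in enumerate(labels))
    score += sum(math.log(edge_post[(i, j)][(labels[i], labels[j])])
                 for (i, j) in edges)
    return score
```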
34 SW-cut iterates the Metropolis-Hastings (MH) reversible jumps through the following two steps. [sent-98, score-0.119]
35 (1) Graph clustering: SW-cut probabilistically samples connected components, CC’s, where each CC represents a subset of nodes with the same color. [sent-99, score-0.133]
36 This is done by probabilistically cutting edges between all graph nodes that have the same color, based on their posterior distributions $p_{ij} = p(y_i, y_j|x_i, x_j, e_{ij})$. [sent-100, score-0.599]
37 (2) Graph relabeling: SW-cut randomly selects one of the CC’s obtained in step (1), and randomly flips the color of all nodes in that CC, and cuts their edges with the rest of the graph nodes having that same color. [sent-101, score-0.251]
38 If i and j have the same label, their edge is probabilistically sampled according to posterior distribution pij . [sent-111, score-0.217]
39 To separate the re-colored CC from the rest of the graph, we cut existing edges that connect CC to the rest of the graph nodes with that same color. [sent-115, score-0.192]
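A sketch of the clustering step, assuming `p_ij` holds the pairwise posteriors: an edge between two equally colored nodes survives with probability $p_{ij}$ and is cut otherwise, and the connected components of the surviving edges become the CC's (all names are illustrative):

```python
import random
from collections import defaultdict

def sample_components(nodes, edges, color, p_ij):
    """SW-cut step (1), sketched: probabilistic edge cutting followed
    by connected-component extraction over the kept edges."""
    adj = defaultdict(list)
    for (i, j) in edges:
        if color[i] == color[j] and random.random() < p_ij[(i, j)]:
            adj[i].append(j)                 # edge survives the cut
            adj[j].append(i)
    seen, components = set(), []
    for start in nodes:                      # depth-first components
        if start in seen:
            continue
        stack, cc = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            cc.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(cc)
    return components
```

Step (2) then flips the color of one randomly chosen component before running the acceptance test sketched above.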
40 Let q(A → B) be the proposal probability for moving from state A to B, and let q(B → A) denote the converse. [sent-118, score-0.141]
41 Also, the ratio $\frac{p(Y=B|G)}{p(Y=A|G)}$ accounts only for the recolored nodes in CC – not the entire graph G, since all other probabilities have not changed from state A to state B. [sent-122, score-0.204]
42 Given Eq. (1), the ratios of the proposal and posterior distributions characterizing states A and B can be specified as $\frac{q(B\to A)}{q(A\to B)} = \frac{\prod_{(i,j)\in Cut_B}(1-p_{ij}^B)}{\prod_{(i,j)\in Cut_A}(1-p_{ij}^A)}$ and $\frac{p(Y=B|G)}{p(Y=A|G)} = \prod_{i\in CC}\frac{p_i^B}{p_i^A}\cdot\prod_{j\in N(i)}\frac{p_{ij}^B}{p_{ij}^A}$, (3) [sent-124, score-0.354]
43 where $Cut_A$ and $Cut_B$ denote the sets of “cut” edges in states A and B, and $N(i) = \{j : j \in V, (i,j) \in E\}$ is the set of neighbors of node i. [sent-125, score-0.154]
44 4 Learning. RF can be used for estimating the ratios of the proposal and posterior distributions, given by Eq. (3). [sent-130, score-0.284]
45 If region i falls within the bounding box of an object in class y ∈ {1, 2, …, K}, then i is labeled y and added to the training set. [sent-135, score-0.204]
46 If i covers a number of bounding boxes of different classes then i is added to the training set multiple times to account for all distinct class labels it covers. [sent-139, score-0.185]
47 The resulting labeled training set, {(xi, yi) : i = 1, …, M}, is used to learn an ensemble of T decision trees representing RF. [sent-144, score-0.12]
48 In particular, each training sample is passed through every decision tree from the ensemble until it reaches a leaf node. [sent-145, score-0.296]
49 Each leaf l records a class histogram, $\Phi_l = \{\phi_l(y) : y = 1, \dots, K\}$, where $\phi_l(y)$ counts the training examples of class y that reached l. [sent-146, score-0.165]
50 Similarly, each pair of leaves (l, l′) records a pairwise histogram, $\Psi_{ll'} = \{\psi_{ll'}(y, y', e) : y, y' = 1, \dots, K;\ e = 1, 2, 3\}$, where $\psi_{ll'}(y, y', e)$ counts the number of pairs of training examples belonging to classes y and y′ that reached leaves l and l′, and also have the relationship type e – namely, the ascendent/descendent, touching, or far relationship. [sent-154, score-0.159]
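A sketch of accumulating those pairwise statistics for one tree, where `leaf_of[i]` is the leaf region i reached and `pairs` lists the labeled, typed region pairs (both precomputed; the names are ours):

```python
from collections import Counter

def record_pair_histograms(pairs, leaf_of):
    """Accumulate psi_{l l'}(y, y', e) as sparse counts: one increment
    per training pair (i, j) with labels (y_i, y_j) and relation e_ij."""
    psi = Counter()
    for (i, j, y_i, y_j, e_ij) in pairs:
        psi[(leaf_of[i], leaf_of[j], y_i, y_j, e_ij)] += 1
    return psi
```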
51 Given $\Phi_l$ and $\Psi_{ll'}$, we are in a position to estimate the ratios of the proposal and posterior distributions, defined in (3), which control the Metropolis-Hastings jumps in the SW-cut. [sent-155, score-0.35]
52 Suppose two regions, represented by their descriptors $x_i$ and $x_j$, are labeled as $y_i^A$ and $y_j^A$ in state A, and $y_i^B$ and $y_j^B$ in state B of one iteration of the SW-cut. [sent-156, score-0.288]
53 Also, after passing $x_i$ and $x_j$ through the T decision trees of the learned RF, suppose they reached leaves $l_i^t$ and $l_j^t$ in each tree $t = 1, \dots, T$. [sent-157, score-0.263]
54 Then, we compute $\frac{p_i^B}{p_i^A} = \frac{\sum_{t=1}^{T}\phi_{l_i^t}(y_i^B)}{\sum_{t=1}^{T}\phi_{l_i^t}(y_i^A)}$ and $\frac{p_{ij}^B}{p_{ij}^A} = \frac{\sum_{t=1}^{T}\psi_{l_i^t l_j^t}(y_i^B, y_j^B, e_{ij})}{\sum_{t=1}^{T}\psi_{l_i^t l_j^t}(y_i^A, y_j^A, e_{ij})}$ (4) for estimating $\frac{p(Y=B|G)}{p(Y=A|G)}$. [sent-161, score-0.695]
55 To estimate the ratio of the proposal distributions, $\frac{q(B\to A)}{q(A\to B)}$, we compute $p_{ij} = \frac{\sum_{t=1}^{T}\psi_{l_i^t l_j^t}(y_i, y_j, e_{ij})}{\sum_{t=1}^{T}\Phi_{l_i^t}\Phi_{l_j^t}}$. (5) [sent-163, score-0.325]
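With the leaf statistics in hand, the distribution ratios of Eq. (4) reduce to count lookups over the reached leaves; a sketch, with `phi` and `psi` the sparse histograms accumulated above and `leaves`/`leaf_pairs` the leaves reached in the T trees:

```python
def node_ratio(phi, leaves, y_b, y_a):
    """p_i^B / p_i^A of Eq. (4): pooled class counts over the leaves
    that region i reached; phi[(l, y)] is the class-y count in leaf l."""
    num = sum(phi[(l, y_b)] for l in leaves)
    den = sum(phi[(l, y_a)] for l in leaves)
    return num / max(den, 1)

def pair_ratio(psi, leaf_pairs, yb_pair, ya_pair, e):
    """p_ij^B / p_ij^A of Eq. (4); yb_pair and ya_pair are the
    (y_i, y_j) label pairs in states B and A, e is the relation type."""
    num = sum(psi[(li, lj, yb_pair[0], yb_pair[1], e)]
              for (li, lj) in leaf_pairs)
    den = sum(psi[(li, lj, ya_pair[0], ya_pair[1], e)]
              for (li, lj) in leaf_pairs)
    return num / max(den, 1)
```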
56 5 Results. (RF)2 is evaluated on the task of object recognition and segmentation on two benchmark datasets. [sent-165, score-0.229]
57 Second, the Street-Scene dataset consists of 3547 images of urban environments, and has manually annotated regions [6, 19]. [sent-168, score-0.12]
58 Both datasets provide labels of bounding boxes around object occurrences as ground truth. [sent-170, score-0.255]
59 Images are segmented using the multiscale segmentation algorithm of [17], which uses the perceptual significance of a region boundary, Pb ∈ [0, 100], as an input parameter. [sent-171, score-0.189]
60 If a region covers a number of bounding boxes of different classes, it is added to the training set multiple times to account for each distinct label. [sent-176, score-0.181]
61 We use the standard random splits of the training data to train 100 decision trees of RF, constructed in a top-down manner. [sent-177, score-0.155]
62 The growth of each tree is constrained so that its depth is less than 30, and every leaf node contains at least 20 training examples. [sent-178, score-0.276]
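These hyperparameters map directly onto an off-the-shelf forest; a rough stand-in using scikit-learn (which lacks the paper's pairwise histograms, but does expose the per-tree leaf indices the histograms are keyed on) on toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X_train = np.random.rand(500, 32)        # toy region descriptors
y_train = np.random.randint(0, 8, 500)   # toy class labels

rf = RandomForestClassifier(
    n_estimators=100,      # 100 decision trees
    max_depth=30,          # depth kept below ~30
    min_samples_leaf=20,   # at least 20 training examples per leaf
)
rf.fit(X_train, y_train)

# apply() returns the leaf index each sample reaches in every tree,
# which is the quantity the class and pair histograms are indexed by.
leaf_ids = rf.apply(X_train)             # shape: (500, 100)
```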
63 To recognize and segment objects in a new test image, we first extract a hierarchy of regions from the image by the segmentation algorithm of [17]. [sent-179, score-0.242]
64 We examine the following three variants of (RF)2: (RF)2-1 — the spatial relationships of regions, $e_{ij}$, are not accounted for when computing $p_{ij}$ in Eq. (5); [sent-183, score-0.239]
65 (RF)2-2 — the region relationships touching and far are considered, while the ascendent/descendent relationship is not captured; and (RF)2-3 — all three types of region layout and structural relationships are modeled. [sent-185, score-0.286]
66 Note that considering region layouts and structure changes only the class histograms recorded by leaf nodes of the learned decision trees, but it does not increase complexity. [sent-187, score-0.371]
67 This metric is suitable, because it does not favor object classes that occur in images more frequently. [sent-189, score-0.168]
68 The additional consideration of the region relationships touching and far increases performance relative to that of (RF)2 -1, as expected. [sent-195, score-0.156]
69 Note that the methods of [6,21] additionally use higher-level cues about the horizon location and 3D scene layout in their object recognition and segmentation. [sent-198, score-0.215]
70 Our segmentation results on example MSRC and Street-Scene images are shown in Fig. [sent-200, score-0.124]
71 Labels of the finest-scale regions are depicted using distinct colors, since pixels get labels of the finest-scale regions. [sent-202, score-0.117]
72 (RF)2 yields the best performance for all object classes except one. [sent-206, score-0.13]
73 Figure 1: Our object recognition and segmentation results on example images from the MSRC dataset (top two rows), and the Street-Scene dataset (bottom two rows). [sent-207, score-0.267]
74 The figure depicts boundaries of the finest-scale regions found by the multiscale algorithm of [17], and the color-coded labels of these regions inferred by (RF)2 . [sent-208, score-0.242]
75 An MH jump between states A and B is controlled by the acceptance rate α(A→B), which depends on the ratios of the proposal and posterior distributions, $\frac{q(B\to A)}{q(A\to B)}\cdot\frac{p(Y=B|G)}{p(Y=A|G)}$. [sent-251, score-0.358]
76 6.1 An Upper Error Bound of (RF)2. An error occurs along MH jumps when a balanced reversible jump is encountered, i.e., [sent-255, score-0.144]
77 when there is no preference between jumping from state A to state B and the reverse, $\frac{q(B\to A)}{q(A\to B)} = 1$, and RF wrongly predicts that the posterior distribution of state B is larger than that of A, $\frac{p(Y=B|G)}{p(Y=A|G)} \ge 1$. [sent-257, score-0.187]
78 We consider a binary classification problem, for simplicity, where training and test instances may have positive and negative labels. [sent-281, score-0.121]
79 Instead, we assume that the learned decision trees have the following properties. [sent-286, score-0.12]
80 Each leaf node of a decision tree: (i) stores a total of C training instances that reach the leaf; and (ii) has a probabilistic margin γ ∈ [0, 1/2). [sent-287, score-0.362]
81 By margin, we mean that in every leaf reached by C training instances, a fraction of 1/2 + γ of the training instances will belong to one class (e.g., the positive class). [sent-288, score-0.394]
82 We say that a leaf is positive if a majority of the training instances collected by the leaf is positive; otherwise, we say that the leaf is negative. [sent-293, score-0.59]
83 It is straightforward to show that when a positive instance is dropped through one of the decision trees in RF, it will reach a positive leaf with probability 1/2 + γ, and a negative leaf with probability 1/2 − γ [15]. [sent-294, score-0.53]
84 A new test instance is classified by dropping it through T decision trees, and taking a majority vote of the labels of all C · T training instances stored in the leaves reached by the test instance. [sent-296, score-0.366]
85 We refer to this classification procedure as evidence voting [15], as opposed to decision voting over the leaf labels in the standard RF [13]. [sent-297, score-0.454]
86 The following proposition states that the probability that evidence voting misclassifies an instance, $P(\epsilon_1)$, is upper bounded. [sent-298, score-0.164]
87 The probability that RF with T trees, where every leaf stores C training instances, incorrectly classifies an instance is upper bounded: $P(\epsilon_1) \le \exp(-8CT\gamma^4)$. [sent-300, score-0.253]
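A quick Monte-Carlo check of this margin model (all parameter values illustrative); with T = 100, C = 20, γ = 0.1 the observed error rate should sit well below the bound exp(−8CTγ⁴) ≈ 0.20:

```python
import random

def evidence_vote_error(T=100, C=20, gamma=0.1, trials=2000):
    """Simulate evidence voting for a positive instance: each tree's
    leaf is positive w.p. 1/2+gamma, and a 1/2+gamma fraction of the C
    labels stored there match the leaf's sign. An error means losing
    the majority vote over all C*T stored labels."""
    errors = 0
    for _ in range(trials):
        positive_votes = 0.0
        for _ in range(T):
            leaf_is_pos = random.random() < 0.5 + gamma
            frac_pos = 0.5 + gamma if leaf_is_pos else 0.5 - gamma
            positive_votes += frac_pos * C   # stored labels in that leaf
        if positive_votes <= 0.5 * C * T:    # majority vote lost
            errors += 1
    return errors / trials
```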
88 Evidence voting for labeling an instance can be formalized as drawing a total of C·T independent Bernoulli random variables, with the success rate p1 , whose outcomes are {−1, +1}, where +1 is received for correct, and −1 for incorrect labeling of the instance. [sent-302, score-0.209]
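For intuition, a short derivation sketch (our reconstruction, leaning on the stated independence of the C·T votes) of how the bound follows from Hoeffding's inequality:

```latex
% One stored label votes correctly for a positive instance if the
% instance reaches a positive leaf and draws a positive label, or
% reaches a negative leaf and draws one of its minority positive labels:
p_1 = \left(\tfrac{1}{2}+\gamma\right)^2 + \left(\tfrac{1}{2}-\gamma\right)^2
    = \tfrac{1}{2} + 2\gamma^2 .
% Hoeffding over the CT votes: an error means the fraction of correct
% votes drops below 1/2, a deviation of at least 2*gamma^2 from p_1:
P(\epsilon_1) \le \exp\!\left(-2\,CT\,(p_1 - \tfrac{1}{2})^2\right)
             = \exp\!\left(-8\,CT\,\gamma^4\right).
```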
89 Evidence voting is also used for labeling pairs of instances. [sent-311, score-0.117]
90 The probability that evidence voting misclassifies a pair of test instances, $P(\epsilon_2)$, is upper bounded, as stated in Proposition 2. [sent-312, score-0.128]
91 Evidence voting for labeling a pair of instances can be formalized as drawing a total of C 2 T independent Bernoulli random variables, with success rate p2 , whose outcomes are {−1, +1}, where +1 is received for correct, and −1 for incorrect labeling of the instance pair. [sent-316, score-0.269]
92 From Proposition 1, it follows that the probability that RF makes a wrong prediction about the posterior ratio of an instance is upper bounded, $P(Z_i \ge 1) = P(\epsilon_1) \le \exp(-8CT\gamma^4)$, ∀i ∈ CC. [sent-320, score-0.156]
93 In addition, from Proposition 2, it follows that the probability that RF makes a wrong prediction about the posterior ratio of a pair of instances is upper bounded, $P(W_{ij} \ge 1) = P(\epsilon_2) \le \exp(-8C^2T\pi^4\gamma^8)$, ∀i ∈ CC and j ∈ N(i). [sent-322, score-0.189]
94 Our key idea is to employ RF to directly compute the ratios of the proposal and posterior distributions of states visited along the Metropolis-Hastings reversible jumps, instead of estimating each individual distribution, and thus improve the convergence rate and accuracy of the CRF inference. [sent-330, score-0.367]
95 Such a non-parametric formulation of CRF and its inference has been demonstrated to outperform, in terms of computation time and accuracy, existing parametric CRF models on the task of multiclass object recognition and segmentation. [sent-331, score-0.183]
96 Fei-Fei, “Towards total scene understanding: Classification, annotation and segmentation in an automatic framework,” in CVPR, 2009. [sent-337, score-0.123]
97 Criminisi, “Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation,” in ECCV, 2006, pp. [sent-349, score-0.143]
98 Koller, “Region-based segmentation and object detection,” in NIPS, 2009. [sent-363, score-0.187]
99 Lempitsky, “Class-specific hough forests for object detection,” in CVPR, 2009. [sent-408, score-0.2]
100 Lanckriet, “Multi-class object localization by combining local contextual interactions,” in CVPR, 2010. [sent-446, score-0.127]
wordName wordTfidf (topN-words)
[('rf', 0.744), ('crf', 0.239), ('cc', 0.182), ('leaf', 0.165), ('msrc', 0.154), ('eij', 0.123), ('ratios', 0.112), ('proposal', 0.102), ('mh', 0.101), ('object', 0.101), ('pb', 0.097), ('segmentation', 0.086), ('regions', 0.082), ('pij', 0.081), ('voting', 0.078), ('lt', 0.076), ('forests', 0.074), ('trees', 0.072), ('posterior', 0.07), ('nodes', 0.067), ('probabilistically', 0.066), ('jumps', 0.066), ('touching', 0.061), ('wij', 0.06), ('instances', 0.06), ('region', 0.06), ('leaves', 0.056), ('fz', 0.054), ('todorovic', 0.054), ('forest', 0.053), ('reversible', 0.053), ('evidence', 0.05), ('acceptance', 0.049), ('decision', 0.048), ('tree', 0.048), ('edges', 0.046), ('yc', 0.046), ('yi', 0.046), ('cuta', 0.046), ('cutb', 0.046), ('payet', 0.046), ('cut', 0.045), ('yj', 0.045), ('fw', 0.044), ('multiscale', 0.043), ('boxes', 0.043), ('bounding', 0.043), ('recognition', 0.042), ('image', 0.04), ('ij', 0.04), ('inference', 0.04), ('cvpr', 0.039), ('labeling', 0.039), ('state', 0.039), ('reached', 0.039), ('pa', 0.039), ('images', 0.038), ('color', 0.037), ('scene', 0.037), ('proposition', 0.036), ('layout', 0.035), ('labels', 0.035), ('training', 0.035), ('dropping', 0.035), ('relationships', 0.035), ('wrong', 0.034), ('objects', 0.034), ('graph', 0.034), ('zi', 0.033), ('occurrences', 0.033), ('descriptor', 0.031), ('stored', 0.031), ('bernoulli', 0.031), ('histograms', 0.031), ('bessel', 0.031), ('fwij', 0.031), ('fzi', 0.031), ('histogram', 0.03), ('exp', 0.03), ('distributions', 0.03), ('classes', 0.029), ('theoretical', 0.028), ('node', 0.028), ('labeled', 0.028), ('ll', 0.028), ('dropped', 0.027), ('instance', 0.027), ('arbelaez', 0.027), ('collects', 0.027), ('sinisa', 0.027), ('contextual', 0.026), ('stores', 0.026), ('negative', 0.026), ('success', 0.026), ('ratio', 0.025), ('iccv', 0.025), ('jump', 0.025), ('galleguillos', 0.025), ('hough', 0.025), ('photometric', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 1 nips-2010-(RF)^2 -- Random Forest Random Field
Author: Nadia Payet, Sinisa Todorovic
Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2. Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by Metropolis-Hastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2. (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
2 0.1548252 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
3 0.12236846 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results. 1
4 0.097764678 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata. 1
5 0.097015664 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
Author: Samy Bengio, Jason Weston, David Grangier
Abstract: Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a tree-structure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally we combine the two ideas resulting in the label embedding tree that outperforms alternative methods including One-vs-Rest while being orders of magnitude faster. 1
6 0.096500367 69 nips-2010-Efficient Minimization of Decomposable Submodular Functions
7 0.094850317 81 nips-2010-Evaluating neuronal codes for inference using Fisher information
8 0.092901982 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
9 0.091262273 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
10 0.087020136 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
11 0.085586965 149 nips-2010-Learning To Count Objects in Images
12 0.084565505 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures
13 0.076527826 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
14 0.076330811 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
15 0.074687265 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
16 0.073182032 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
17 0.072686948 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
18 0.071476266 234 nips-2010-Segmentation as Maximum-Weight Independent Set
19 0.07143303 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
20 0.071230918 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
topicId topicWeight
[(0, 0.191), (1, 0.08), (2, -0.058), (3, -0.116), (4, -0.072), (5, -0.034), (6, -0.083), (7, -0.005), (8, 0.11), (9, 0.024), (10, -0.081), (11, 0.01), (12, -0.054), (13, 0.063), (14, -0.022), (15, -0.041), (16, -0.002), (17, -0.122), (18, 0.024), (19, 0.049), (20, -0.003), (21, 0.008), (22, 0.089), (23, -0.028), (24, 0.008), (25, 0.084), (26, -0.016), (27, -0.016), (28, -0.084), (29, -0.004), (30, -0.034), (31, -0.004), (32, -0.15), (33, -0.125), (34, -0.02), (35, -0.013), (36, 0.038), (37, 0.013), (38, -0.064), (39, -0.072), (40, 0.075), (41, 0.048), (42, -0.025), (43, -0.04), (44, -0.058), (45, 0.046), (46, -0.004), (47, 0.013), (48, -0.095), (49, -0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.92759705 1 nips-2010-(RF)^2 -- Random Forest Random Field
Author: Nadia Payet, Sinisa Todorovic
Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2. Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by Metropolis-Hastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2. (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
2 0.60113603 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
3 0.55771661 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results. 1
4 0.53462893 234 nips-2010-Segmentation as Maximum-Weight Independent Set
Author: William Brendel, Sinisa Todorovic
Abstract: Given an ensemble of distinct, low-level segmentations of an image, our goal is to identify visually “meaningful” segments in the ensemble. Knowledge about any specific objects and surfaces present in the image is not available. The selection of image regions occupied by objects is formalized as the maximum-weight independent set (MWIS) problem. MWIS is the heaviest subset of mutually non-adjacent nodes of an attributed graph. We construct such a graph from all segments in the ensemble. Then, MWIS selects maximally distinctive segments that together partition the image. A new MWIS algorithm is presented. The algorithm seeks a solution directly in the discrete domain, instead of relaxing MWIS to a continuous problem, as common in previous work. It iteratively finds a candidate discrete solution of the Taylor series expansion of the original MWIS objective function around the previous solution. The algorithm is shown to converge to an optimum. Our empirical evaluation on the benchmark Berkeley segmentation dataset shows that the new algorithm eliminates the need for hand-picking optimal input parameters of the state-of-the-art segmenters, and outperforms their best, manually optimized results.
5 0.52640831 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata. 1
6 0.50590563 149 nips-2010-Learning To Count Objects in Images
7 0.49432665 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
8 0.49038947 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures
9 0.48612586 144 nips-2010-Learning Efficient Markov Networks
10 0.48196831 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
11 0.46973175 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
12 0.46917069 3 nips-2010-A Bayesian Framework for Figure-Ground Interpretation
13 0.46215388 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
14 0.45997697 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection
15 0.45799017 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
16 0.45675367 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
17 0.45076773 81 nips-2010-Evaluating neuronal codes for inference using Fisher information
18 0.44583249 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
19 0.44465068 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
20 0.4367322 2 nips-2010-A Bayesian Approach to Concept Drift
topicId topicWeight
[(13, 0.027), (16, 0.167), (17, 0.027), (27, 0.095), (30, 0.061), (35, 0.015), (45, 0.242), (50, 0.053), (52, 0.033), (60, 0.036), (72, 0.029), (77, 0.046), (78, 0.031), (90, 0.043)]
simIndex simValue paperId paperTitle
1 0.91358644 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions
Author: Silvia Chiappa, Jan R. Peters
Abstract: Many time-series such as human movement data consist of a sequence of basic actions, e.g., forehands and backhands in tennis. Automatically extracting and characterizing such actions is an important problem for a variety of different applications. In this paper, we present a probabilistic segmentation approach in which an observed time-series is modeled as a concatenation of segments corresponding to different basic actions. Each segment is generated through a noisy transformation of one of a few hidden trajectories representing different types of movement, with possible time re-scaling. We analyze three different approximation methods for dealing with model intractability, and demonstrate how the proposed approach can successfully segment table tennis movements recorded using a robot arm as haptic input device. 1
2 0.90434921 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan
Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1
same-paper 3 0.90408981 1 nips-2010-(RF)^2 -- Random Forest Random Field
Author: Nadia Payet, Sinisa Todorovic
Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2 . Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by MetropolisHastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2 . (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
4 0.89988285 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
5 0.87497175 212 nips-2010-Predictive State Temporal Difference Learning
Author: Byron Boots, Geoffrey J. Gordon
Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem. 1
6 0.84557098 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
7 0.84222305 155 nips-2010-Learning the context of a category
8 0.83917058 268 nips-2010-The Neural Costs of Optimal Control
9 0.83892465 282 nips-2010-Variable margin losses for classifier design
10 0.83796775 17 nips-2010-A biologically plausible network for the computation of orientation dominance
11 0.83731371 63 nips-2010-Distributed Dual Averaging In Networks
12 0.83690906 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
13 0.83644038 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
14 0.83571273 158 nips-2010-Learning via Gaussian Herding
16 0.83534473 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
17 0.83467472 98 nips-2010-Functional form of motion priors in human motion perception
18 0.8341369 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
19 0.83361638 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
20 0.83357209 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models