nips nips2004 nips2004-47 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Antonio Torralba, Kevin P. Murphy, William T. Freeman
Abstract: We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Contextual models for object detection using boosted random fields Antonio Torralba MIT, CSAIL Cambridge, MA 02139 torralba@mit. [sent-1, score-0.22]
2 edu Abstract We seek to both detect and segment objects in images. [sent-7, score-0.144]
3 To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). [sent-8, score-0.431]
4 The graph structure is learned by assembling graph fragments in an additive model. [sent-9, score-0.231]
5 The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. [sent-10, score-0.104]
6 We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. [sent-11, score-0.287]
7 We apply our system to detect stuff and things in office and street scenes. [sent-12, score-0.252]
8 1 Introduction Our long-term goal is to build a vision system that can examine an image and describe what objects are in it, and where. [sent-13, score-0.192]
9 In Fig. 5(a), objects of interest, such as the keyboard or mouse, are so small that they are impossible to detect just by using local features. [sent-15, score-0.377]
10 Murphy et al [9] used global scene context to help object recognition, but did not model relationships between objects. [sent-18, score-0.093]
11 Fink and Perona [4] exploited local dependencies in a boosting framework, but did not allow for multiple rounds of communication between correlated objects. [sent-19, score-0.361]
12 He et al [6] do not model connections between objects directly, but rather they induce such correlations indirectly, via a bank of hidden variables, using a “restricted Boltzmann machine” architecture. [sent-20, score-0.144]
13 In this paper, we exploit contextual correlations between the object classes by introducing Boosted Random Fields (BRFs). [sent-21, score-0.217]
14 Boosted random fields build on both boosting [5, 10] and conditional random fields (CRFs) [8, 7, 6]. [sent-22, score-0.299]
15 Boosting is a simple way of sequentially constructing “strong” classifiers from “weak” components, and has been used for single-class object detection with great success [12]. [sent-23, score-0.169]
16 Dietterich et al [3] combine boosting and 1D CRFs, but they only consider the problem of learning the local evidence potentials; we consider the much harder problem of learning the structure of a 2D CRF. [sent-24, score-0.374]
17 We propose a method for learning densely connected random fields with long range connections. [sent-27, score-0.101]
18 The topology of these connections is chosen by a weak learner which has access to a library of graph fragments, derived from patches of labeled training images, which reflect typical spatial arrangements of objects (similar to the segmentation fragments in [2]). [sent-28, score-0.599]
19 At each round of the learning algorithm, we add more connections from other locations in the image and from other classes (detectors). [sent-29, score-0.22]
20 The connections are assumed to be spatially invariant, which means this update can be performed using convolution followed by a sigmoid nonlinearity. [sent-30, score-0.101]
21 In addition to recognizing things, such as cars and people, we are also interested in recognizing spatially extended “stuff” [1], such as roads and buildings. [sent-32, score-0.129]
22 The traditional sliding window approach to object detection does not work well for detecting “stuff”. [sent-33, score-0.2]
23 Instead, we combine object detection and image segmentation (c. [sent-34, score-0.306]
24 We do not rely on a bottom-up image segmentation algorithm, which can be fragile without top-down guidance. [sent-37, score-0.137]
25 2 Learning potentials and graph structure A conditional random field (CRF) is a distribution of the form $P(S|x) = \frac{1}{Z} \prod_i \phi_i(S_i) \prod_{j \in N_i} \psi_{i,j}(S_i, S_j)$, where x is the input (e. [sent-38, score-0.182]
26 Our goal is to learn the local evidence potentials, φi , the compatibility potentials ψ, and the set of neighbors Ni . [sent-42, score-0.296]
27 We propose the following simple approximation: use belief propagation (BP) to estimate the marginals, P (Si |x), and then use boosting to maximize the likelihood of each node’s training data with respect to φi and ψ. [sent-43, score-0.331]
28 If we assume that the local potentials have the form $\phi_i^t(s_i) = [e^{F_i^t/2}; e^{-F_i^t/2}]$, where $F_i^t$ is some function of the input data, then $b_i^t(+1) = \sigma(F_i^t + G_i^t)$, with $G_i^t = \log M_i^t(+1) - \log M_i^t(-1)$ (3), where $\sigma(u) = 1/(1 + e^{-u})$ is the sigmoid function. [sent-51, score-0.538]
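A minimal numerical sketch of Eq. 3, assuming per-node scalar local evidence F and incoming message products M(+1), M(-1); all names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def belief_positive(F, M_pos, M_neg):
    # Eq. 3: b_i(+1) = sigma(F_i + G_i), with G_i = log M_i(+1) - log M_i(-1).
    G = np.log(M_pos) - np.log(M_neg)
    return sigmoid(F + G)

# Toy check: weak local evidence combined with messages that mildly favour the positive label.
F = np.array([0.2, -0.5])
print(belief_positive(F, M_pos=np.array([0.6, 0.6]), M_neg=np.array([0.4, 0.4])))
```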
29 Input: a set of labeled pairs $\{x_{i,m}; S_{i,m}\}$, bound T. Output: local evidence functions $f_i^t(x)$ and message update functions $g_i^t(b_{N_i})$. [sent-55, score-0.45]
30 (a) Fit local potential $f_i^t(x_{i,m})$ by weighted LS to $Y_{i,m}^t = S_{i,m}\,(1 + e^{-S_{i,m}(F_{i,m}^t + G_{i,m}^t)})$. [sent-61, score-0.113]
31 (b) Fit compatibilities $g_i^t(b_{N_i,m}^{t-1})$ to $Y_{i,m}^t$ by weighted LS. [sent-62, score-0.193]
32 (c) Compute local potential $F_{i,m}^t = F_{i,m}^{t-1} + f_i^t(x_{i,m})$. (d) Compute compatibilities $G_{i,m}^t = \sum_{n=1}^{t} g_i^n(b_{N_i,m}^{t-1})$. (e) Update the beliefs $b_{i,m}^t = \sigma(F_{i,m}^t + G_{i,m}^t)$. (f) Update the weights $w_{i,m}^{t+1} = b_{i,m}^t(-1)\, b_{i,m}^t(+1)$. Figure 1: BRF training algorithm. [sent-63, score-1.681]
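To make the flow of Figure 1 concrete, here is a minimal Python sketch of the training loop for a single binary label per node; fit_local_stump and fit_graph_stump are hypothetical helpers standing in for the weighted least-squares fits of steps (a) and (b), and all names are illustrative rather than taken from the paper's implementation.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_brf(x, S, T, fit_local_stump, fit_graph_stump):
    """Sketch of the loop in Figure 1. x: (M, D) features, S: (M,) labels in {-1, +1}.
    fit_local_stump / fit_graph_stump are assumed to solve the weighted
    least-squares problems of steps (a)-(b) and return callables."""
    M = len(S)
    F = np.zeros(M)                      # accumulated local evidence F_i
    G = np.zeros(M)                      # aggregated incoming messages G_i
    b = sigmoid(F + G)                   # beliefs b_i(+1)
    w = b * (1.0 - b)                    # boosting weights b(-1) b(+1)
    f_list, g_list = [], []
    for t in range(T):
        Y = S * (1.0 + np.exp(-S * (F + G)))   # working response of Eq. 7
        f_t = fit_local_stump(x, Y, w)         # (a) local evidence weak learner
        g_t = fit_graph_stump(b, Y, w)         # (b) graph-fragment weak learner
        f_list.append(f_t)
        g_list.append(g_t)
        F = F + f_t(x)                         # (c) update local potential
        G = sum(g(b) for g in g_list)          # (d) all fragments applied to current beliefs
        b = sigmoid(F + G)                     # (e) update beliefs
        w = b * (1.0 - b)                      # (f) update weights
    return f_list, g_list
```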
33 We assume that the graph is very densely connected so that the information that one single node sends to another is so small that we can make the approximation $\mu_{k \to i}^{t+1}(+1)/\mu_{k \to i}^{t+1}(-1) \simeq 1$. [sent-64, score-0.243]
34 (This is a reasonable approximation in the case of images, where each node represents a single pixel; only when the influence of many pixels is taken into account will the messages become informative. [sent-65, score-0.165]
35 Therefore, we can write the beliefs at iteration t as a function of the local evidences and the beliefs at time t − 1: $b_{i,m}^t(+1) = \sigma(F_i^t(x_{i,m}) + G_i^t(b_m^{t-1}))$. [sent-67, score-0.85]
36 The key idea behind BRFs is to use boosting to learn the G functions, which approximately implement message passing in densely connected graphs. [sent-68, score-0.435]
37 2.1 Learning local evidence potentials Defining $F_i^t(x_{i,m}) = F_i^{t-1}(x_{i,m}) + f_i^t(x_{i,m})$ as an additive model, where $x_{i,m}$ are the features of training sample m at node i, we can learn this function in a stagewise fashion by optimizing the second order Taylor expansion of Eq. [sent-71, score-0.539]
38 4 wrt $f_i^t$, as in logitBoost [5]: $\arg\min_{f_i^t} \log J_i^t \simeq \arg\min_{f_i^t} \sum_m w_{i,m}^t\,(Y_{i,m}^t - f_i^t(x_{i,m}))^2$ (7), where $Y_{i,m}^t = S_{i,m}\,(1 + e^{-S_{i,m}(F_{i,m}^t + G_{i,m}^t)})$. [sent-72, score-0.494]
39 In the case that the weak learner is a “regression stump”, $f_i(x) = a\,h(x) + b$, we can find the optimal $a, b$ by solving a weighted least squares problem, with weights $w_{i,m}^t = b_i^t(-1)\, b_i^t(+1)$; we can find the best basis function $h(x)$ by searching over all elements of a dictionary. [sent-73, score-0.104]
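A possible implementation of the weighted least-squares stump fit, assuming the scalar basis response h(x) has already been evaluated for every training sample; the closed-form weighted LS is solved for each candidate threshold and the best one is kept. Function and variable names are assumptions, not the authors' code.

```python
import numpy as np

def fit_stump(h, Y, w, n_thresholds=20):
    """Fit f(x) = a * 1[h(x) > theta] + b to targets Y with weights w."""
    best = None
    for theta in np.quantile(h, np.linspace(0.05, 0.95, n_thresholds)):
        z = (h > theta).astype(float)
        X = np.column_stack([z, np.ones_like(z)])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)  # weighted LS for a, b
        a, b = coef
        err = np.sum(w * (Y - (a * z + b)) ** 2)
        if best is None or err < best[0]:
            best = (err, a, b, theta)
    _, a, b, theta = best
    return lambda h_new: a * (h_new > theta) + b

# In the BRF, this fit would be repeated for every h(x) in the dictionary,
# keeping the basis function with the smallest weighted error.
```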
40 2.2 Learning compatibility potentials and graph structure In this section, we discuss how to learn the compatibility functions $\psi_{ij}$, and hence the structure of the graph. [sent-75, score-0.35]
41 Instead of learning the compatibility functions $\psi_{ij}$, we propose to learn the message update functions $G^t$ directly. [sent-76, score-0.097]
42 Input: a set of inputs $\{x_{i,m}\}$ and functions $f_i^t, g_i^t$. Output: set of beliefs $b_{i,m}$ and MAP estimates $S_{i,m}$. [sent-77, score-0.46]
43 From t = 1 to T, repeat: (a) Update local evidences $F_{i,m}^t = F_{i,m}^{t-1} + f_i^t(x_{i,m})$. (b) Update compatibilities $G_{i,m}^t = \sum_{n=1}^{t} g_i^n(b_{N_i,m}^{t-1})$. (c) Compute current beliefs $b_{i,m}^t = \sigma(F_{i,m}^t + G_{i,m}^t)$. [sent-80, score-0.906]
44 We propose to use an additive model for $G_i^{t+1}$ as we did for learning F: $G_{i,m}^{t+1} = \sum_{n=1}^{t} g_i^n(b_m^t)$, where $b_m^t$ is a vector with the beliefs of all nodes in the graph at iteration t for the training sample m. [sent-84, score-0.859]
45 The weak learners $g_i^n(b_m^t)$ can be regression stumps with the form $g_i^n(b_m^t) = a\,\delta(w \cdot b_m^t > \theta) + b$, where $a, b, \theta$ are the parameters of the regression stump, and $w_i$ is a set of weights selected from a dictionary. [sent-85, score-0.77]
46 The weak learners $g_i^n(b_m^t)$ will also be linear functions. [sent-87, score-0.271]
47 Hence the belief update simplifies to $b_{i,m}^{t+1}(+1) = \sigma(\alpha_i \cdot b_m^t + \beta_i + F_{i,m})$, which is similar to the mean-field update equations. [sent-88, score-0.505]
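A vectorized sketch of this simplified update, assuming the weights α have been collected into a node-by-node matrix whose non-zero entries encode the learned graph; the three-node toy example is purely illustrative.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def belief_update(b, F, alpha, beta):
    # b^{t+1}(+1) = sigma(alpha . b^t + beta + F): each node mixes its neighbours'
    # current beliefs (non-zero entries of its row of alpha) with its local evidence.
    return sigmoid(alpha @ b + beta + F)

# Toy 3-node chain: the middle node has no local evidence and relies on its neighbours.
alpha = np.array([[0.0, 1.0, 0.0],
                  [1.5, 0.0, 1.5],
                  [0.0, 1.0, 0.0]])
b = np.full(3, 0.5)
F = np.array([2.0, 0.0, 2.0])
for _ in range(10):
    b = belief_update(b, F, alpha, beta=np.zeros(3))
print(b)   # the middle belief is pulled up by its confident neighbours
```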
48 The neighborhood $N_i$ over which we sum incoming messages is determined by the graph structure, which is encoded in the non-zero values of $\alpha_i^n$. [sent-89, score-0.166]
49 Each weak learner $g_i^t$ will compute a weighted combination of the beliefs of some subset of the nodes; this subset may change from iteration to iteration, and can be quite large. [sent-90, score-0.484]
50 At iteration t, we choose the weak learner $g_i^t$ so as to minimize $\log J_i^t(b^{t-1}) = \sum_m \log\big(1 + e^{-S_{i,m}(F_{i,m}^t + g_i^t(b_m^{t-1}) + \sum_{n=1}^{t-1} g_i^n(b_m^{t-1}))}\big)$, which reduces to a weighted least squares problem similar to Eq. [sent-91, score-0.458]
51 3 BRFs for multiclass object detection and segmentation With the BRF training algorithm in hand, we describe our approach for multiclass object detection and region-labeling using densely connected BRFs. [sent-96, score-0.599]
52 3.1 Weak learners for detecting stuff and things The square sliding window approach does not provide a natural way of working with irregular objects. [sent-98, score-0.256]
53 Using region labeling as an image representation allows dealing with irregular and extended objects (buildings, bookshelf, road, . [sent-99, score-0.252]
54 Extended stuff [1] may be a very important source of contextual information for other objects. [sent-103, score-0.223]
55 (a) Examples from the dictionary of about 2000 patches and masks, $U_{x,y}$, $V_{x,y}$. [sent-104, score-0.129]
56 Figure 3: Examples of patches from the dictionary and an example of the segmentation obtained using boosting trained with patches from (a). [sent-110, score-0.524]
57 The weak learners we use for the local evidence potentials are based on the segmentation fragments proposed in [2]. [sent-111, score-0.509]
58 Specifically, we create a dictionary of about 2000 image patches U, chosen at random (but overlapping each object), plus a corresponding set of binary (in-class / out-of-class) image masks, V: see Fig. [sent-112, score-0.271]
59 At each round t, for each class c, and for each dictionary entry, we construct the following weak learner, whose output is a binary matrix of the same size as the image I: $v(I) = (((I \otimes U) > \theta) * V) > 0$ (9), where $\otimes$ represents normalized cross-correlation and $*$ represents convolution. [sent-114, score-0.367]
60 The intuition behind this is that I ⊗ U will produce peaks at image locations that contain this patch/template, and then convolving with V will superimpose the segmentation mask on top of the peaks. [sent-115, score-0.206]
61 To be able to detect objects at multiple scales, we first downsample the image to scale σ, compute v(I ↓ σ), and then upsample the result. [sent-117, score-0.215]
62 The final weak learner does this for multiple scales, ORs all the results together, and then takes a linear transformation. [sent-118, score-0.162]
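A rough sketch of this template-based weak learner; it substitutes plain (unnormalized) correlation for the normalized cross-correlation of Eq. 9 and uses a fixed set of scales, so it is a simplification under stated assumptions rather than the paper's implementation, and every name is illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import zoom

def patch_weak_learner(I, U, V, theta, scales=(1.0, 0.5, 0.25)):
    """Eq. 9 sketch: v(I) = (((I (x) U) > theta) * V) > 0, OR-ed over scales.
    NOTE: correlation here is unnormalized (convolution with the flipped patch),
    unlike the normalized cross-correlation used in the paper."""
    out = np.zeros(I.shape, dtype=bool)
    for s in scales:
        Is = zoom(I, s, order=1)                              # downsample the image
        corr = fftconvolve(Is, U[::-1, ::-1], mode='same')    # correlate with patch U
        peaks = (corr > theta).astype(float)                  # threshold the responses
        seg = fftconvolve(peaks, V, mode='same') > 0          # stamp the mask V on the peaks
        up = zoom(seg.astype(float),
                  (I.shape[0] / Is.shape[0], I.shape[1] / Is.shape[1]), order=0)
        up = up[:I.shape[0], :I.shape[1]]                     # crop rounding overshoot
        pad = [(0, I.shape[0] - up.shape[0]), (0, I.shape[1] - up.shape[1])]
        out |= np.pad(up, pad) > 0                            # OR across scales
    return out
```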
63 Fig. 3(c) shows an example of segmentation obtained by using boosting without context. [sent-120, score-0.339]
64 The weak learners we use for the compatibility functions have a similar form: $g_c(b) = \alpha \sum_{c'=1}^{C} b_{c'} * W_{c'} + \beta$ (11), where $b_{c'}$ is the image formed by the beliefs at all pixels for class $c'$. [sent-121, score-0.541]
65 The binary kernels (graph fragments) W define, for each node x, y of object class c, all the nodes from which it will receive messages. [sent-124, score-0.172]
66 These kernels are chosen by sampling patches of various sizes from the labeling of images from the training set. [sent-125, score-0.178]
67 This allows generating complicated patterns of connectivity that reflect the statistics of object co-occurrences in the training set. [sent-126, score-0.123]
68 The overall incoming message is given by adding the kernels obtained at each boosting round. [sent-127, score-0.375]
69 (This is the key difference from mutual boosting [4], where the incoming message is just the output of a single weak learner; thus, in mutual boosting, previously learned inter-class connections are only used once. [sent-128, score-0.563]
70 ) Although it would seem to take O(t) time to compute $G^t$, we can precompute a single equivalent kernel W′, so at runtime the overall complexity is still linear in the number of boosting rounds, O(T). [sent-129, score-0.301]
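A sketch of how the aggregated messages could be computed with one equivalent kernel per (target, source) class pair, obtained by summing the per-round graph fragments once; the dictionary layout and names are assumptions, and all fragments are assumed zero-padded to a common size.

```python
import numpy as np
from scipy.signal import fftconvolve

def message_maps(beliefs, round_kernels, round_offsets):
    """Compute G_c = sum_{c'} b_{c'} * W'_{c'} + beta' for every target class c.
    beliefs: {class: 2-D belief map}.
    round_kernels: {(target c, source c'): [alpha_n * W_n, ...]} with a common kernel size.
    round_offsets: {target c: [beta_n, ...]}."""
    shape = next(iter(beliefs.values())).shape
    G = {}
    for c, offsets in round_offsets.items():
        total = np.zeros(shape)
        for c_src, b in beliefs.items():
            fragments = round_kernels.get((c, c_src), [])
            if fragments:
                W_equiv = np.sum(fragments, axis=0)      # collapse all rounds into one kernel
                total += fftconvolve(b, W_equiv, mode='same')
        G[c] = total + np.sum(offsets)                   # beta' = sum of per-round offsets
    return G
```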
71 $G^t_{x,y,c} = \sum_{c'=1}^{C} b_{c'} * \big(\sum_{n=1}^{t} \alpha^n W^n_{c'}\big) + \sum_{n=1}^{t} \beta^n \;\stackrel{\mathrm{def}}{=}\; \sum_{c'=1}^{C} b_{c'} * W'_{c'} + \beta'$. (Figure panel caption: a) Incoming messages to a car node.
72 c) A car out of context (outside 3rd floor windows) is less of a car. [sent-132, score-0.413]
73 d) Evolution of the beliefs (b) for the car nodes and the labeling (S) for road, building, car. [sent-133, score-0.462]
74 The BRF is trained to detect cars, buildings and the road. [sent-135, score-0.119]
75 In Fig. 4(a-b), we show the structures of the graph and the weights W′ defined by $G^T$ for a BRF trained to detect cars, buildings and roads in street scenes. [sent-137, score-0.263]
76 3.2 Learning and inference For training we used a labeled dataset of office and street scenes with about 100 images in each set. [sent-139, score-0.146]
77 During the training, in the first 5 rounds we only update the local potentials, to allow local evidence to accrue. [sent-140, score-0.233]
78 After the 5th iteration we also start updating the compatibility functions. [sent-141, score-0.144]
79 At each round, we update only the local potential and compatibility function associated with the single object class that most reduces the multiclass cost. [sent-142, score-0.346]
80 This allows objects that need many features to have more complicated local potentials. [sent-143, score-0.138]
81 The easy-to-detect objects can then pass information to the harder ones. [sent-145, score-0.087]
82 A similar behavior is obtained for the car detector (Fig. [sent-149, score-0.137]
83 The detection of building and road provides strong constraints for the locations of the car. [sent-151, score-0.271]
84 3.3 Cascade of classifiers with BRFs The BRF can be turned into a cascade [12] by thresholding the beliefs. [sent-153, score-0.094]
85 At each round we update a binary rejection mask for each object class, $R^t_{x,y,c}$, by thresholding the beliefs at round t: $R^t_{x,y,c} = R^{t-1}_{x,y,c}\,\delta(b^t_{x,y,c} > \theta_c)$. [sent-155, score-0.615]
86 A pixel in the rejection mask is set to zero when we can decide that the object is not present (when $b^t_{x,y,c}$ is below the threshold $\theta_c \simeq 0$), and it is set to 1 when more processing is required. [sent-156, score-0.613]
87 Similarly, we can define a detection mask that will indicate pixels in which we decide the object is present. [sent-158, score-0.285]
88 The mask is then used for computing the features v(I) and messages G by applying the convolutions only on the pixels not yet classified. [sent-159, score-0.214]
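A small sketch of the per-class rejection and detection masks, assuming 2-D belief maps; the two thresholds and all names are illustrative values, not the paper's settings.

```python
import numpy as np

def update_masks(reject_mask, detect_mask, beliefs, theta_low=0.01, theta_high=0.99):
    """reject_mask: 1 where more processing is still required, 0 where the class is ruled out.
    detect_mask: 1 where the class has already been confirmed as present."""
    reject_mask = reject_mask * (beliefs > theta_low)         # R^t = R^{t-1} * [b^t > theta_c]
    detect_mask = np.logical_or(detect_mask, beliefs > theta_high)
    active = np.logical_and(reject_mask > 0, ~detect_mask)    # pixels still to be classified
    return reject_mask, detect_mask, active
```

The convolutions for v(I) and G would then only need to be evaluated on the pixels flagged as active.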
89 In this desk scene, it is easy to identify objects like the screen, keyboard and mouse, even though the local information is sometimes insufficient. [sent-163, score-0.32]
90 Middle: the evolution of the beliefs (b and F and G) during detection for a test image. [sent-164, score-0.27]
91 The graph below shows the average evolution of the area under the ROC for the three objects on 120 test images. [sent-166, score-0.174]
92 In Fig. 6 we compare the reduction of the search space when implementing a cascade using independent boosting (which reduces to Viola and Jones [12]), and when using BRFs. [sent-169, score-0.394]
93 We see that for objects for which context is the main source of information, like the mouse, the reduction in search space is much more dramatic using BRFs than using boosting alone. [sent-170, score-0.389]
94 4 Conclusion The proposed BRF algorithm combines boosting and CRFs, providing an algorithm that is easy to use for both training and inference. [sent-171, score-0.303]
95 We have demonstrated object detection in cluttered scenes by exploiting contextual relationships between objects. [sent-172, score-0.328]
96 The BRF algorithm is computationally efficient and provides a natural extension of the cascade of classifiers by integrating evidence from other objects in order to quickly reject certain image regions. [sent-173, score-0.271]
97 The BRF’s densely connected graphs, which efficiently collect information over large image regions, provide an alternative framework to nearest-neighbor grids for vision problems. [sent-174, score-0.206]
98 Figure 6: Contextual information reduces the search space in the framework of a cascade and improves performance. (x-axis: false alarm rate.) [sent-183, score-0.171]
99 Discriminative random fields: A discriminative framework for contextual interaction in classification. [sent-229, score-0.124]
100 Using the forest to see the trees: a graphical model relating features, objects and scenes. [sent-244, score-0.087]
wordName wordTfidf (topN-words)
[('bt', 0.389), ('brf', 0.374), ('boosting', 0.273), ('gt', 0.264), ('mouse', 0.198), ('sk', 0.189), ('fit', 0.185), ('keyboard', 0.182), ('beliefs', 0.165), ('road', 0.138), ('car', 0.137), ('brfs', 0.125), ('screen', 0.124), ('contextual', 0.124), ('gi', 0.11), ('weak', 0.104), ('stuff', 0.099), ('potentials', 0.098), ('compatibility', 0.097), ('object', 0.093), ('round', 0.092), ('ni', 0.088), ('objects', 0.087), ('compatibilities', 0.083), ('jit', 0.083), ('fragments', 0.083), ('detection', 0.076), ('densely', 0.073), ('dictionary', 0.073), ('image', 0.071), ('mask', 0.069), ('messages', 0.067), ('labeling', 0.066), ('segmentation', 0.066), ('cascade', 0.063), ('fi', 0.062), ('buildings', 0.062), ('message', 0.061), ('graph', 0.058), ('learner', 0.058), ('elds', 0.057), ('learners', 0.057), ('building', 0.057), ('detect', 0.057), ('connections', 0.057), ('patches', 0.056), ('street', 0.055), ('local', 0.051), ('boosted', 0.051), ('node', 0.051), ('evidence', 0.05), ('alarm', 0.05), ('torralba', 0.05), ('pixels', 0.047), ('iteration', 0.047), ('si', 0.046), ('cars', 0.046), ('update', 0.044), ('stagewise', 0.042), ('things', 0.041), ('crfs', 0.041), ('crf', 0.041), ('murphy', 0.041), ('incoming', 0.041), ('bc', 0.04), ('rounds', 0.037), ('fink', 0.036), ('wc', 0.036), ('ce', 0.035), ('scenes', 0.035), ('vision', 0.034), ('stump', 0.033), ('evidences', 0.033), ('sends', 0.033), ('pixel', 0.033), ('multiclass', 0.032), ('additive', 0.032), ('thresholding', 0.031), ('sliding', 0.031), ('roads', 0.031), ('convolutions', 0.031), ('csail', 0.031), ('training', 0.03), ('xm', 0.03), ('evolution', 0.029), ('rejection', 0.029), ('masks', 0.029), ('search', 0.029), ('reduces', 0.029), ('connected', 0.028), ('graphs', 0.028), ('belief', 0.028), ('nodes', 0.028), ('runtime', 0.028), ('seeing', 0.028), ('irregular', 0.028), ('output', 0.027), ('recognizing', 0.026), ('conditional', 0.026), ('images', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
Author: Antonio Torralba, Kevin P. Murphy, William T. Freeman
Abstract: We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes. 1
2 0.15837646 19 nips-2004-An Application of Boosting to Graph Classification
Author: Taku Kudo, Eisaku Maeda, Yuji Matsumoto
Abstract: This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraph as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our algorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convolution kernels as well as improves the testing efficiency. 1
3 0.15343969 44 nips-2004-Conditional Random Fields for Object Recognition
Author: Ariadna Quattoni, Michael Collins, Trevor Darrell
Abstract: We present a discriminative part-based approach for the recognition of object classes from unsegmented cluttered scenes. Objects are modeled as flexible constellations of parts conditioned on local observations found by an interest operator. For each object class the probability of a given assignment of parts to local features is modeled by a Conditional Random Field (CRF). We propose an extension of the CRF framework that incorporates hidden variables and combines class conditional CRFs into a unified framework for part-based object recognition. The parameters of the CRF are estimated in a maximum likelihood framework and recognition proceeds by finding the most likely class under our model. The main advantage of the proposed CRF framework is that it allows us to relax the assumption of conditional independence of the observed data (i.e. local features) often used in generative approaches, an assumption that might be too restrictive for a considerable number of object classes.
Author: Vladimir Koltchinskii, Manel Martínez-ramón, Stefan Posse
Abstract: We study a method of optimal data-driven aggregation of classifiers in a convex combination and establish tight upper bounds on its excess risk with respect to a convex loss function under the assumption that the solution of optimal aggregation problem is sparse. We use a boosting type algorithm of optimal aggregation to develop aggregate classifiers of activation patterns in fMRI based on locally trained SVM classifiers. The aggregation coefficients are then used to design a ”boosting map” of the brain needed to identify the regions with most significant impact on classification. 1
5 0.11402566 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1
6 0.10153089 40 nips-2004-Common-Frame Model for Object Recognition
7 0.095219523 192 nips-2004-The power of feature clustering: An application to object detection
8 0.09013769 116 nips-2004-Message Errors in Belief Propagation
9 0.087970771 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
10 0.084004 73 nips-2004-Generative Affine Localisation and Tracking
11 0.083603971 99 nips-2004-Learning Hyper-Features for Visual Identification
12 0.074542709 162 nips-2004-Semi-Markov Conditional Random Fields for Information Extraction
13 0.071899302 28 nips-2004-Bayesian inference in spiking neurons
14 0.066402353 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
15 0.063076094 68 nips-2004-Face Detection --- Efficient and Rank Deficient
16 0.059892934 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
17 0.059454713 32 nips-2004-Boosting on Manifolds: Adaptive Regularization of Base Classifiers
18 0.058843225 100 nips-2004-Learning Preferences for Multiclass Problems
19 0.057677485 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
20 0.057534866 144 nips-2004-Parallel Support Vector Machines: The Cascade SVM
topicId topicWeight
[(0, -0.186), (1, 0.023), (2, 0.005), (3, -0.076), (4, 0.206), (5, 0.03), (6, 0.11), (7, -0.006), (8, -0.127), (9, -0.0), (10, -0.115), (11, -0.096), (12, 0.021), (13, 0.06), (14, -0.004), (15, -0.061), (16, 0.107), (17, 0.048), (18, 0.129), (19, 0.025), (20, 0.134), (21, -0.071), (22, -0.051), (23, 0.055), (24, -0.037), (25, 0.063), (26, 0.036), (27, -0.083), (28, -0.064), (29, -0.116), (30, 0.042), (31, -0.058), (32, 0.005), (33, -0.142), (34, -0.083), (35, 0.067), (36, 0.0), (37, -0.169), (38, -0.039), (39, -0.032), (40, -0.023), (41, 0.108), (42, -0.095), (43, -0.081), (44, 0.084), (45, 0.036), (46, 0.098), (47, -0.105), (48, 0.19), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.96197259 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
Author: Antonio Torralba, Kevin P. Murphy, William T. Freeman
Abstract: We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes. 1
2 0.53080606 19 nips-2004-An Application of Boosting to Graph Classification
Author: Taku Kudo, Eisaku Maeda, Yuji Matsumoto
Abstract: This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraph as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our algorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convolution kernels as well as improves the testing efficiency. 1
3 0.48205718 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging
Author: Vladimir Koltchinskii, Manel Martínez-ramón, Stefan Posse
Abstract: We study a method of optimal data-driven aggregation of classifiers in a convex combination and establish tight upper bounds on its excess risk with respect to a convex loss function under the assumption that the solution of optimal aggregation problem is sparse. We use a boosting type algorithm of optimal aggregation to develop aggregate classifiers of activation patterns in fMRI based on locally trained SVM classifiers. The aggregation coefficients are then used to design a ”boosting map” of the brain needed to identify the regions with most significant impact on classification. 1
4 0.47979486 44 nips-2004-Conditional Random Fields for Object Recognition
Author: Ariadna Quattoni, Michael Collins, Trevor Darrell
Abstract: We present a discriminative part-based approach for the recognition of object classes from unsegmented cluttered scenes. Objects are modeled as flexible constellations of parts conditioned on local observations found by an interest operator. For each object class the probability of a given assignment of parts to local features is modeled by a Conditional Random Field (CRF). We propose an extension of the CRF framework that incorporates hidden variables and combines class conditional CRFs into a unified framework for part-based object recognition. The parameters of the CRF are estimated in a maximum likelihood framework and recognition proceeds by finding the most likely class under our model. The main advantage of the proposed CRF framework is that it allows us to relax the assumption of conditional independence of the observed data (i.e. local features) often used in generative approaches, an assumption that might be too restrictive for a considerable number of object classes.
5 0.46915734 40 nips-2004-Common-Frame Model for Object Recognition
Author: Pierre Moreels, Pietro Perona
Abstract: A generative probabilistic model for objects in images is presented. An object consists of a constellation of features. Feature appearance and pose are modeled probabilistically. Scene images are generated by drawing a set of objects from a given database, with random clutter sprinkled on the remaining image surface. Occlusion is allowed. We study the case where features from the same object share a common reference frame. Moreover, parameters for shape and appearance densities are shared across features. This is to be contrasted with previous work on probabilistic ‘constellation’ models where features depend on each other, and each feature and model have different pose and appearance statistics [1, 2]. These two differences allow us to build models containing hundreds of features, as well as to train each model from a single example. Our model may also be thought of as a probabilistic revisitation of Lowe’s model [3, 4]. We propose an efficient entropy-minimization inference algorithm that constructs the best interpretation of a scene as a collection of objects and clutter. We test our ideas with experiments on two image databases. We compare with Lowe’s algorithm and demonstrate better performance, in particular in presence of large amounts of background clutter.
6 0.45490673 192 nips-2004-The power of feature clustering: An application to object detection
7 0.40314516 99 nips-2004-Learning Hyper-Features for Visual Identification
8 0.35715672 162 nips-2004-Semi-Markov Conditional Random Fields for Information Extraction
9 0.35565099 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
10 0.3467108 141 nips-2004-Optimal sub-graphical models
11 0.34467849 73 nips-2004-Generative Affine Localisation and Tracking
12 0.33818278 116 nips-2004-Message Errors in Belief Propagation
13 0.33738664 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data
14 0.33686247 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
15 0.33125073 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
16 0.32412514 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
17 0.31361866 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
18 0.28926891 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
19 0.28847766 68 nips-2004-Face Detection --- Efficient and Rank Deficient
20 0.28533676 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
topicId topicWeight
[(13, 0.072), (15, 0.139), (17, 0.035), (21, 0.019), (26, 0.046), (31, 0.018), (33, 0.187), (35, 0.022), (50, 0.026), (53, 0.02), (56, 0.014), (67, 0.276), (87, 0.014), (88, 0.018)]
simIndex simValue paperId paperTitle
1 0.86806661 200 nips-2004-Using Random Forests in the Structured Language Model
Author: Peng Xu, Frederick Jelinek
Abstract: In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing. 1
same-paper 2 0.82737601 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
Author: Antonio Torralba, Kevin P. Murphy, William T. Freeman
Abstract: We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes. 1
3 0.66003221 130 nips-2004-Newscast EM
Author: Wojtek Kowalczyk, Nikos A. Vlassis
Abstract: We propose a gossip-based distributed algorithm for Gaussian mixture learning, Newscast EM. The algorithm operates on network topologies where each node observes a local quantity and can communicate with other nodes in an arbitrary point-to-point fashion. The main difference between Newscast EM and the standard EM algorithm is that the M-step in our case is implemented in a decentralized manner: (random) pairs of nodes repeatedly exchange their local parameter estimates and combine them by (weighted) averaging. We provide theoretical evidence and demonstrate experimentally that, under this protocol, nodes converge exponentially fast to the correct estimates in each M-step of the EM algorithm. 1
4 0.65769821 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
5 0.65748018 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
6 0.65696275 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
7 0.65659082 99 nips-2004-Learning Hyper-Features for Visual Identification
8 0.6557914 68 nips-2004-Face Detection --- Efficient and Rank Deficient
9 0.65487289 161 nips-2004-Self-Tuning Spectral Clustering
10 0.6542964 178 nips-2004-Support Vector Classification with Input Data Uncertainty
11 0.65404481 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
12 0.65323818 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
13 0.653198 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
14 0.65314358 77 nips-2004-Hierarchical Clustering of a Mixture Model
15 0.65244377 179 nips-2004-Surface Reconstruction using Learned Shape Models
16 0.65230221 127 nips-2004-Neighbourhood Components Analysis
17 0.65171969 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
18 0.6515739 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
19 0.65144545 53 nips-2004-Discriminant Saliency for Visual Recognition from Cluttered Scenes
20 0.65122211 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models