nips nips2005 nips2005-55 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Antonio Torralba, Alan S. Willsky, Erik B. Sudderth, William T. Freeman
Abstract: Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object–centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP’s inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. [sent-9, score-0.335]
2 In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. [sent-10, score-0.159]
3 Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. [sent-11, score-0.705]
4 For visual scenes, mixture components describe the spatial structure of visual features in an object–centered coordinate frame, while transformations model the object positions in a particular image. [sent-12, score-0.575]
5 1 Introduction In this paper, we develop methods for analyzing the features composing a visual scene, thereby localizing and categorizing the objects in an image. [sent-15, score-0.216]
6 We would like to design learning algorithms that exploit relationships among multiple, partially labeled object categories during training. [sent-16, score-0.253]
7 Working towards this goal, we propose a hierarchical probabilistic model for the expected spatial locations of objects, and the appearance of visual features corresponding to each object. [sent-17, score-0.298]
8 In contrast, generative approaches can discover large, visually salient categories (such as foliage and buildings [2]) without supervision. [sent-21, score-0.17]
9 Partial segmentations can then be used to learn semantically interesting categories (such as cars and pedestrians) which are less visually distinctive, or present in fewer training images. [sent-22, score-0.164]
10 Moreover, generative models provide a natural framework for learning contextual relationships between objects, and transferring knowledge between related, but distinct, visual scenes. [sent-23, score-0.136]
11 Constellation LDA Transformed DP Figure 1: A scene with faces as described by three generative models. [sent-24, score-0.127]
12 Note: The LDA and TDP images are sampled from models learned from training images, while the Constellation image is a hand-constructed illustration. [sent-28, score-0.132]
13 The principal challenge in developing hierarchical models for scenes is specifying tractable, scalable methods for handling uncertainty in the number of objects. [sent-29, score-0.204]
14 We address this problem using Dirichlet processes [3], a tool from nonparametric Bayesian analysis for learning mixture models whose number of components is not fixed, but instead estimated from data. [sent-31, score-0.132]
15 In particular, we extend the recently proposed hierarchical Dirichlet process (HDP) [4, 5] framework to allow more flexible sharing of mixture components between images. [sent-32, score-0.228]
16 The resulting transformed Dirichlet process (TDP) is naturally suited to our scene understanding application, as well as many other domains where “style and content” are combined to produce the observed data [6]. [sent-33, score-0.235]
17 We begin in Sec. 2 by reviewing several related generative models for objects and scenes. [sent-35, score-0.15]
18 We conclude in Sec. 5 by demonstrating object recognition and segmentation in street scenes. [sent-40, score-0.238]
19 2 Generative Models for Objects and Scenes Constellation models [7] describe single objects via the appearance of a fixed, and typically small, set of spatially constrained parts (see Fig. [sent-41, score-0.159]
20 Although they can successfully recognize objects in cluttered backgrounds, they do not directly provide a mechanism for detecting multiple object instances. [sent-43, score-0.293]
21 In addition, it seems difficult to generalize the fixed set of constellation parts to problems where the number of objects is uncertain. [sent-44, score-0.163]
22 More recently, distributions over hierarchical tree–structured partitions of image pixels have been used to segment simple scenes [9, 10]. [sent-46, score-0.256]
23 In addition, an image parsing [11] framework has been proposed which explains an image using a set of regions generated by generic or object–specific processes. [sent-47, score-0.136]
24 Inspired by techniques from the text analysis literature, several recent papers analyze scenes using a spatially unstructured bag of features extracted from local image patches (see Fig. [sent-51, score-0.232]
25 In particular, latent Dirichlet allocation (LDA) [13] describes the features xji in image j using a K component mixture model with parameters θk . [sent-53, score-0.36]
26 Each image reuses these same mixture parameters in different proportions πj (see the graphical model of Fig. [sent-54, score-0.199]
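To make the bag-of-features notation concrete, here is a minimal Python sketch of the LDA generative step just described (hypothetical code, not from the paper; the function name, the two-component example, and the toy vocabularies are illustrative):

```python
import numpy as np

def sample_lda_image(pi_j, thetas, n_features, rng=np.random.default_rng(0)):
    """Bag-of-features LDA: each feature in image j picks a shared component
    z ~ Mult(pi_j), then an appearance word x ~ Mult(theta_z)."""
    z = rng.choice(len(pi_j), size=n_features, p=pi_j)
    return [int(rng.choice(len(thetas[k]), p=thetas[k])) for k in z]

# Two shared components reused with image-specific proportions pi_j.
features = sample_lda_image(pi_j=[0.7, 0.3],
                            thetas=[[0.9, 0.1], [0.2, 0.8]], n_features=5)
```

Every image reuses the same `thetas`; only the proportions `pi_j` change, which is also why this representation carries no spatial structure.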
27 By appropriately defining these shared mixtures, LDA may be used to discover object categories from images of single objects [2], categorize natural scenes [14], and (with a slight extension) parse presegmented captioned images [15]. [sent-56, score-0.524]
28 While these LDA models are sometimes effective, their neglect of spatial structure ignores valuable information which is critical in challenging object detection tasks. [sent-57, score-0.231]
29 We recently proposed a hierarchical extension of LDA which learns shared parts describing the internal structure of objects, and contextual relationships among known groups of objects [16]. [sent-58, score-0.423]
30 The transformed Dirichlet process (TDP) addresses a key limitation of this model by allowing uncertainty in the number and identity of the objects depicted in each image. [sent-59, score-0.258]
31 As shown in Fig. 1, the TDP effectively provides a textural model in which locally unstructured clumps of features are given global spatial structure by the inferred set of objects underlying each scene. [sent-62, score-0.278]
32 We then introduce the transformed Dirichlet process (TDP) (Sec. [sent-68, score-0.148]
33 Computationally, this process is conveniently described by a set z of independently sampled variables zi ∼ Mult(β) indicating the component of the mixture G(θ) (see eq. [sent-83, score-0.176]
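As a concrete illustration of the stick–breaking weights β and the indicators zi ∼ Mult(β), the sketch below truncates the infinite DP at K components (hypothetical code; `gamma`, `K`, and the sample sizes are assumptions):

```python
import numpy as np

def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights: beta_k = v_k * prod_{l<k} (1 - v_l),
    with v_k ~ Beta(1, gamma); K truncates the infinite DP."""
    v = rng.beta(1.0, gamma, size=K)
    log_remaining = np.concatenate([[0.0], np.cumsum(np.log1p(-v[:-1]))])
    return v * np.exp(log_remaining)

rng = np.random.default_rng(0)
beta = stick_breaking(gamma=1.0, K=100, rng=rng)
z = rng.choice(100, size=50, p=beta / beta.sum())  # z_i ~ Mult(beta)
```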
34 This process is sometimes described by analogy to a Chinese restaurant in which the (infinite collection of) tables correspond to the mixture components θk , and customers to observations xi [4]. [sent-90, score-0.385]
35 Customers are social, tending to sit at tables with many other customers (observations), and each table shares a single dish (parameter). [sent-91, score-0.336]
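The seating rule of this analogy can be written down directly; the following sketch (hypothetical code, not from the paper) seats customers sequentially, choosing an occupied table in proportion to its count and a new table in proportion to the concentration `alpha`:

```python
import numpy as np

def crp_seating(n_customers, alpha, rng=np.random.default_rng(0)):
    """Seat customers one by one: table t is chosen with probability
    n_t / (i + alpha), and a new table with probability alpha / (i + alpha)."""
    counts, seats = [], []
    for i in range(n_customers):
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        t = int(rng.choice(len(probs), p=probs))
        if t == len(counts):
            counts.append(0)  # open a new table
        counts[t] += 1
        seats.append(t)
    return seats, counts
```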
36 For example, in this paper’s applications each group is an image, and the data are visual features composing a scene. [sent-94, score-0.164]
37 Let xj = (xj1 , . . . , xjnj ) denote the nj exchangeable data points in group j. [sent-98, score-0.155]
38 LDA directly assigns observations xji to clusters via indicators zji . [sent-103, score-0.34]
39 HDP and TDP models use “table” indicators tji as an intermediary between observations and assignments kjt to an infinite global mixture with weights β. [sent-104, score-0.743]
40 Specializing the TDP to visual scenes (right), we model the position yji and appearance wji of features using distributions ηo indexed by unobserved object categories oji . [sent-106, score-0.778]
41 To construct an HDP, a global probability measure G0 ∼ DP(γ, H) is first chosen to define a set of shared mixture components. [sent-108, score-0.181]
42 The generative process underlying HDPs may be understood in terms of an extension of the DP analogy known as the Chinese restaurant franchise [4]. [sent-112, score-0.204]
43 Each group defines a separate restaurant in which customers (observations) xji sit at tables tji . [sent-113, score-0.689]
44 Each table shares a single dish (parameter) θ, which is ordered from a menu G0 shared among restaurants (groups). [sent-114, score-0.202]
45 Letting kjt indicate the parameter θkjt assigned to table t in group j, we may integrate over G0 and Gj (as in eq. [sent-115, score-0.442]
46 As before, customers prefer tables t at which many customers njt are already seated (eq. [sent-126, score-0.27]
47 Each new table is assigned a dish kjt according to eq. [sent-128, score-0.203]
48 Given the assignments tj and kj for group j, observations are sampled as xji ∼ F (θzji ), where zji = kjtji indexes the shared parameters assigned to the table associated with xji . [sent-133, score-0.802]
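The franchise-level step — a newly created table ordering a dish in proportion to that dish's popularity across all restaurants — can be sketched as follows (hypothetical code; `m` holds franchise-wide table counts per dish and `gamma` is the top-level concentration):

```python
import numpy as np

def sample_dish(m, gamma, rng=np.random.default_rng(0)):
    """A new table orders dish k with probability proportional to m_k
    (tables serving k across the franchise), or a brand-new dish
    theta ~ H with probability proportional to gamma."""
    weights = np.append(np.asarray(m, dtype=float), gamma)
    k = int(rng.choice(len(weights), p=weights / weights.sum()))
    return k  # k == len(m) signals a new global dish
```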
49 2, the group distributions Gj are derived from the global distribution G0 by resampling the mixture weights from a Dirichlet process (see eq. [sent-136, score-0.204]
50 Consider, for example, a Gaussian distribution describing the location at which object features are detected in an image. [sent-139, score-0.24]
51 While the covariance of that distribution may stay relatively constant across object instances, the mean will change dramatically from image to image (group to group), depending on the objects’ position relative to the camera. [sent-140, score-0.263]
52 Motivated by these difficulties, we propose the Transformed Dirichlet Process (TDP), an extension of the HDP in which global mixture components undergo a set of random transformations before being reused in each group. [sent-141, score-0.333]
53 (1) to create a global measure describing both parameters and transformations: G0 (θ, ρ) = Σ_{k=1..∞} βk δ(θ, θk ) q(ρ | φk ), where θk ∼ H and φk ∼ R (6). As before, β is sampled from a stick–breaking process with parameter γ. [sent-144, score-0.139]
54 Marginalizing over transformations ρ, Gj (θ) reuses parameters from G0 (θ) exactly as in eq. [sent-146, score-0.129]
55 Conditioning on θk , it can be shown that Gj (ρ | θk ) ∼ DP(αβk , Q(φk )), so that the proportions ωjk of features associated with each transformation of θk follow a stick–breaking process with parameter αβk . [sent-149, score-0.17]
56 Each observation xji is now generated by sampling (θ̄ji , ρ̄ji ) ∼ Gj , and then choosing xji ∼ F(θ̄ji , ρ̄ji ) from a distribution which transforms θ̄ji by ρ̄ji . [sent-150, score-0.366]
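A minimal sketch of this generative step, specialized to translation transformations and a Gaussian F (both assumptions here; the full model also shares transformations among observations within a group through the discreteness of Gj, which this per-feature draw ignores):

```python
import numpy as np

def sample_tdp_feature(beta, mus, Lambdas, phis, rng=np.random.default_rng(0)):
    """One observation under translation transformations: pick a global
    cluster k ~ beta, draw a shift rho ~ N(0, phi_k), and emit
    x ~ N(mu_k + rho, Lambda_k)."""
    k = int(rng.choice(len(beta), p=beta / np.sum(beta)))
    rho = rng.multivariate_normal(np.zeros(len(mus[k])), phis[k])
    return rng.multivariate_normal(np.asarray(mus[k]) + rho, Lambdas[k])
```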
57 Although the global family of transformation distributions Q(φ) is typically non–atomic, the discreteness of Gj ensures that transformations are shared between observations within group j. [sent-151, score-0.29]
58 Computationally, the TDP is more conveniently described via an extension of the Chinese restaurant franchise analogy (see Fig. [sent-152, score-0.138]
59 As before, customers (observations) xji sit at tables tji according to the clustering bias of eq. [sent-154, score-0.589]
60 (4), and new tables choose dishes according to their popularity across the franchise (eq. [sent-155, score-0.143]
61 Now, however, the dish (parameter) θkjt at table t is seasoned (transformed) according to ρjt ∼ q(ρjt | φkjt ). [sent-157, score-0.141]
62 The simplest implementation samples table assignments t, cluster assignments k, transformations ρ, and parameters θ, φ. [sent-161, score-0.327]
63 Let t−ji denote all table assignments excluding tji , and define k−jt , ρ−jt similarly. [sent-162, score-0.31]
64 2), we have p(tji = t | t−ji , k, ρ, θ, x) ∝ p(t | t−ji ) f(xji | θkjt , ρjt ) (8) The first term is given by eq. [sent-164, score-0.394]
65 For a fixed set of transformations ρ, the second term is a simple likelihood evaluation for existing tables, while new tables may be evaluated by marginalizing over possible cluster assignments (eq. [sent-166, score-0.3]
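Under the same Gaussian/translation assumptions, eq. (8) can be sketched as a count-times-likelihood score per table (hypothetical code; `new_lik`, the marginal likelihood of seating xji at a new table, is passed in rather than computed):

```python
import numpy as np
from scipy.stats import multivariate_normal

def table_assignment_probs(x, n_jt, table_means, table_covs, alpha, new_lik):
    """Unnormalized eq. (8): an existing table t scores n_jt[t] times the
    likelihood under its transformed Gaussian; a new table scores
    alpha * new_lik, marginalizing over possible cluster assignments."""
    scores = [n * multivariate_normal.pdf(x, mean=m, cov=c)
              for n, m, c in zip(n_jt, table_means, table_covs)]
    scores.append(alpha * new_lik)
    scores = np.asarray(scores)
    return scores / scores.sum()
```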
66 Right: Global TDP distribution G0 (θ, ρ) over both clusters θ (solid) and translations ρ of those clusters (dashed). [sent-173, score-0.136]
67 Conditioned on kjt , we again use conjugacy to sample ρjt . [sent-175, score-0.316]
68 For the moment, we assume that the observed data xji = (oji , yji ), where yji is the position of a feature corresponding to object category oji , and the number of object categories O is known (see Fig. [sent-182, score-0.962]
69 We then choose cluster parameters θk = (ōk , µk , Λk ) to describe the mean µk and covariance Λk of a Gaussian distribution over feature positions, as well as the single object category ōk assigned to all observations sampled from that cluster. [sent-184, score-0.39]
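Under this parameterization, the likelihood of a feature xji = (oji , yji ) under cluster k transformed by ρ factors into a category match and a Gaussian position term; a minimal sketch (hypothetical code, not from the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal

def scene_feature_lik(o, y, o_bar_k, mu_k, Lambda_k, rho):
    """f(x | theta_k, rho) for x = (o, y): the category must equal the
    cluster's category o_bar_k, and the position y is Gaussian about
    the translated mean mu_k + rho."""
    if o != o_bar_k:
        return 0.0
    return multivariate_normal.pdf(y, mean=np.asarray(mu_k) + rho, cov=Lambda_k)
```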
70 Although this cluster parameterization does not capture contextual relationships between object categories, the results of Sec. [sent-185, score-0.259]
71 5 demonstrate that it nevertheless provides an effective model of the spatial variability of individual categories across many different scenes. [sent-186, score-0.141]
72 Density models for spatial transformations have been previously used to recognize isolated objects [17], and estimate layered decompositions of video sequences [18]. [sent-189, score-0.28]
73 In contrast, the proposed TDP models the variability of object positions across scenes, and couples this with a nonparametric prior allowing uncertainty in the number of objects. [sent-190, score-0.159]
74 To ensure that the TDP scene model is identifiable, we define p(ρjt | kj , φ) to be a zero–mean Gaussian with covariance φkjt . [sent-191, score-0.143]
75 The parameter prior R is uniform across object categories, while R and H both use inverse–Wishart position distributions, weakly biased towards moderate covariances. [sent-192, score-0.159]
76 Fig. 3 shows a 2D synthetic example based on a single object category (O = 1). [sent-194, score-0.226]
77 In contrast, the learned HDP uses a large set of global clusters to discretize the transformations underlying the data, and thus generalizes poorly to new translations. [sent-196, score-0.21]
78 To apply the scene model of Sec. 4.1 to images, we must learn the relationship between object categories and visual features. [sent-200, score-0.311]
79 We assume that the appearance wji of each detected feature is independently sampled conditioned on the underlying object category oji (see Fig. [sent-204, score-0.484]
80 Placing a symmetric Dirichlet prior, with parameter λ, on each category’s multinomial appearance distribution ηo , we have p(wji = b | oji = o, w−ji , t, k, θ) ∝ cbo + λ (10) where cbo is the number of times feature b is currently assigned to object o. [sent-206, score-0.476]
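Eq. (10) amounts to Dirichlet-smoothed count normalization; a minimal sketch with a toy count matrix (hypothetical code):

```python
import numpy as np

def appearance_probs(c, o, lam):
    """Eq. (10): p(w_ji = b | o_ji = o, ...) proportional to c[b, o] + lambda,
    where c[b, o] counts how often word b is currently assigned to object o."""
    scores = c[:, o] + lam
    return scores / scores.sum()

c = np.array([[12, 1], [3, 7], [0, 4]])  # 3 appearance words, 2 object categories
print(appearance_probs(c, o=0, lam=0.5))
```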
81 Because a single object category is associated with each cluster, the Gibbs sampler of Sec. [sent-207, score-0.279]
82 5 Analyzing Street Scenes To demonstrate the potential of our TDP scene model, we consider a set of street scene images (250 training, 75 test) from the MIT-CSAIL database. [sent-211, score-0.265]
83 All categories were labeled in 112 images, while in the remainder only cars were segmented. [sent-213, score-0.132]
84 Training from semi–supervised data is accomplished by restricting object category assignments for segmented features. [sent-214, score-0.293]
85 Fig. 4 shows the four global object clusters learned following 100 Gibbs sampling iterations. [sent-216, score-0.27]
86 There is one elongated car cluster, one large building cluster, and two road clusters with differing shapes. [sent-217, score-0.267]
87 Interestingly, the model has automatically determined that building features occur in large homogeneous patches, while road features are sparse and better described by many smaller transformed clusters. [sent-218, score-0.321]
88 Fig. 4 shows segmentations produced by averaging these samples, as well as transformed clusters from the final iteration. [sent-221, score-0.222]
89 Qualitatively, results are typically good, although foliage is often mislabeled as road due to the textural similarities with features detected in shadows across roads. [sent-222, score-0.156]
90 For comparison, we also trained an LDA model based solely on feature appearance, allowing three topics per object category and again using object labels to restrict the Gibbs sampler’s assignments [16]. [sent-223, score-0.452]
91 As shown in Fig. 4, our TDP model of spatial scene structure significantly improves segmentation performance. [sent-225, score-0.165]
92 In addition, through the set of transformed car clusters generated by the Gibbs sampler, the TDP explicitly estimates the number of object instances underlying each image. [sent-226, score-0.409]
93 These detections, which are not possible using LDA, are based on a single global parsing of the scene which automatically estimates object locations without a “sliding window” [1]. [sent-227, score-0.321]
94 6 Discussion We have developed the transformed Dirichlet process, a hierarchical model which shares a set of stochastically transformed clusters among groups of data. [sent-228, score-0.498]
95 Applied to visual scenes, TDPs provide a model of spatial structure which allows the number of objects generating an image to be automatically inferred, and lead to improved detection performance. [sent-229, score-0.292]
96 Figure 4: TDP analysis of street scenes containing cars (red), buildings (green), and roads (blue). [sent-262, score-0.23]
97 Top right: Global model G0 describing object shape (solid) and expected transformations (dashed). [sent-263, score-0.291]
98 Left: Four test images (first row), estimated segmentations of features into object categories (second row), transformed global clusters associated with each image interpretation (third row), and features assigned to different instances of the transformed car cluster (fourth row). [sent-265, score-0.989]
99 A Bayesian approach to unsupervised one-shot learning of object categories. [sent-313, score-0.159]
100 A Bayesian hierarchical model for learning natural scene categories. [sent-370, score-0.183]
wordName wordTfidf (topN-words)
[('tdp', 0.527), ('kjt', 0.316), ('jt', 0.275), ('tji', 0.211), ('dirichlet', 0.199), ('xji', 0.183), ('hdp', 0.166), ('object', 0.159), ('lda', 0.141), ('dp', 0.128), ('transformed', 0.122), ('oji', 0.12), ('objects', 0.11), ('scenes', 0.108), ('transformations', 0.099), ('hierarchical', 0.096), ('categories', 0.094), ('gj', 0.091), ('yji', 0.09), ('scene', 0.087), ('ji', 0.085), ('customers', 0.084), ('dish', 0.079), ('road', 0.078), ('gibbs', 0.077), ('mixture', 0.077), ('tables', 0.072), ('clusters', 0.068), ('category', 0.067), ('assignments', 0.067), ('cluster', 0.062), ('shared', 0.061), ('reused', 0.06), ('zji', 0.06), ('car', 0.06), ('stick', 0.06), ('groups', 0.06), ('group', 0.058), ('visual', 0.058), ('jk', 0.056), ('kj', 0.056), ('sampler', 0.053), ('constellation', 0.053), ('wji', 0.052), ('image', 0.052), ('appearance', 0.049), ('features', 0.048), ('street', 0.048), ('spatial', 0.047), ('blog', 0.045), ('dps', 0.045), ('franchise', 0.045), ('hdps', 0.045), ('tdps', 0.045), ('global', 0.043), ('images', 0.043), ('restaurant', 0.042), ('generative', 0.04), ('breaking', 0.04), ('proportions', 0.04), ('sit', 0.039), ('contextual', 0.038), ('cars', 0.038), ('sampled', 0.037), ('zi', 0.036), ('elongated', 0.036), ('buildings', 0.036), ('torralba', 0.036), ('assigned', 0.036), ('nj', 0.035), ('iccv', 0.034), ('chinese', 0.033), ('describing', 0.033), ('table', 0.032), ('parsing', 0.032), ('exchangeable', 0.032), ('segmentations', 0.032), ('segmentation', 0.031), ('cbo', 0.03), ('njt', 0.03), ('reuses', 0.03), ('seasoned', 0.03), ('sudderth', 0.03), ('textural', 0.03), ('unlocalized', 0.03), ('xjnj', 0.03), ('shares', 0.03), ('components', 0.029), ('observations', 0.029), ('dishes', 0.026), ('processes', 0.026), ('process', 0.026), ('analogy', 0.026), ('building', 0.025), ('extension', 0.025), ('detection', 0.025), ('face', 0.024), ('recognize', 0.024), ('descriptors', 0.024), ('bag', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes
Author: Antonio Torralba, Alan S. Willsky, Erik B. Sudderth, William T. Freeman
Abstract: Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object–centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP’s inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images. 1
2 0.16336299 98 nips-2005-Infinite latent feature models and the Indian buffet process
Author: Zoubin Ghahramani, Thomas L. Griffiths
Abstract: We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features. We identify a simple generative process that results in the same distribution over equivalence classes, which we call the Indian buffet process. We illustrate the use of this distribution as a prior in an infinite latent feature model, deriving a Markov chain Monte Carlo algorithm for inference in this model and applying the algorithm to an image dataset. 1
3 0.13708827 48 nips-2005-Context as Filtering
Author: Daichi Mochihashi, Yuji Matsumoto
Abstract: Long-distance language modeling is important not only in speech recognition and machine translation, but also in high-dimensional discrete sequence modeling in general. However, the problem of context length has almost been neglected so far and a naïve bag-of-words history has been employed in natural language processing. In contrast, in this paper we view topic shifts within a text as a latent stochastic process to give an explicit probabilistic generative model that has partial exchangeability. We propose an online inference algorithm using particle filters to recognize topic shifts to employ the most appropriate length of context automatically. Experiments on the BNC corpus showed consistent improvement over previous methods involving no chronological order. 1
4 0.13104293 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories
Author: Nicolas Loeff, Himanshu Arora, Alexander Sorokin, David Forsyth
Abstract: We describe a novel method for learning templates for recognition and localization of objects drawn from categories. A generative model represents the configuration of multiple object parts with respect to an object coordinate system; these parts in turn generate image features. The complexity of the model in the number of features is low, meaning our model is much more efficient to train than comparative methods. Moreover, a variational approximation is introduced that allows learning to be orders of magnitude faster than previous approaches while incorporating many more features. This results in both accuracy and localization improvements. Our model has been carefully tested on standard datasets; we compare with a number of recent template models. In particular, we demonstrate state-of-the-art results for detection and localization. 1
5 0.12170541 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection
Author: Wei Zhang, Hyejin Yang, Dimitris Samaras, Gregory J. Zelinsky
Abstract: We present a computational model of human eye movements in an object class detection task. The model combines state-of-the-art computer vision object class detection methods (SIFT features trained using AdaBoost) with a biologically plausible model of human eye movement to produce a sequence of simulated fixations, culminating with the acquisition of a target. We validated the model by comparing its behavior to the behavior of human observers performing the identical object class detection task (looking for a teddy bear among visually complex nontarget objects). We found considerable agreement between the model and human data in multiple eye movement measures, including number of fixations, cumulative probability of fixating the target, and scanpath distance.
6 0.096638918 52 nips-2005-Correlated Topic Models
7 0.085480951 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection
8 0.079965882 151 nips-2005-Pattern Recognition from One Example by Chopping
9 0.079900064 21 nips-2005-An Alternative Infinite Mixture Of Gaussian Process Experts
10 0.077767409 131 nips-2005-Multiple Instance Boosting for Object Detection
11 0.077637166 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
12 0.076730572 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?
13 0.076725066 100 nips-2005-Interpolating between types and tokens by estimating power-law generators
14 0.07122273 7 nips-2005-A Cortically-Plausible Inverse Problem Solving Method Applied to Recognizing Static and Kinematic 3D Objects
15 0.065293156 79 nips-2005-Fusion of Similarity Data in Clustering
16 0.062804602 115 nips-2005-Learning Shared Latent Structure for Image Synthesis and Robotic Imitation
17 0.061091505 178 nips-2005-Soft Clustering on Graphs
18 0.059167162 89 nips-2005-Group and Topic Discovery from Relations and Their Attributes
19 0.057660375 110 nips-2005-Learning Depth from Single Monocular Images
20 0.057502788 108 nips-2005-Layered Dynamic Textures
topicId topicWeight
[(0, 0.169), (1, 0.015), (2, -0.02), (3, 0.224), (4, -0.048), (5, -0.089), (6, 0.073), (7, 0.201), (8, -0.102), (9, -0.156), (10, -0.049), (11, -0.016), (12, 0.082), (13, -0.047), (14, -0.1), (15, 0.107), (16, -0.029), (17, -0.033), (18, -0.013), (19, 0.015), (20, -0.028), (21, 0.091), (22, -0.115), (23, 0.062), (24, -0.002), (25, 0.009), (26, 0.025), (27, -0.084), (28, -0.024), (29, -0.059), (30, -0.051), (31, -0.08), (32, -0.004), (33, -0.044), (34, -0.052), (35, 0.015), (36, 0.012), (37, 0.03), (38, 0.022), (39, 0.068), (40, -0.03), (41, -0.031), (42, 0.066), (43, 0.089), (44, -0.028), (45, -0.071), (46, -0.015), (47, 0.065), (48, -0.046), (49, -0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.95445013 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes
Author: Antonio Torralba, Alan S. Willsky, Erik B. Sudderth, William T. Freeman
Abstract: Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object–centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP’s inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images. 1
2 0.76514292 98 nips-2005-Infinite latent feature models and the Indian buffet process
Author: Zoubin Ghahramani, Thomas L. Griffiths
Abstract: We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features. We identify a simple generative process that results in the same distribution over equivalence classes, which we call the Indian buffet process. We illustrate the use of this distribution as a prior in an infinite latent feature model, deriving a Markov chain Monte Carlo algorithm for inference in this model and applying the algorithm to an image dataset. 1
3 0.68427777 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories
Author: Nicolas Loeff, Himanshu Arora, Alexander Sorokin, David Forsyth
Abstract: We describe a novel method for learning templates for recognition and localization of objects drawn from categories. A generative model represents the configuration of multiple object parts with respect to an object coordinate system; these parts in turn generate image features. The complexity of the model in the number of features is low, meaning our model is much more efficient to train than comparative methods. Moreover, a variational approximation is introduced that allows learning to be orders of magnitude faster than previous approaches while incorporating many more features. This results in both accuracy and localization improvements. Our model has been carefully tested on standard datasets; we compare with a number of recent template models. In particular, we demonstrate state-of-the-art results for detection and localization. 1
4 0.61728042 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection
Author: Long Zhu, Alan L. Yuille
Abstract: We describe a hierarchical compositional system for detecting deformable objects in images. Objects are represented by graphical models. The algorithm uses a hierarchical tree where the root of the tree corresponds to the full object and lower-level elements of the tree correspond to simpler features. The algorithm proceeds by passing simple messages up and down the tree. The method works rapidly, in under a second, on 320 × 240 images. We demonstrate the approach on detecting cats, horses, and hands. The method works in the presence of background clutter and occlusions. Our approach is contrasted with more traditional methods such as dynamic programming and belief propagation. 1
5 0.60027629 48 nips-2005-Context as Filtering
Author: Daichi Mochihashi, Yuji Matsumoto
Abstract: Long-distance language modeling is important not only in speech recognition and machine translation, but also in high-dimensional discrete sequence modeling in general. However, the problem of context length has almost been neglected so far and a naïve bag-of-words history has been employed in natural language processing. In contrast, in this paper we view topic shifts within a text as a latent stochastic process to give an explicit probabilistic generative model that has partial exchangeability. We propose an online inference algorithm using particle filters to recognize topic shifts to employ the most appropriate length of context automatically. Experiments on the BNC corpus showed consistent improvement over previous methods involving no chronological order. 1
6 0.59680605 100 nips-2005-Interpolating between types and tokens by estimating power-law generators
7 0.57852209 151 nips-2005-Pattern Recognition from One Example by Chopping
8 0.46734992 79 nips-2005-Fusion of Similarity Data in Clustering
9 0.45353252 52 nips-2005-Correlated Topic Models
10 0.45231697 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection
11 0.40966544 35 nips-2005-Bayesian model learning in human visual perception
12 0.39774665 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
13 0.39045915 131 nips-2005-Multiple Instance Boosting for Object Detection
14 0.37919515 7 nips-2005-A Cortically-Plausible Inverse Problem Solving Method Applied to Recognizing Static and Kinematic 3D Objects
15 0.35720763 115 nips-2005-Learning Shared Latent Structure for Image Synthesis and Robotic Imitation
16 0.34990332 21 nips-2005-An Alternative Infinite Mixture Of Gaussian Process Experts
17 0.34685785 171 nips-2005-Searching for Character Models
18 0.3447738 110 nips-2005-Learning Depth from Single Monocular Images
19 0.32342443 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
20 0.32136008 51 nips-2005-Correcting sample selection bias in maximum entropy density estimation
topicId topicWeight
[(3, 0.045), (10, 0.031), (27, 0.035), (31, 0.057), (34, 0.047), (39, 0.038), (47, 0.024), (55, 0.017), (57, 0.011), (69, 0.054), (73, 0.439), (88, 0.073), (91, 0.028)]
simIndex simValue paperId paperTitle
1 0.92296422 104 nips-2005-Laplacian Score for Feature Selection
Author: Xiaofei He, Deng Cai, Partha Niyogi
Abstract: In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. And, almost all of previous unsupervised feature selection methods are “wrapper” techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a “filter” method for feature selection which is independent of any learning algorithm. Our method can be performed in either supervised or unsupervised fashion. The proposed method is based on the observation that, in many real world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its power of locality preserving, or, Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm. 1
2 0.91050869 71 nips-2005-Fast Krylov Methods for N-Body Learning
Author: Nando D. Freitas, Yang Wang, Maryam Mahdaviani, Dustin Lang
Abstract: This paper addresses the issue of numerical computation in machine learning domains based on similarity metrics, such as kernel methods, spectral techniques and Gaussian processes. It presents a general solution strategy based on Krylov subspace iteration and fast N-body learning methods. The experiments show significant gains in computation and storage on datasets arising in image segmentation, object detection and dimensionality reduction. The paper also presents theoretical bounds on the stability of these methods.
same-paper 3 0.90289271 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes
Author: Antonio Torralba, Alan S. Willsky, Erik B. Sudderth, William T. Freeman
Abstract: Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object–centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP’s inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images. 1
4 0.83437598 198 nips-2005-Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails
Author: Nebojsa Jojic, Vladimir Jojic, Christopher Meek, David Heckerman, Brendan J. Frey
Abstract: We introduce a new model of genetic diversity which summarizes a large input dataset into an epitome, a short sequence or a small set of short sequences of probability distributions capturing many overlapping subsequences from the dataset. The epitome as a representation has already been used in modeling real-valued signals, such as images and audio. The discrete sequence model we introduce in this paper targets applications in genetics, from multiple alignment to recombination and mutation inference. In our experiments, we concentrate on modeling the diversity of HIV where the epitome emerges as a natural model for producing relatively small vaccines covering a large number of immune system targets known as epitopes. Our experiments show that the epitome includes more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains. We also discuss epitome designs that take into account uncertainty about Tcell cross reactivity and epitope presentation. In our experiments, we find that vaccine optimization is fairly robust to these uncertainties. 1
5 0.81837112 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
Author: Tong Zhang, Rie Kubota Ando
Abstract: We consider a framework for semi-supervised learning using spectral decomposition based un-supervised kernel design. This approach subsumes a class of previously proposed semi-supervised learning methods on data graphs. We examine various theoretical properties of such methods. In particular, we derive a generalization performance bound, and obtain the optimal kernel design by minimizing the bound. Based on the theoretical analysis, we are able to demonstrate why spectral kernel design based methods can often improve the predictive performance. Experiments are used to illustrate the main consequences of our analysis.
6 0.63368398 102 nips-2005-Kernelized Infomax Clustering
7 0.57865804 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering
8 0.56018966 189 nips-2005-Tensor Subspace Analysis
9 0.55548197 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
10 0.53665459 84 nips-2005-Generalization in Clustering with Unobserved Features
11 0.52320993 98 nips-2005-Infinite latent feature models and the Indian buffet process
12 0.51601863 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
13 0.50986004 178 nips-2005-Soft Clustering on Graphs
14 0.49742812 77 nips-2005-From Lasso regression to Feature vector machine
15 0.49547225 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories
16 0.49434045 177 nips-2005-Size Regularized Cut for Data Clustering
17 0.47863841 51 nips-2005-Correcting sample selection bias in maximum entropy density estimation
18 0.47525913 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
19 0.47310331 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
20 0.47278279 48 nips-2005-Context as Filtering