nips nips2007 nips2007-196 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michalis K. Titsias
Abstract: We present a probability distribution over non-negative integer valued matrices with possibly an infinite number of columns. We also derive a stochastic process that reproduces this distribution over equivalence classes. This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. Such data arise naturally when learning visual object recognition systems from unlabelled images. Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. Inference with this model is carried out using a Markov chain Monte Carlo algorithm. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a probability distribution over non-negative integer valued matrices with possibly an infinite number of columns. [sent-5, score-0.218]
2 We also derive a stochastic process that reproduces this distribution over equivalence classes. [sent-6, score-0.125]
3 This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. [sent-7, score-0.456]
4 Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. [sent-9, score-0.232]
5 An image can contain views of multiple object classes such as cars and humans and each class may have multiple occurrences in the image. [sent-20, score-0.324]
6 To deal with features having multiple occurrences, we introduce a probability distribution over sparse non-negative integer valued matrices with possibly an unbounded number of columns. [sent-21, score-0.29]
7 Each matrix row corresponds to a data point and each column to a feature similarly to the binary matrix used in the Indian buffet process [7]. [sent-22, score-0.191]
8 Each element of the matrix can be zero or a positive integer and expresses the number of times a feature occurs in a specific data point. [sent-23, score-0.164]
9 This model is derived by considering a finite gamma-Poisson distribution and taking the infinite limit for equivalence classes of non-negative integer valued matrices. [sent-24, score-0.249]
10 This process uses Ewens's distribution [5] over integer partitions, which was introduced in the population genetics literature and is equivalent to the distribution over partitions of objects induced by the Dirichlet process [1]. [sent-26, score-0.305]
11 The infinite gamma-Poisson model can play the role of the prior in a nonparametric Bayesian learning scenario where both the latent features and the number of their occurrences are unknown. [sent-27, score-0.275]
12 Given this prior, we consider a likelihood model which is suitable for explaining the visual appearance and location of local image patches. [sent-28, score-0.18]
13 We assume that each data point is associated with a set of latent features and each feature can have multiple occurrences. [sent-35, score-0.162]
14 Let znk denote the number of times feature k occurs in the data point Xn . [sent-36, score-0.915]
15 Given K features, Z = {znk} is an N × K non-negative integer valued matrix that collects together all the znk values, so that each row corresponds to a data point and each column to a feature. [sent-37, score-1.072]
16 Given that znk is drawn from a Poisson with a feature-specific parameter λk, Z follows the distribution P(Z|{λk}) = ∏_{n=1}^N ∏_{k=1}^K λk^{znk} exp{−λk} / znk!  (1). [sent-38, score-1.727]
17 Each λk is given a gamma prior p(λk) = λk^{α/K − 1} exp{−λk} / Γ(α/K)  (2). The hyperparameter α itself is given a vague gamma prior G(α; α0, β0). [sent-42, score-0.114]
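As an illustration of the finite model just described, the following sketch samples Z from Equations (1) and (2). It is our own illustrative code, not the authors' implementation; the function name and the use of numpy are assumptions, and λk is given a gamma prior with shape α/K and unit rate, as implied by the marginal in Equation (3).

    import numpy as np

    def sample_finite_gamma_poisson(N, K, alpha, rng=None):
        # Illustrative sampler for the finite gamma-Poisson model (Eqs. 1-2).
        # Each feature rate lambda_k ~ Gamma(alpha/K, 1); each count
        # z_nk ~ Poisson(lambda_k).  A sketch, not the paper's code.
        rng = np.random.default_rng() if rng is None else rng
        lam = rng.gamma(shape=alpha / K, scale=1.0, size=K)  # feature rates
        Z = rng.poisson(lam, size=(N, K))                    # N x K count matrix
        return Z, lam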
18 Using the above equations we can easily integrate out the parameters {λk} as follows: P(Z|α) = ∏_{k=1}^K Γ(mk + α/K) / [Γ(α/K) (N + 1)^{mk + α/K} ∏_{n=1}^N znk!]  (3), where mk = ∑_{n=1}^N znk. [sent-43, score-0.853]
19 Using Equation (4), the expectation of the sum of the znk values is αN and is independent of the number of features. [sent-50, score-0.853]
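For reference, the marginal in Equation (3) can be evaluated in log space with standard gamma-function routines. The sketch below is our reading of the reconstructed formula, not code from the paper.

    import numpy as np
    from scipy.special import gammaln

    def log_prob_Z_given_alpha(Z, alpha):
        # log of Equation (3): the lambda_k integrated out of the finite model
        N, K = Z.shape
        m = Z.sum(axis=0)                      # m_k: total count of feature k
        a = alpha / K
        log_p = (gammaln(m + a) - gammaln(a)
                 - (m + a) * np.log(N + 1.0)
                 - gammaln(Z + 1.0).sum(axis=0))   # minus sum_n log(z_nk!)
        return log_p.sum()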
20 3 The infinite limit and the stochastic process To express the probability distribution in (3) for infinitely many features K we need to consider equivalence classes of Z matrices, similarly to [7]. [sent-71, score-0.222]
21 We define the left-ordered form of non-negative integer valued matrices as follows. [sent-74, score-0.197]
22 We assume that any possible znk satisfies znk ≤ c − 1, where c is a sufficiently large integer. [sent-75, score-1.706]
23 (z1k, . . . , zNk) as the integer associated with column k, expressed in a numeral system with base c. [sent-79, score-0.12]
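In code, ordering columns by this base-c integer (largest first) is the same as ordering them lexicographically from the first row downwards, since every entry is at most c − 1. The helper below is a small sketch of ours built on that observation.

    import numpy as np

    def left_order(Z):
        # Sort the columns of Z into left-ordered form: decreasing base-c
        # history, i.e. decreasing lexicographic order read from row 1 down.
        cols = sorted(range(Z.shape[1]), key=lambda k: tuple(Z[:, k]), reverse=True)
        return Z[:, cols]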
24 mk > 0 for k ≤ K+, while the remaining K − K+ features are unrepresented, i.e. mk = 0. [sent-92, score-0.224]
25 The infinite limit of (6) is derived by following a strategy similar to the one used for expressing the distribution over partitions of objects as a limit of the Dirichlet-multinomial pair [6, 9]. [sent-95, score-0.145]
26 This expression defines an exchangeable joint distribution over non-negative integer valued matrices with infinitely many columns in a left-ordered form. [sent-100, score-0.296]
27 3.1 The stochastic process The distribution in Equation (7) can be derived from a simple stochastic process that constructs the matrix Z sequentially, so that the data points arrive one at a time in a fixed order. [sent-103, score-0.115]
28 We sample feature occurrences from the set of unrepresented features as follows. [sent-106, score-0.312]
29 Firstly, we draw an integer g1 from the negative binomial NB(g1; α, 1/2), which has a mean value equal to α. [sent-107, score-0.14]
30 g1 is the total number of feature occurrences for the first data point. [sent-108, score-0.206]
31 We then choose the partition z11 + · · · + z1K1 = g1, with 1 ≤ K1 ≤ g1, by drawing from Ewens's distribution [5] over integer partitions, which is given by P(z11, . . . , z1K1) = g1! α^{K1} / [α(α + 1) · · · (α + g1 − 1) ∏_{i=1}^{g1} i^{v_i^{(1)}} v_i^{(1)}!]  (8), [sent-117, score-0.172]
32 where v_i^{(1)} is the multiplicity of integer i in the partition (z11, . . . , z1K1). [sent-125, score-0.174]
33 Ewens's distribution is equivalent to the distribution over partitions of objects induced by the Dirichlet process and the Chinese restaurant process, since we can derive one from the other using simple combinatorial arguments. [sent-129, score-0.154]
34 The difference between them is that the former is a distribution over integer partitions while the latter is a distribution over partitions of objects. [sent-130, score-0.242]
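Because of this equivalence, a partition of an integer g can be drawn from Ewens's distribution with parameter α by running a Chinese-restaurant-style assignment of g items and keeping the block sizes. The sketch below is our own helper (ewens_partition is a hypothetical name) and does exactly that.

    import numpy as np

    def ewens_partition(g, alpha, rng=None):
        # Draw a partition of the integer g from Ewens's distribution via the
        # Chinese restaurant construction: item i joins an existing part of
        # size s with probability proportional to s, or opens a new part with
        # probability proportional to alpha.  Returns the part sizes.
        rng = np.random.default_rng() if rng is None else rng
        parts = []
        for _ in range(g):
            probs = np.array(parts + [alpha], dtype=float)
            probs /= probs.sum()
            j = rng.choice(len(probs), p=probs)
            if j == len(parts):
                parts.append(1)          # open a new part
            else:
                parts[j] += 1            # grow an existing part
        return sorted(parts, reverse=True)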
35 For each feature k, with k ≤ Kn−1, we choose znk based on the popularity of this feature in the previous n − 1 data points. (Footnote 1: A partition of a positive integer is a way of writing this integer as a sum of positive integers where order does not matter.) [sent-132, score-1.237]
36 This popularity is expressed by the total number of occurrences of feature k in the previous data points, given by mk = ∑_{i=1}^{n−1} zik. [sent-136, score-0.34]
37 In particular, we draw znk from NB(znk; mk, n/(n+1)), which has a mean value equal to mk/n. [sent-137, score-1.111]
38 Similarly to the first data point, we first draw an integer gn from NB(gn; α, n/(n+1)), and subsequently we select a partition of that integer by drawing from Ewens's formula. [sent-139, score-0.305]
39 However, if we consider only the left-ordered class of matrices generated by the stochastic process then we obtain the exchangeable distribution in Equation (7). [sent-149, score-0.139]
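Putting the pieces together, the sequential construction of Z reads roughly as below. This is a sketch under our reading of the text, not the authors' code; it re-uses the hypothetical ewens_partition helper above, and numpy's NB(r, p), whose mean is r(1 − p)/p, matches the means quoted in the text (α for the first point, mk/n and α/n afterwards).

    import numpy as np

    def sample_Z_sequential(N, alpha, rng=None):
        # Sketch of the sequential stochastic process described above.
        rng = np.random.default_rng() if rng is None else rng
        rows, K = [], 0                           # K = represented features so far
        for n in range(1, N + 1):
            row = [0] * K
            p = n / (n + 1.0)
            for k in range(K):                    # already-represented features
                m_k = sum(r[k] for r in rows)     # popularity of feature k
                row[k] = int(rng.negative_binomial(m_k, p))  # mean m_k / n
            g_n = int(rng.negative_binomial(alpha, p))       # mean alpha / n
            if g_n > 0:                           # occurrences of new features
                new_parts = ewens_partition(g_n, alpha, rng)
                row.extend(new_parts)
                for r in rows:
                    r.extend([0] * len(new_parts))
                K += len(new_parts)
            rows.append(row)
        return np.array(rows, dtype=int)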
40 3.2 Conditional distributions When we combine the prior P(Z|α) with a likelihood model p(X|Z) and we wish to do inference over Z using Gibbs-type sampling, we need to express conditionals of the form P(znk|Z−(nk), α), where Z−(nk) = Z \ znk. [sent-152, score-0.947]
41 When m−n,k > 0, the conditional of znk is given by NB(znk; m−n,k, N/(N+1)). [sent-158, score-0.876]
42 This conditional draws an integer gn from NB(gn; α, N/(N+1)) and then determines the occurrences for the new features by choosing a partition of the integer gn using Ewens's distribution. [sent-160, score-0.501]
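The prior part of these conditionals can be written as a small helper. The sketch below only illustrates the prior piece (names are ours); in the full sampler it has to be combined with the likelihood, as discussed in Section 4.1.

    import numpy as np

    def sample_znk_prior_conditional(Z, n, k, alpha, rng=None):
        # Draw z_nk from the prior conditional P(z_nk | Z_-(nk), alpha):
        # NB(m_-n,k, N/(N+1)) when feature k occurs in other data points.
        rng = np.random.default_rng() if rng is None else rng
        N = Z.shape[0]
        p = N / (N + 1.0)
        m_rest = Z[:, k].sum() - Z[n, k]       # m_-n,k
        if m_rest > 0:
            return int(rng.negative_binomial(m_rest, p))
        # Otherwise feature k behaves like a new feature for point n and is
        # handled jointly via g_n ~ NB(alpha, N/(N+1)) plus an Ewens partition,
        # as described in the text, rather than column by column.
        return 0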
43 4 A likelihood model for images An image can contain multiple objects of different classes. [sent-163, score-0.141]
44 Unsupervised learning should deal with the unknown number of object classes in the images and also the unknown number of occurrences of each class in each image separately. [sent-167, score-0.281]
45 If object classes are the latent features, what we wish to infer is the underlying feature occurrence matrix Z. [sent-168, score-0.22]
46 Each image n is represented by dn local patches that are detected in the image, so that Xn = (Yn, Wn) = {(yni, wni), i = 1, . . . , dn}. [sent-171, score-0.345]
47 yni is the two-dimensional location of patch i and wni is an indicator vector (i.e. it is binary and satisfies ∑_{ℓ=1}^L wni^ℓ = 1) that points into a set of L possible visual appearances. [sent-175, score-0.305] [sent-177, score-0.117]
49 We will describe the probabilistic model starting from the joint distribution of all variables, which is given by p(α) P(Z|α) p({θk}|Z) × ∏_{n=1}^N [ p(πn|Zn) p(mn, Σn|Zn) ∏_{i=1}^{dn} P(sni|πn) P(wni|sni, {θk}) p(yni|sni, mn, Σn) ]  (11). [sent-179, score-0.192]
50 (Footnote 2: Features of this kind are the unrepresented features (k > K+) as well as all the unique features that occur only in data point n.) [sent-180, score-0.157]
51 Figure 1: Graphical model for the joint distribution in Equation (11). [sent-183, score-0.425]
52 Firstly, we generate α from its prior and then we draw the feature occurrence matrix Z using the infinite gamma-Poisson prior P (Z|α). [sent-186, score-0.186]
53 Each θk = (θk1, . . . , θkL) describes the appearance of the local patches W for the feature (object class) k. [sent-191, score-0.166]
54 Note that the feature appearance parameters {θk } depend on Z only through the number of represented features K+ which is obtained by counting the non-zero columns of Z. [sent-193, score-0.169]
55 To see how this mixture model arises, notice that a local patch in image n belongs to a certain occurrence of a feature. [sent-195, score-0.239]
56 We use the double index kj to denote the jth occurrence of feature k, where j = 1, . . . , znk. [sent-196, score-0.134]
57 This mixture model has Mn = ∑_{k=1}^{K+} znk components, i.e. as many as the total number of feature occurrences in image n. [sent-200, score-0.891] [sent-202, score-0.271]
59 The assignment variable sni = {sni^{kj}}, which takes Mn values, indicates the feature occurrence of patch i. [sent-203, score-0.347]
60 The parameters (mn , Σn ) determine the image-specific distribution for the locations {yni } of the local patches in image n. [sent-206, score-0.188]
61 We assume that each occurrence of a feature forms a Gaussian cluster of patch locations. [sent-207, score-0.179]
62 Thus yni follows an image-specific Gaussian mixture with Mn components. [sent-208, score-0.162]
63 mn and Σn collect all the means and covariances of the clusters in the image n. [sent-211, score-0.171]
64 Given that any object can be anywhere in the image and have arbitrary scale and orientation, (mnkj , Σnkj ) should be drawn from a quite vague prior. [sent-212, score-0.138]
65 We use a conjugate normal-Wishart prior for the pair (mnkj, Σnkj), so that p(mn, Σn|Zn) = ∏_{k: znk>0} ∏_{j=1}^{znk} N(mnkj|µ, τΣnkj) W(Σnkj^{−1}|v, V)  (12), where (µ, τ, v, V) are the hyperparameters shared by all features and images. [sent-213, score-1.065]
66 The assignment sni, which determines the allocation of a local patch to a certain feature occurrence, follows a multinomial: P(sni|πn) = ∏_{k: znk>0} ∏_{j=1}^{znk} (πnkj)^{sni^{kj}}. [sent-214, score-1.189]
67 Similarly, the observed data pair (wni, yni) of a local image patch is generated according to P(wni|sni, {θk}) = ∏_{k=1}^{K+} ∏_{ℓ=1}^L θkℓ^{wni^ℓ ∑_{j=1}^{znk} sni^{kj}} and p(yni|sni, mn, Σn) = ∏_{k: znk>0} ∏_{j=1}^{znk} [N(yni|mnkj, Σnkj)]^{sni^{kj}}. [sent-215, score-1.37]
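To make the per-image part of Equation (11) concrete, the sketch below generates the patches of one image given its row of counts zn and the appearance multinomials θ. It is an assumption-laden illustration: the uniform Dirichlet placeholder for πn, the argument names and the scipy Wishart call are ours, since this excerpt does not give the exact parameterization of p(πn|Zn).

    import numpy as np
    from scipy.stats import wishart

    def sample_image(zn, theta, d_n, mu0, tau, v, V, rng=None):
        # One Gaussian cluster per feature occurrence (normal-Wishart prior),
        # mixing weights pi_n, then an assignment, a visual word and a 2-D
        # location for each of the d_n patches.  Illustrative sketch only.
        rng = np.random.default_rng() if rng is None else rng
        occ = [(k, j) for k, z in enumerate(zn) for j in range(z)]
        M_n = len(occ)                           # number of mixture components
        means, covs = [], []
        for _ in range(M_n):
            prec = wishart(df=v, scale=V).rvs()  # Sigma^{-1} ~ W(v, V)
            cov = np.linalg.inv(prec)
            covs.append(cov)
            means.append(rng.multivariate_normal(mu0, tau * cov))
        pi_n = rng.dirichlet(np.ones(M_n))       # placeholder prior for pi_n
        patches = []
        for _ in range(d_n):
            s = rng.choice(M_n, p=pi_n)          # feature occurrence of this patch
            k, _ = occ[s]
            w = rng.choice(len(theta[k]), p=theta[k])  # visual word from theta_k
            y = rng.multivariate_normal(means[s], covs[s])
            patches.append((y, w))
        return patches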
68 4.1 MCMC inference Inference with our model involves expressing the posterior P(S, Z, α|X) over the feature occurrences Z, the assignments S and the parameter α. [sent-220, score-0.295]
69 Note that the joint P(S, Z, α, X) factorizes according to p(α) P(Z|α) P(W|S, Z) ∏_{n=1}^N P(Sn|Zn) p(Yn|Sn, Zn), where Sn denotes the assignments associated with image n. [sent-221, score-0.134]
70 A single step can change an element of Z by one, so that |znk^new − znk^old| ≤ 1. [sent-225, score-0.853]
71 Initially Z is such that Mn = ∑_{k=1}^{K+} znk ≥ 1 for every n, which means that at least one mixture component explains the data of each image. [sent-226, score-0.891]
72 The proposal distribution for changing znk s ensures that this constraint is satisfied. [sent-227, score-0.874]
73 Suppose we wish to sample a new value for znk using the joint model p(S, Z, α, X). [sent-228, score-0.893]
74 Simply writing P(znk|S, Z−(nk), α, X) is not useful since, when znk changes, the number of states the assignments Sn can take also changes. [sent-229, score-0.903]
75 This is clear since znk is a structural variable that affects the number of components Mn = ∑_{k=1}^{K+} znk of the mixture model associated with image n and the assignments Sn. [sent-230, score-1.859]
76 On the other hand the dimensionality of the assignments S−n = S \ Sn of all other images is not affected when znk changes. [sent-231, score-0.922]
77 Sampling from P (Sn |S−n , Z, W, Yn ) is carried out by applying local Gibbs sampling moves and global Metropolis moves that allow two occurrences of different features to exchange their data clusters. [sent-235, score-0.269]
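Structurally, the znk update can be organized as a constrained ±1 Metropolis-Hastings move. The sketch below is only a skeleton of ours, not the authors' algorithm: log_joint is a placeholder callable standing in for the log of p(S, Z, α, X) with the assignments Sn handled as the text describes.

    import numpy as np

    def mh_update_znk(Z, n, k, log_joint, rng=None):
        # One Metropolis-Hastings step on z_nk: propose z_nk +/- 1 while keeping
        # z_nk >= 0 and M_n >= 1 (at least one component per image), then accept
        # with the usual ratio.  `log_joint(Z)` is a placeholder callable.
        rng = np.random.default_rng() if rng is None else rng
        z_old, M_n = Z[n, k], Z[n, :].sum()
        moves = [d for d in (-1, +1) if z_old + d >= 0 and M_n + d >= 1]
        if not moves:
            return Z
        d = moves[rng.integers(len(moves))]
        Z_new = Z.copy()
        Z_new[n, k] = z_old + d
        back = [b for b in (-1, +1)
                if Z_new[n, k] + b >= 0 and Z_new[n, :].sum() + b >= 1]
        log_ratio = (log_joint(Z_new) - log_joint(Z)
                     + np.log(len(moves)) - np.log(len(back)))
        if np.log(rng.random()) < log_ratio:
            return Z_new
        return Z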
78 The discrete patch appearances correspond to pixels and can take 20 possible grayscale values. [sent-240, score-0.163]
79 To generate an image we first decide to include each feature with probability 0. [sent-242, score-0.127]
80 Then for each included feature we randomly select the number of occurrences from the range [1, 3]. [sent-244, score-0.206]
81 For each feature occurrence we select the pixels using the appearance multinomial and place the respective feature shape in a random location so that feature occurrences do not occlude each other. [sent-245, score-0.486]
82 The first row of Figure 2 shows a training image (left), the locations of pixels (middle) and the discrete appearances (right). [sent-246, score-0.237]
83 The third row of Figure 2 shows how K+ (left) and the sum of all znk s (right) evolve through the first 500 MCMC iterations. [sent-251, score-0.912]
84 The algorithm in the first 20 iterations has ... Figure 2: The first row shows a training image (left), the locations Yn of the pixels (middle) and the discrete appearances Wn (right). [sent-252, score-0.409]
85 The second row shows the localizations of all feature occurrences in three images. [sent-253, score-0.29]
86 The third row shows how K+ (left) and the sum of all znk s (right) evolve through the first 500 MCMC iterations. [sent-255, score-0.912]
87 Figure 3: The leftmost plot on the first row shows the locations of detected patches and the bounding boxes in one of the annotated images. [sent-256, score-0.193]
88 The remaining five plots show examples of detections and localizations of the three most dominant features (including the car-category) in five non-annotated images. [sent-257, score-0.117]
89 For the state (Z, S) that is most frequently visited, the second row of Figure 2 shows the localizations of all different feature occurrences in three images. [sent-260, score-0.29]
90 Note that ellipses with the same color correspond to the different occurrences of the same feature. [sent-262, score-0.168]
91 We used the patch detection method presented in [8] and we constructed a dictionary of 200 visual appearances by clustering the SIFT [8] descriptors of the patches using K-means. [sent-264, score-0.212]
92 Locations of detected patches are shown in the first row (left) of Figure 3. [sent-265, score-0.113]
93 These znk values, plus the assignments of all patches inside the boxes, do not change during sampling. [sent-269, score-0.986]
94 Also the patches that lie outside the boxes in all annotated images are not allowed to be part of car occurrences. [sent-270, score-0.122]
95 To keep the plots uncluttered, Figure 3 shows the detections and localizations of only the three most dominant features (including the car-category) in five non-annotated images. [sent-273, score-0.117]
96 The red ellipses correspond to different occurrences of the car-feature, the green ones to a tree-feature and the blue ones to a street-feature. [sent-274, score-0.168]
97 6 Discussion We presented the infinite gamma-Poisson model, which is a nonparametric prior for non-negative integer valued matrices with an infinite number of columns. [sent-275, score-0.249]
98 We discussed the use of this prior for unsupervised learning where multiple features are associated with our data and each feature can have multiple occurrences within each data point. [sent-276, score-0.345]
99 For example, an interesting application can be Bayesian matrix factorization where a matrix of observations is decomposed into a product of two or more matrices with one of them being a non-negative integer valued matrix. [sent-278, score-0.197]
100 Infinite latent feature models and the Indian buffet process. [sent-314, score-0.138]
wordName wordTfidf (topN-words)
[('znk', 0.853), ('occurrences', 0.144), ('nkj', 0.138), ('sni', 0.138), ('yni', 0.124), ('mk', 0.118), ('mn', 0.106), ('integer', 0.102), ('sn', 0.097), ('ewens', 0.096), ('mnkj', 0.096), ('wni', 0.096), ('appearances', 0.077), ('zn', 0.073), ('dirichlet', 0.066), ('image', 0.065), ('feature', 0.062), ('patch', 0.061), ('valued', 0.059), ('occurrence', 0.056), ('unrepresented', 0.055), ('gn', 0.055), ('patches', 0.053), ('features', 0.051), ('assignments', 0.05), ('nk', 0.049), ('partitions', 0.049), ('buffet', 0.048), ('localizations', 0.044), ('skj', 0.041), ('kh', 0.041), ('row', 0.04), ('factorial', 0.039), ('mixture', 0.038), ('indian', 0.037), ('nth', 0.037), ('object', 0.037), ('yn', 0.037), ('mcmc', 0.036), ('matrices', 0.036), ('vague', 0.036), ('exchangeable', 0.035), ('appearance', 0.032), ('conditionals', 0.031), ('ni', 0.03), ('hyperparameter', 0.03), ('equivalence', 0.03), ('locations', 0.03), ('boxes', 0.03), ('nonparametric', 0.029), ('nite', 0.029), ('latent', 0.028), ('reproduces', 0.027), ('dn', 0.027), ('gamma', 0.025), ('pixels', 0.025), ('columns', 0.024), ('vi', 0.024), ('location', 0.024), ('partition', 0.024), ('ellipses', 0.024), ('multiplicity', 0.024), ('stochastic', 0.024), ('prior', 0.023), ('process', 0.023), ('posterior', 0.023), ('unsupervised', 0.023), ('conditional', 0.023), ('detections', 0.022), ('draw', 0.022), ('multiple', 0.021), ('distribution', 0.021), ('bayesian', 0.021), ('visual', 0.021), ('sampling', 0.021), ('limit', 0.021), ('wish', 0.021), ('equation', 0.021), ('detected', 0.02), ('cars', 0.02), ('annotated', 0.02), ('evolve', 0.019), ('metropolis', 0.019), ('multinomial', 0.019), ('images', 0.019), ('joint', 0.019), ('likelihood', 0.019), ('local', 0.019), ('column', 0.018), ('moves', 0.017), ('objects', 0.017), ('visited', 0.017), ('binomial', 0.016), ('kj', 0.016), ('popularity', 0.016), ('integers', 0.016), ('classes', 0.016), ('xn', 0.016), ('expressing', 0.016), ('marginalize', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 196 nips-2007-The Infinite Gamma-Poisson Feature Model
Author: Michalis K. Titsias
Abstract: We present a probability distribution over non-negative integer valued matrices with possibly an infinite number of columns. We also derive a stochastic process that reproduces this distribution over equivalence classes. This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. Such data arise naturally when learning visual object recognition systems from unlabelled images. Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. Inference with this model is carried out using a Markov chain Monte Carlo algorithm. 1
2 0.073640779 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
3 0.069255322 183 nips-2007-Spatial Latent Dirichlet Allocation
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA. 1
4 0.053669997 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
Author: Bill Triggs, Jakob J. Verbeek
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets. 1
5 0.050395958 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
Author: Max Welling, Ian Porteous, Evgeniy Bart
Abstract: A general modeling framework is proposed that unifies nonparametric-Bayesian models, topic-models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of ‘hierarchical Dirichlet processes’ (HDPs) where the domain of the variables can be structured (e.g. words in documents or features in images). We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as nested-DP, Pachinko allocation, mixed membership stochastic block models as well as a number of new models are described as ISBNs. Two experiments have been performed to illustrate these ideas. 1
6 0.050221372 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
7 0.049240515 113 nips-2007-Learning Visual Attributes
8 0.043302659 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
9 0.041035861 193 nips-2007-The Distribution Family of Similarity Distances
10 0.039119147 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
11 0.038749523 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
12 0.037948079 145 nips-2007-On Sparsity and Overcompleteness in Image Models
13 0.037491582 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
14 0.03460484 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
15 0.034131918 170 nips-2007-Robust Regression with Twinned Gaussian Processes
16 0.034111008 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
17 0.033930149 213 nips-2007-Variational Inference for Diffusion Processes
18 0.033730242 125 nips-2007-Markov Chain Monte Carlo with People
19 0.033594064 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging
20 0.033224795 159 nips-2007-Progressive mixture rules are deviation suboptimal
topicId topicWeight
[(0, -0.11), (1, 0.045), (2, -0.029), (3, -0.079), (4, 0.007), (5, 0.025), (6, -0.04), (7, 0.027), (8, -0.006), (9, 0.0), (10, -0.004), (11, 0.003), (12, 0.016), (13, 0.007), (14, -0.036), (15, 0.055), (16, 0.022), (17, 0.022), (18, -0.004), (19, -0.051), (20, -0.02), (21, 0.053), (22, 0.066), (23, 0.001), (24, -0.048), (25, 0.016), (26, -0.004), (27, -0.052), (28, 0.026), (29, -0.046), (30, -0.042), (31, 0.155), (32, 0.052), (33, 0.029), (34, -0.023), (35, 0.11), (36, 0.012), (37, 0.025), (38, 0.001), (39, -0.041), (40, -0.12), (41, 0.037), (42, -0.019), (43, 0.126), (44, -0.093), (45, 0.054), (46, -0.002), (47, 0.05), (48, 0.026), (49, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.90811008 196 nips-2007-The Infinite Gamma-Poisson Feature Model
Author: Michalis K. Titsias
Abstract: We present a probability distribution over non-negative integer valued matrices with possibly an infinite number of columns. We also derive a stochastic process that reproduces this distribution over equivalence classes. This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. Such data arise naturally when learning visual object recognition systems from unlabelled images. Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. Inference with this model is carried out using a Markov chain Monte Carlo algorithm. 1
2 0.62232047 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
Author: Bill Triggs, Jakob J. Verbeek
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets. 1
3 0.6005457 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
4 0.55871022 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
Author: Sabri Boutemedjet, Djemel Ziou, Nizar Bouguila
Abstract: Content-based image suggestion (CBIS) targets the recommendation of products based on user preferences on the visual content of images. In this paper, we motivate both feature selection and model order identification as two key issues for a successful CBIS. We propose a generative model in which the visual features and users are clustered into separate classes. We identify the number of both user and image classes with the simultaneous selection of relevant visual features using the message length approach. The goal is to ensure an accurate prediction of ratings for multidimensional non-Gaussian and continuous image descriptors. Experiments on a collected data have demonstrated the merits of our approach.
5 0.55522668 113 nips-2007-Learning Visual Attributes
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
6 0.49409053 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
7 0.492259 193 nips-2007-The Distribution Family of Similarity Distances
8 0.48867932 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging
9 0.45090994 183 nips-2007-Spatial Latent Dirichlet Allocation
10 0.41973099 159 nips-2007-Progressive mixture rules are deviation suboptimal
11 0.41919914 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
12 0.39214602 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
13 0.38939589 77 nips-2007-Efficient Inference for Distributions on Permutations
14 0.38856399 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
15 0.38121855 115 nips-2007-Learning the 2-D Topology of Images
16 0.37663245 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
17 0.37591356 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data
18 0.36976773 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
19 0.36418712 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
20 0.36088744 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
topicId topicWeight
[(5, 0.045), (13, 0.017), (16, 0.021), (18, 0.024), (21, 0.074), (26, 0.012), (27, 0.277), (31, 0.012), (34, 0.03), (35, 0.035), (47, 0.056), (83, 0.147), (85, 0.021), (87, 0.056), (90, 0.062)]
simIndex simValue paperId paperTitle
same-paper 1 0.73911554 196 nips-2007-The Infinite Gamma-Poisson Feature Model
Author: Michalis K. Titsias
Abstract: We present a probability distribution over non-negative integer valued matrices with possibly an infinite number of columns. We also derive a stochastic process that reproduces this distribution over equivalence classes. This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. Such data arise naturally when learning visual object recognition systems from unlabelled images. Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. Inference with this model is carried out using a Markov chain Monte Carlo algorithm. 1
2 0.73500872 100 nips-2007-Hippocampal Contributions to Control: The Third Way
Author: Máté Lengyel, Peter Dayan
Abstract: Recent experimental studies have focused on the specialization of different neural structures for different types of instrumental behavior. Recent theoretical work has provided normative accounts for why there should be more than one control system, and how the output of different controllers can be integrated. Two particlar controllers have been identified, one associated with a forward model and the prefrontal cortex and a second associated with computationally simpler, habitual, actor-critic methods and part of the striatum. We argue here for the normative appropriateness of an additional, but so far marginalized control system, associated with episodic memory, and involving the hippocampus and medial temporal cortices. We analyze in depth a class of simple environments to show that episodic control should be useful in a range of cases characterized by complexity and inferential noise, and most particularly at the very early stages of learning, long before habitization has set in. We interpret data on the transfer of control from the hippocampus to the striatum in the light of this hypothesis. 1
3 0.6726594 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search
Author: Zhaohui Zheng, Hongyuan Zha, Tong Zhang, Olivier Chapelle, Keke Chen, Gordon Sun
Abstract: We present a general boosting method extending functional gradient boosting to optimize complex loss functions that are encountered in many machine learning problems. Our approach is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. More importantly, this general framework enables us to use a standard regression base learner such as single regression tree for fitting any loss function. We illustrate an application of the proposed method in learning ranking functions for Web search by combining both preference data and labeled data for training. We present experimental results for Web search using data from a commercial search engine that show significant improvements of our proposed methods over some existing methods. 1
4 0.56970608 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
Author: Kai Yu, Wei Chu
Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1
5 0.56913656 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of the processors only sees a portion of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates—it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors—it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors; and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
6 0.56621337 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
7 0.56508404 189 nips-2007-Supervised Topic Models
8 0.56260544 175 nips-2007-Semi-Supervised Multitask Learning
9 0.5611611 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
10 0.5611583 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
11 0.56037313 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
12 0.55965233 156 nips-2007-Predictive Matrix-Variate t Models
13 0.5595898 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
14 0.55889601 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection
15 0.55849832 63 nips-2007-Convex Relaxations of Latent Variable Training
16 0.55738938 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
17 0.55671406 49 nips-2007-Colored Maximum Variance Unfolding
18 0.55657709 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
19 0.55622762 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
20 0.55603456 46 nips-2007-Cluster Stability for Finite Samples