nips nips2007 nips2007-172 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bill Triggs, Jakob J. Verbeek
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets.
Reference: text
sentIndex sentText sentNum sentScore
1 For accurate labeling it is important to capture the global context of the image as well as local information. [sent-2, score-0.278]
2 We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. [sent-3, score-0.357]
3 Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. [sent-4, score-0.113]
4 We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. [sent-5, score-0.238]
5 Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. [sent-7, score-0.264]
6 The resulting segmentations are compared to the state-of-the-art on three different image datasets. [sent-8, score-0.131]
7 1 Introduction. In visual scene interpretation the goal is to assign each image pixel to one of several semantic classes or scene elements, thus jointly performing segmentation and recognition. [sent-9, score-0.489]
8 This is useful in a variety of applications ranging from keyword-based image retrieval (using the segmentation to automatically index images) to autonomous vehicle navigation [1]. [sent-10, score-0.176]
9 Their applications range from low-level noise reduction [2] to high-level object or category recognition (this paper) and semi-automatic object segmentation [3]. [sent-12, score-0.2]
10 CRF models can be applied either at the pixel-level [5, 6, 7] or at the coarser level of super-pixels or patches [8, 9, 10]. [sent-14, score-0.176]
11 In this paper we label images at the level of small patches, using CRF models that incorporate both purely local (single patch) feature functions and more global ‘context capturing’ feature functions that depend on aggregates of observations over the whole image or large regions. [sent-15, score-0.389]
12 In practice it is difficult and time-consuming to label every pixel in an image and most of the available image interpretation datasets contain unlabeled pixels. [sent-17, score-0.461]
13 Working at the patch level exacerbates this problem because many patches contain several different pixel-level labels. [sent-18, score-0.438]
14 Our CRF training algorithm handles this by allowing partial and mixed labelings and optimizing the probability for the model segmentation to be consistent with the given labeling constraints. [sent-19, score-0.338]
15 2 A Conditional Random Field using Local and Global Image Features. We represent images as rectangular grids of patches at a single scale, associating a hidden class label with each patch. [sent-21, score-0.34]
16 Our CRF models incorporate 4-neighbor couplings between patch labels. [sent-22, score-0.348]
17 The local image content of each patch is encoded using texture, color and position descriptors as in [10]. [sent-23, score-0.392]
18 For texture we compute the 128-dimensional SIFT descriptor [11] of the patch and vector quantize it by nearest-neighbour assignment against a ks = 1000 word texton dictionary learned by k-means clustering of all patches in the training dataset. [sent-24, score-0.53]
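The nearest-neighbour assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `quantize` is hypothetical, and the random codebook is only a stand-in for a dictionary that would really come from k-means.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Return, for each descriptor row, the index of its nearest codeword."""
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = (
        (descriptors ** 2).sum(axis=1)[:, None]
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
# Stand-in for the ks = 1000 word texton dictionary (assumption: random here).
codebook = rng.normal(size=(1000, 128))
# Three 128-dimensional descriptors, each a slightly perturbed codeword.
descriptors = codebook[[3, 42, 999]] + 0.01 * rng.normal(size=(3, 128))
words = quantize(descriptors, codebook)
```

Each entry of `words` is the visual-word index that would set the single active bit in the patch's texture vector.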
19 Position is encoded by overlaying the image with an m × m grid of cells (m = 8) and using the index of the cell in which the patch falls as its position feature. [sent-26, score-0.387]
20 Each patch is thus coded by three binary vectors with respectively $k_s$, $k_h$ and $k_p = m^2$ bits, each with a single bit set corresponding to the observed visual word. [sent-27, score-0.389]
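A small sketch of this coding, under stated assumptions: the helper names `cell_index` and `encode_patch` are hypothetical, the row-major cell numbering is a choice made here, and the color dictionary size `kh = 100` in the example is a placeholder because its value does not appear in this extract.

```python
import numpy as np

def cell_index(cx, cy, width, height, m=8):
    """Index (0 .. m*m - 1) of the grid cell containing patch center (cx, cy)."""
    col = min(int(cx * m / width), m - 1)
    row = min(int(cy * m / height), m - 1)
    return row * m + col  # row-major numbering (an assumption)

def encode_patch(texture_word, color_word, position_word, ks, kh, kp):
    """Concatenate three one-hot vectors into one indicator of length ks + kh + kp."""
    y = np.zeros(ks + kh + kp, dtype=np.uint8)
    y[texture_word] = 1
    y[ks + color_word] = 1
    y[ks + kh + position_word] = 1
    return y

# Example for a 320 x 213 image with an 8 x 8 position grid.
pos = cell_index(310, 200, width=320, height=213)
y = encode_patch(texture_word=17, color_word=5, position_word=pos,
                 ks=1000, kh=100, kp=64)
```

The resulting vector has exactly three bits set, one per modality.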
21 Generatively, the three modalities are modelled as being independent given the patch label. [sent-29, score-0.262]
22 The naive Bayes model of the image omits the 4-neighbor couplings and thus assumes that each patch label depends only on its three observation functions. [sent-30, score-0.478]
23 On the MSRC 9-class image dataset this model returns an average classification rate of 67. [sent-32, score-0.124]
24 1% (see Section 4), so isolated appearance alone does not suffice for reliable patch labeling. [sent-33, score-0.284]
25 In recent years models based on histograms of visual words have proven very successful for image categorization (deciding whether or not the image as a whole belongs to a given category of scenes) [13]. [sent-34, score-0.236]
26 Motivated by this, many of our models take the global image context into account by including observation functions based on image-wide histograms of the visual words of their patches. [sent-35, score-0.185]
27 The hope is that this will help to overcome the ambiguities that arise when patches are classified in isolation. [sent-36, score-0.176]
28 To this end, we define a conditional model for patch labels that incorporates both local patch level features and global aggregate features. [sent-37, score-0.833]
29 Let $x_i \in \{1, \ldots, C\}$ denote the label of patch $i$, $y_i$ denote the $W$-dimensional concatenated binary indicator vector of its three visual words ($W = k_s + k_h + k_p$), and $h$ denote the normalized histogram of all visual words in the image, i. [sent-41, score-0.498]
30 The conditional probability of the label $x_i$ is then modeled as $p(x_i = l \mid y_i, h) \propto \exp\left(-\sum_{w=1}^{W} (\alpha_{wl} y_{iw} + \beta_{wl} h_w)\right)$ (1), where $\alpha$ and $\beta$ are $W \times C$ matrices of coefficients to be learned, with entries $\alpha_{wl}$ and $\beta_{wl}$. [sent-44, score-0.168]
31 We can think of this as a multiplicative combination of a local classifier based on the patch-level observation yi and a global context or bias based on the image-wide histogram h. [sent-45, score-0.11]
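To make the combination of local and global terms concrete, the single-patch model of Eq. (1) can be evaluated as below. This is a sketch only: the function name `patch_posterior` and the toy parameter values are assumptions, and real coefficients would be learned.

```python
import numpy as np

def patch_posterior(y, h, alpha, beta):
    """p(x_i = l | y_i, h) for every class l, following Eq. (1)."""
    energy = y @ alpha + h @ beta   # shape (C,): local term plus global term
    logits = -energy
    logits -= logits.max()          # stabilise the exponentials
    p = np.exp(logits)
    return p / p.sum()

W, C = 5, 3
y = np.zeros(W); y[2] = 1.0         # one active visual word
h = np.full(W, 0.2)                 # image-wide normalized histogram
alpha = np.zeros((W, C)); beta = np.zeros((W, C))
p_uniform = patch_posterior(y, h, alpha, beta)
alpha[2, 1] = -5.0                  # a low energy makes class 1 likely for word 2
p_biased = patch_posterior(y, h, alpha, beta)
```

Because the two terms add inside the exponent, their contributions multiply in the probability, which is the multiplicative combination mentioned above.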
32 To account for correlations among spatially neighboring patch labels, we add couplings between the labels of neighboring patches to the single patch model (1). [sent-46, score-0.801]
33 Let X denote the collection of all patch labels in the image and Y denote the collected patch features. [sent-47, score-0.658]
34 Then our CRF model for the coupled patch labels is: $p(X \mid Y) \propto \exp(-E(X \mid Y))$ (2), with $E(X \mid Y) = \sum_i \sum_{w=1}^{W} (\alpha_{w x_i} y_{iw} + \beta_{w x_i} h_w) + \sum_{i \sim j} \phi_{ij}(x_i, x_j)$ (3), where $i \sim j$ denotes the set of all adjacent (4-neighbor) pairs of patches $i, j$. [sent-48, score-0.533]
35 We have explored two forms of pairwise potential: $\phi_{ij}(x_i, x_j) = \gamma_{x_i, x_j} [x_i \neq x_j]$, and $\phi_{ij}(x_i, x_j) = (\sigma + \tau d_{ij}) [x_i \neq x_j]$, where $[\cdot]$ is one if its argument is true and zero otherwise, and $d_{ij}$ is some similarity measure over the appearance of the patches $i$ and $j$. [sent-50, score-0.295]
36 The second potential is designed to favor label transitions at image locations with high contrast. [sent-52, score-0.153]
37 As in [3] we use $d_{ij} = \exp(-\|z_i - z_j\|^2 / (2\lambda))$, with $z_i \in \mathbb{R}^3$ denoting the average RGB value in the patch and $\lambda = \langle \|z_i - z_j\|^2 \rangle$, the average squared L2 norm of the difference between neighboring RGB values in the image. [sent-53, score-0.294]
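The contrast-sensitive coupling can be sketched on a small patch grid. Two assumptions are made here and should be read as such: the bracket in the pairwise term is taken to be an indicator that the two labels differ (a Potts-style penalty), and λ is taken to be the mean squared RGB difference over all 4-neighbor pairs. The function names are hypothetical.

```python
import numpy as np

def contrast_weights(z):
    """z: (H, W, 3) mean RGB per patch. Returns d_ij for horizontal and vertical pairs."""
    sq_h = ((z[:, 1:] - z[:, :-1]) ** 2).sum(axis=-1)   # shape (H, W-1)
    sq_v = ((z[1:, :] - z[:-1, :]) ** 2).sum(axis=-1)   # shape (H-1, W)
    # Assumes the image is not perfectly uniform, otherwise lam would be zero.
    lam = np.concatenate([sq_h.ravel(), sq_v.ravel()]).mean()
    return np.exp(-sq_h / (2 * lam)), np.exp(-sq_v / (2 * lam))

def pairwise_energy(x, d_h, d_v, sigma, tau):
    """Sum of (sigma + tau * d_ij) over neighbor pairs whose labels differ."""
    diff_h = x[:, 1:] != x[:, :-1]
    diff_v = x[1:, :] != x[:-1, :]
    return ((sigma + tau * d_h) * diff_h).sum() + ((sigma + tau * d_v) * diff_v).sum()

# A 2 x 4 patch grid: left half black, right half white.
z = np.zeros((2, 4, 3)); z[:, 2:] = 1.0
d_h, d_v = contrast_weights(z)
aligned = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])      # transition on the color edge
misaligned = np.array([[0, 1, 1, 1], [0, 1, 1, 1]])   # transition inside a flat region
e_aligned = pairwise_energy(aligned, d_h, d_v, sigma=1.0, tau=2.0)
e_misaligned = pairwise_energy(misaligned, d_h, d_v, sigma=1.0, tau=2.0)
```

Under these assumptions the labeling whose transition coincides with the high-contrast edge receives the lower energy, which is the behavior the second potential is designed to encourage.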
38 Figure 1: Graphical representation of the model with a single image-wide aggregate feature function denoted by h. [sent-56, score-0.144]
39 Arrows denote single node potentials due to feature functions, and undirected edges represent pairwise potentials. [sent-58, score-0.105]
40 The dashed lines indicate the aggregation of the single-patch observations yi into h. [sent-59, score-0.29]
41 In practice this is restrictive and it is useful to develop methods that can learn from partially labeled examples – images that include either completely unlabeled patches or ones with a restricted but nontrivial set of possible labels. [sent-67, score-0.51]
42 Formally, we will assume that an incomplete labeling X is known to belong to an associated set of admissible labelings A and we maximise the log-likelihood for the model to predict any labeling in A: $L = \log p(X \in A \mid Y) = \log \sum_{X \in A} p(X \mid Y) = \log \sum_{X \in A} \exp(-E(X \mid Y)) - \log \sum_{X} \exp(-E(X \mid Y))$. [sent-68, score-0.329]
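On a very small graph the partially labeled log-likelihood can be computed exactly by enumeration, which may help in checking an approximate implementation. The sketch below deliberately swaps Loopy Belief Propagation for brute force, uses a plain Potts coupling with a single strength `sigma`, and introduces hypothetical function names; it is not the training procedure of the paper.

```python
import itertools
import numpy as np

def energy(x, unary, edges, sigma):
    """Unary energies plus a penalty sigma for each edge whose labels differ."""
    e = sum(unary[i, xi] for i, xi in enumerate(x))
    e += sum(sigma for i, j in edges if x[i] != x[j])
    return e

def log_sum_exp_neg_energy(admissible, unary, edges, sigma):
    """log of the sum of exp(-E(X)) over all labelings allowed by `admissible`."""
    energies = np.array([energy(x, unary, edges, sigma)
                         for x in itertools.product(*admissible)])
    m = (-energies).max()
    return m + np.log(np.exp(-energies - m).sum())

def partial_log_likelihood(admissible, unary, edges, sigma):
    """L = log sum_{X in A} exp(-E) - log sum_{all X} exp(-E)."""
    n, C = unary.shape
    everything = [range(C)] * n
    return (log_sum_exp_neg_energy(admissible, unary, edges, sigma)
            - log_sum_exp_neg_energy(everything, unary, edges, sigma))

# Three patches in a chain, two classes, flat unary energies.
unary = np.zeros((3, 2))
edges = [(0, 1), (1, 2)]
# Patch 0 is known to be class 0, patch 1 is unlabeled, patch 2 is class 1.
A = [[0], [0, 1], [1]]
L = partial_log_likelihood(A, unary, edges, sigma=0.7)
```

The cost grows as the number of classes to the power of the number of patches, so this is only usable as a reference on toy problems.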
43 Here LBP is run twice (with the singleton marginals initialized from the single node potentials), once to estimate the marginals of p(X|Y ) and once for p(X | Y, X ∈ A). [sent-77, score-0.115]
44 A simple and often-used alternative is to discard unlabeled patches by excising nodes [sent-82, score-0.325]
45 [Table 1: per-class and per-pixel results for the models IND loc only, IND loc+glo, CRFσ loc only, CRFσ loc+glo, CRFσ loc+glo del unlabeled, CRFγ loc only, CRFγ loc+glo, CRFτ loc only, CRFτ loc+glo, and Schroff et al.] [sent-91, score-1.664]
46 For each class its frequency in the ground truth labeling is also given. [sent-203, score-0.103]
47 that correspond to unlabeled or partially labeled patches from the graph. [sent-204, score-0.422]
48 This leaves a random field with one or more completely labeled connected components whose log-likelihood p(X |Y ) we maximize directly using gradient based methods. [sent-205, score-0.135]
49 Equivalently, we can use the complete model but set all of the pair-wise potentials connected to unlabeled nodes to zero: this decouples the labels of the unlabeled nodes from the rest of the field. [sent-206, score-0.387]
50 As a result p(X|Y ) and p(X | Y, X ∈ A) are equivalent for the unlabeled nodes and their contribution to the log-likelihood in Eq. [sent-207, score-0.149]
51 Looking at the training labelings in Figure 3 and Figure 4, we see that pixels near class boundaries often remain unlabeled. [sent-211, score-0.269]
52 Since we leave patches unlabeled if they contain unlabeled pixels, label transitions are underrepresented in the training data, which causes the strength of the pairwise couplings to be greatly overestimated. [sent-212, score-0.609]
53 In contrast, the full CRF model provides realistic estimates because it is forced to include a (fully coupled) label transition somewhere in the unlabeled region. [sent-213, score-0.181]
54 This consists of 240 images of 213 × 320 pixels and their partial pixel-level labelings. [sent-216, score-0.223]
55 The labelings assign pixels to one of nine classes: building, grass, tree, cow, sky, plane, face, car, and bike. [sent-217, score-0.261]
56 Some sample images and labelings are shown in Figure 4. [sent-219, score-0.166]
57 In our experiments we divide the dataset into 120 images for training and 120 for testing, reporting average results over 20 random train-test partitions. [sent-220, score-0.125]
58 We used 20 × 20 pixel patches with centers at 10 pixel intervals. [sent-221, score-0.3]
59 (For the patch size see the red disc in Figure 4). [sent-222, score-0.341]
60 To obtain a labeling of the patches, pixels are assigned to the nearest patch center. [sent-223, score-0.501]
61 Patches are allowed to have any label seen among their pixels, with unlabeled pixels being allowed to have any label. [sent-224, score-0.317]
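The rule for deriving a patch's admissible labels from its pixels can be sketched as follows. The marker value `UNLABELED = -1` and the function name are hypothetical, and the reading that a single unlabeled pixel opens up every class is an interpretation of the sentence above.

```python
import numpy as np

UNLABELED = -1  # hypothetical marker for pixels without a ground-truth label

def admissible_labels(pixel_labels, num_classes):
    """Set of labels a patch may take, given the labels of the pixels assigned to it."""
    labels = np.unique(pixel_labels)
    if (labels == UNLABELED).any():
        return set(range(num_classes))   # an unlabeled pixel permits every class
    return set(labels.tolist())

mixed = admissible_labels(np.array([2, 2, 5, 5, 5]), num_classes=9)
open_patch = admissible_labels(np.array([2, UNLABELED, 2]), num_classes=9)
```

The per-patch sets produced this way jointly define the set A of admissible labelings used in the log-likelihood.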
62 Learning and inference takes place at the patch level. [sent-225, score-0.262]
63 To map the patch-level segmentation back to the pixel level we assign each pixel the marginal of the patch with the nearest center. [sent-226, score-0.49]
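Mapping patch marginals back to pixels by nearest patch center can be written with NumPy indexing. This is a sketch with a hypothetical function name; ties between two equally distant centers are resolved toward the lower index, which is a choice made here rather than something stated in the source.

```python
import numpy as np

def upsample_marginals(patch_marginals, centers_y, centers_x, height, width):
    """patch_marginals: (Py, Px, C). Returns (height, width, C) pixel marginals."""
    # Nearest patch row for every pixel row, and nearest patch column likewise.
    rows = np.abs(np.arange(height)[:, None] - centers_y[None, :]).argmin(axis=1)
    cols = np.abs(np.arange(width)[:, None] - centers_x[None, :]).argmin(axis=1)
    return patch_marginals[rows[:, None], cols[None, :]]

centers_y = np.array([10, 20])
centers_x = np.array([10, 20, 30])
pm = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
pixels = upsample_marginals(pm, centers_y, centers_x, height=30, width=40)
```

A smoothing filter over the pixel marginals could then be applied as a separate post-processing step.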
64 (In Figure 4 the segmentations were post-processed by applying a Gaussian filter over the pixel marginals with the scale set to half the patch spacing). [sent-227, score-0.406]
65 Models that incorporate 4-neighbor spatial couplings are denoted ‘CRF’ while ones that incorporate only (local or global) patch-level potentials are denoted ‘IND’. [sent-230, score-0.251]
66 Models that include global aggregate features are denoted ‘loc+glo’, while ones that include only local patch-level features are denoted ‘loc only’. [sent-231, score-0.336]
67 Aggregate features (AF) were computed in each cell of a c × c image partition. [sent-236, score-0.169]
68 [Figure 2 plot: accuracy versus C; curves: only c, 1 to c, c to 10, local only.] Benefits of aggregate features. [sent-238, score-0.153]
69 The first main conclusion is that including global aggregate features helps, for example improving the average classification rate on the MSRC dataset from 67. [sent-239, score-0.236]
70 We experimented with dividing the image into c × c grids for a range of values of c. [sent-245, score-0.138]
71 In each cell of the grid we compute a separate histogram over the visual words, and for each patch in the cell we include an energy term based on this histogram in the same way as for the image-wide histogram in Eq. [sent-246, score-0.457]
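The per-cell histograms can be sketched as below. Several things are assumptions: the function name, the integer mapping from patch grid coordinates to cells, and taking the normalized histogram to be the average of the binary patch codes in a cell.

```python
import numpy as np

def cell_histograms(Y, rows, cols, grid_h, grid_w, c):
    """Y: (N, W) binary patch codes; rows, cols: patch grid coordinates.
    Returns (N, W): for each patch, the normalized histogram of its own cell."""
    cell = (rows * c // grid_h) * c + (cols * c // grid_w)
    sums = np.zeros((c * c, Y.shape[1]))
    np.add.at(sums, cell, Y)                        # accumulate codes per cell
    counts = np.bincount(cell, minlength=c * c)[:, None]
    hist = sums / np.maximum(counts, 1)             # avoid dividing by zero for empty cells
    return hist[cell]

# Four patches on a 2 x 2 patch grid, each with a distinct visual word.
Y = np.eye(4)
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 0, 1])
h_image = cell_histograms(Y, rows, cols, grid_h=2, grid_w=2, c=1)   # image-wide
h_fine = cell_histograms(Y, rows, cols, grid_h=2, grid_w=2, c=2)    # one patch per cell
```

With c = 1 every patch sees the image-wide histogram, and as c grows the aggregate approaches the patch's own code.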
72 Figure 2 shows how the performance of the individual patch classifier depends on the use of aggregate features. [sent-248, score-0.381]
73 From the dotted curve in the figure we see that although using larger cells to aggregate features is generally more informative, even fine 10 × 10 subdivisions (containing only 6–12 patches per cell) provide a significant performance increase. [sent-249, score-0.339]
74 The second main conclusion from Table 1 is that including spatial couplings (pairwise CRF potentials) helps, respectively increasing the accuracy by 10. [sent-253, score-0.104]
75 The improvement is particularly noticeable for rare classes when global aggregate features are not included: in this case the single node potentials are less informative and frequent classes tend to be unduly favored due to their large a priori probability. [sent-256, score-0.354]
76 The performance increment from global features is smallest for ‘CRFγ’, the model that also includes local contextual information. [sent-258, score-0.156]
77 The overall influence of the local label transition preferences expressed in ‘CRFγ’ appears to be similar to that of the global contextual information provided by image-wide aggregate features. [sent-259, score-0.288]
78 Our third main conclusion from Table 1 is that our marginalization based training method for handling missing labels is superior to the common heuristic of deleting any unlabeled patches. [sent-261, score-0.194]
79 Learning a ‘CRFσ loc+glo’ model by removing all unlabeled patches (‘del unlabeled’ in the table) leads to an estimate σ ≈ 11. [sent-262, score-0.3]
80 In particular, with ‘delete unlabeled’ training the accuracy of the model drops significantly for the classes plane and bike, both of which have a relatively small area relative to their boundaries and thus many partially labeled patches. [sent-265, score-0.215]
81 It is interesting to note that even though σ has been severely over-estimated in the ‘delete unlabeled’ model, the CRF still improves over the individual patch classification obtained with ‘IND loc+glo’ for most classes, albeit not for bike and only marginally for plane. [sent-266, score-0.309]
82 We now consider how the performance drops as the fraction of labeled pixels decreases. [sent-268, score-0.223]
83 Figure 3: Recognition performance when learning from increasingly eroded label images (left). [Plot: accuracy versus percentage of pixels labeled for CRFσ loc+glo and IND loc+glo; label images shown for disc 0, disc 10, disc 20.] [sent-273, score-0.61]
84 The figure also shows the recognition performance of ‘CRFσ loc+glo’ and ‘IND loc+glo’ as a function of the fraction of labeled pixels. [sent-276, score-0.133]
85 Note that ‘CRFσ loc+glo’ learned from label images eroded with a disc of radius 30 (only 28% of pixels labeled) still outperforms ‘IND loc+glo’ learned from the original labeling (71% of pixels labeled). [sent-278, score-0.604]
86 Also, the CRF actually performs better with 5 pixels of erosion than with the original labeling, presumably because ambiguities related to training patches with mixed pixel labels are reduced. [sent-279, score-0.476]
87 Our CRF model clearly outperforms the approach of [15], which uses aggregate features of an optimized scale but lacks spatial coupling in a random field, giving a performance very similar to that of our ‘IND loc+glo’ model. [sent-282, score-0.204]
88 The Sowerby dataset consists of 104 images of 96 × 64 pixels of urban and rural scenes labeled with 7 different classes: sky, vegetation, road marking, road surface, building, street objects and cars. [sent-285, score-0.38]
89 The subset of the Corel dataset contains 100 images of 180 × 120 pixels of natural scenes, also labeled with 7 classes: rhino/hippo, polar bear, water, snow, vegetation, ground, and sky. [sent-286, score-0.316]
90 Here we used 10 × 10 pixel patches, with a spacing of respectively 2 and 5 pixels for the Sowerby and Corel datasets. [sent-287, score-0.221]
91 Table 2 compares the recognition accuracies averaged over pixels for our CRF and independent patch models to the results reported on these datasets for TextonBoost [7] and the multi-scale CRF model of [5]. [sent-289, score-0.47]
92 In this table ‘IND’ stands for results obtained when only the single node potentials are used in the respective models, disregarding the spatial random field couplings. [sent-290, score-0.113]
93 The total training time and test time per image are listed for the full CRF models. [sent-291, score-0.128]
94 5 Conclusion. We presented several image-patch-level CRF models for semantic image labeling that incorporate both local patch-level observations and more global contextual features based on aggregates of observations at several scales. [sent-293, score-0.447]
95 We showed that partially labeled training images could be handled by maximizing the total likelihood of the image segmentations that comply with the partial labeling, using Loopy BP and Bethe free-energy approximations for the calculations. [sent-294, score-0.394]
96 This allowed us to learn effective CRF models from images where only a small fraction of the pixels were labeled and class transitions were not observed. [sent-295, score-0.288]
97 wide aggregate features is very helpful, while including additional aggregates at finer scales gives relatively little further improvement. [sent-310, score-0.232]
98 Comparative experiments showed that our patch-level CRFs have comparable performance to state-of-the-art pixel-level models while being much more efficient because the number of patches is much smaller than the number of pixels. [sent-311, score-0.176]
99 A semi-supervised learning approach to object recognition with spatial integration of local features and segmentation cues. [sent-366, score-0.282]
100 Figure 4: Samples from the MSRC, Sowerby, and Corel datasets with segmentation and labeling. [Panels: MSRC, Sowerby, Corel; columns: CRFσ loc+glo, Labeling.] [sent-400, score-0.106]
wordName wordTfidf (topN-words)
[('crf', 0.568), ('loc', 0.385), ('glo', 0.332), ('patch', 0.262), ('ind', 0.193), ('patches', 0.176), ('pixels', 0.136), ('unlabeled', 0.124), ('aggregate', 0.119), ('sowerby', 0.111), ('msrc', 0.11), ('labeling', 0.103), ('labelings', 0.101), ('image', 0.096), ('labeled', 0.087), ('corel', 0.082), ('segmentation', 0.08), ('disc', 0.079), ('aggregates', 0.069), ('images', 0.065), ('afs', 0.063), ('couplings', 0.063), ('pixel', 0.062), ('bethe', 0.059), ('label', 0.057), ('potentials', 0.051), ('bike', 0.047), ('textonboost', 0.047), ('marginals', 0.047), ('recognition', 0.046), ('global', 0.045), ('wl', 0.044), ('visual', 0.044), ('features', 0.044), ('grids', 0.042), ('annotations', 0.041), ('spatial', 0.041), ('labels', 0.038), ('object', 0.037), ('classes', 0.037), ('scene', 0.036), ('ks', 0.035), ('segmentations', 0.035), ('lbp', 0.035), ('sky', 0.035), ('partially', 0.035), ('local', 0.034), ('pairwise', 0.033), ('contextual', 0.033), ('loopy', 0.032), ('vision', 0.032), ('gibbs', 0.032), ('erosion', 0.032), ('fbethe', 0.032), ('rother', 0.032), ('schroff', 0.032), ('vegetation', 0.032), ('wxi', 0.032), ('yiw', 0.032), ('crfs', 0.032), ('dij', 0.032), ('training', 0.032), ('histogram', 0.031), ('classi', 0.031), ('cell', 0.029), ('conditional', 0.029), ('aggregation', 0.028), ('dataset', 0.028), ('partition', 0.028), ('eroded', 0.028), ('delete', 0.028), ('rgb', 0.028), ('datasets', 0.026), ('eld', 0.026), ('fields', 0.025), ('field', 0.025), ('xi', 0.025), ('gradient', 0.025), ('quantize', 0.025), ('kp', 0.025), ('verbeek', 0.025), ('hw', 0.025), ('denoted', 0.025), ('nodes', 0.025), ('plane', 0.024), ('assign', 0.024), ('grass', 0.023), ('spacing', 0.023), ('kh', 0.023), ('incorporate', 0.023), ('completely', 0.023), ('scenes', 0.022), ('cow', 0.022), ('appearance', 0.022), ('partial', 0.022), ('likelihood', 0.022), ('node', 0.021), ('proceedings', 0.021), ('graphics', 0.021), ('road', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
2 0.18230729 183 nips-2007-Spatial Latent Dirichlet Allocation
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA. 1
3 0.10904994 115 nips-2007-Learning the 2-D Topology of Images
Author: Nicolas L. Roux, Yoshua Bengio, Pascal Lamblin, Marc Joliveau, Balázs Kégl
Abstract: We study the following question: is the two-dimensional structure of images a very strong prior or is it something that can be learned with a few examples of natural images? If someone gave us a learning task involving images for which the two-dimensional topology of pixels was not known, could we discover it automatically and exploit it? For example suppose that the pixels had been permuted in a fixed but unknown way, could we recover the relative two-dimensional location of pixels on images? The surprising result presented here is that not only is the answer yes, but that about as few as a thousand images are enough to approximately recover the relative locations of about a thousand pixels. This is achieved using a manifold learning algorithm applied to pixels associated with a measure of distributional similarity between pixel intensities. We compare different topology-extraction approaches and show how having the two-dimensional topology can be exploited.
4 0.10853805 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
Author: Maryam Mahdaviani, Tanzeem Choudhury
Abstract: We present a new and efficient semi-supervised training method for parameter estimation and feature selection in conditional random fields (CRFs). In real-world applications such as activity recognition, unlabeled sensor traces are relatively easy to obtain whereas labeled examples are expensive and tedious to collect. Furthermore, the ability to automatically select a small subset of discriminatory features from a large pool can be advantageous in terms of computational speed as well as accuracy. In this paper, we introduce the semi-supervised virtual evidence boosting (sVEB) algorithm for training CRFs – a semi-supervised extension to the recently developed virtual evidence boosting (VEB) method for feature selection and parameter learning. The objective function of sVEB combines the unlabeled conditional entropy with labeled conditional pseudo-likelihood. It reduces the overall system cost as well as the human labeling cost required during training, which are both important considerations in building real-world inference systems. Experiments on synthetic data and real activity traces collected from wearable sensors, illustrate that sVEB benefits from both the use of unlabeled data and automatic feature selection, and outperforms other semi-supervised approaches. 1
6 0.099333137 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning
7 0.090023458 69 nips-2007-Discriminative Batch Mode Active Learning
8 0.089801297 187 nips-2007-Structured Learning with Approximate Inference
9 0.080111861 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
10 0.079486452 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
11 0.072824083 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect
12 0.070543788 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
13 0.068189725 113 nips-2007-Learning Visual Attributes
14 0.067571089 166 nips-2007-Regularized Boost for Semi-Supervised Learning
15 0.064893663 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
16 0.062031049 97 nips-2007-Hidden Common Cause Relations in Relational Learning
17 0.058817111 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models
18 0.058713999 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
19 0.058220055 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
20 0.057142761 145 nips-2007-On Sparsity and Overcompleteness in Image Models
topicId topicWeight
[(0, -0.172), (1, 0.089), (2, -0.089), (3, -0.067), (4, 0.02), (5, 0.046), (6, 0.022), (7, 0.055), (8, 0.153), (9, 0.048), (10, 0.018), (11, 0.02), (12, -0.14), (13, 0.037), (14, -0.017), (15, 0.137), (16, -0.028), (17, 0.036), (18, 0.078), (19, 0.011), (20, -0.044), (21, 0.028), (22, 0.126), (23, -0.026), (24, -0.093), (25, 0.001), (26, 0.048), (27, -0.09), (28, 0.028), (29, -0.079), (30, -0.047), (31, 0.164), (32, 0.05), (33, -0.076), (34, -0.083), (35, 0.036), (36, -0.018), (37, 0.028), (38, 0.123), (39, 0.037), (40, -0.073), (41, 0.078), (42, -0.038), (43, 0.007), (44, -0.155), (45, 0.048), (46, -0.072), (47, 0.165), (48, -0.04), (49, -0.068)]
simIndex simValue paperId paperTitle
same-paper 1 0.94648206 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
2 0.68829483 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition
Author: Maryam Mahdaviani, Tanzeem Choudhury
Abstract: We present a new and efficient semi-supervised training method for parameter estimation and feature selection in conditional random fields (CRFs). In real-world applications such as activity recognition, unlabeled sensor traces are relatively easy to obtain whereas labeled examples are expensive and tedious to collect. Furthermore, the ability to automatically select a small subset of discriminatory features from a large pool can be advantageous in terms of computational speed as well as accuracy. In this paper, we introduce the semi-supervised virtual evidence boosting (sVEB) algorithm for training CRFs – a semi-supervised extension to the recently developed virtual evidence boosting (VEB) method for feature selection and parameter learning. The objective function of sVEB combines the unlabeled conditional entropy with labeled conditional pseudo-likelihood. It reduces the overall system cost as well as the human labeling cost required during training, which are both important considerations in building real-world inference systems. Experiments on synthetic data and real activity traces collected from wearable sensors, illustrate that sVEB benefits from both the use of unlabeled data and automatic feature selection, and outperforms other semi-supervised approaches. 1
3 0.61879843 196 nips-2007-The Infinite Gamma-Poisson Feature Model
Author: Michalis K. Titsias
Abstract: We present a probability distribution over non-negative integer-valued matrices with possibly an infinite number of columns. We also derive a stochastic process that reproduces this distribution over equivalence classes. This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. Such data arise naturally when learning visual object recognition systems from unlabelled images. Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. Inference with this model is carried out using a Markov chain Monte Carlo algorithm.
4 0.61277699 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
5 0.584777 113 nips-2007-Learning Visual Attributes
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
6 0.54141921 183 nips-2007-Spatial Latent Dirichlet Allocation
7 0.51444167 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect
8 0.49870828 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
9 0.46619236 193 nips-2007-The Distribution Family of Similarity Distances
10 0.45379829 115 nips-2007-Learning the 2-D Topology of Images
11 0.4116337 166 nips-2007-Regularized Boost for Semi-Supervised Learning
12 0.39892846 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
13 0.39191464 97 nips-2007-Hidden Common Cause Relations in Relational Learning
14 0.39125058 187 nips-2007-Structured Learning with Approximate Inference
15 0.39044777 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
16 0.36587888 81 nips-2007-Estimating disparity with confidence from energy neurons
17 0.36007336 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
18 0.35998082 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
19 0.34918135 69 nips-2007-Discriminative Batch Mode Active Learning
20 0.33512411 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
topicId topicWeight
[(5, 0.065), (13, 0.03), (16, 0.025), (18, 0.014), (21, 0.104), (31, 0.025), (34, 0.023), (35, 0.024), (47, 0.063), (49, 0.01), (73, 0.253), (83, 0.119), (85, 0.029), (87, 0.067), (90, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.79942685 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
Author: Bill Triggs, Jakob J. Verbeek
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets.
2 0.72395718 70 nips-2007-Discriminative K-means for Clustering
Author: Jieping Ye, Zheng Zhao, Mingrui Wu
Abstract: We present a theoretical study on the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant and new insights into the nature of this subspace selection procedure. Based on this equivalence relationship, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present the nonlinear extension of DisKmeans using kernels. We show that the learning of the kernel matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets.
3 0.59363174 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors, and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
4 0.5892942 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
Author: Michael Ross, Andrew Cohen
Abstract: This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects’ performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it represents classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than previous methods and provides a foundation for further exploration.
5 0.58739901 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
Author: Kai Yu, Wei Chu
Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity.
6 0.58677477 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
7 0.58669633 189 nips-2007-Supervised Topic Models
8 0.58092165 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
9 0.57778239 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
10 0.57545561 69 nips-2007-Discriminative Batch Mode Active Learning
11 0.57529825 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
12 0.57224619 175 nips-2007-Semi-Supervised Multitask Learning
13 0.57216454 97 nips-2007-Hidden Common Cause Relations in Relational Learning
14 0.57214701 59 nips-2007-Continuous Time Particle Filtering for fMRI
15 0.57170916 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
16 0.57167417 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
17 0.5715149 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
18 0.57146335 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
19 0.57115805 196 nips-2007-The Infinite Gamma-Poisson Feature Model
20 0.57022613 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data