cvpr cvpr2013 cvpr2013-253 knowledge-graph by maker-knowledge-mining

253 cvpr-2013-Learning Multiple Non-linear Sub-spaces Using K-RBMs


Source: pdf

Author: Siddhartha Chandra, Shailesh Kumar, C.V. Jawahar

Abstract: Understanding the nature of data is the key to building good representations. In domains such as natural images, the data comes from very complex distributions which are hard to capture. Feature learning intends to discover or best approximate these underlying distributions and use their knowledge to weed out irrelevant information, preserving most of the relevant information. Feature learning can thus be seen as a form of dimensionality reduction. In this paper, we describe a feature learning scheme for natural images. We hypothesize that image patches do not all come from the same distribution, they lie in multiple nonlinear subspaces. We propose a framework that uses K Restricted Boltzmann Machines (K-RBMS) to learn multiple non-linear subspaces in the raw image space. Projections of the image patches into these subspaces gives us features, which we use to build image representations. Our algorithm solves the coupled problem of finding the right non-linear subspaces in the input space and associating image patches with those subspaces in an iterative EM like algorithm to minimize the overall reconstruction error. Extensive empirical results over several popular image classification datasets show that representations based on our framework outperform the traditional feature representations such as the SIFT based Bag-of-Words (BoW) and convolutional deep belief networks.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract. Understanding the nature of data is the key to building good representations. [sent-8, score-0.055]

2 In domains such as natural images, the data comes from very complex distributions which are hard to capture. [sent-9, score-0.08]

3 Feature learning intends to discover or best approximate these underlying distributions and use their knowledge to weed out irrelevant information, preserving most of the relevant information. [sent-10, score-0.112]

4 Feature learning can thus be seen as a form of dimensionality reduction. [sent-11, score-0.033]

5 In this paper, we describe a feature learning scheme for natural images. [sent-12, score-0.033]

6 We hypothesize that image patches do not all come from the same distribution, they lie in multiple nonlinear subspaces. [sent-13, score-0.18]

7 We propose a framework that uses K Restricted Boltzmann Machines (K-RBMS) to learn multiple non-linear subspaces in the raw image space. [sent-14, score-0.421]

8 Projections of the image patches into these subspaces gives us features, which we use to build image representations. [sent-15, score-0.399]

9 Our algorithm solves the coupled problem of finding the right non-linear subspaces in the input space and associating image patches with those subspaces in an iterative EM like algorithm to minimize the overall reconstruction error. [sent-16, score-0.908]
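The coupled objective described above can be written down compactly. The following formalization is our own sketch, not quoted from the paper: x_n denotes an image patch, θ_k the parameters of the k-th RBM, and x̂_θ its reconstruction of a patch.

```latex
% Overall reconstruction error: each patch x_n is implicitly assigned
% to whichever of the K RBMs reconstructs it best, and the parameters
% \theta_1, ..., \theta_K are trained to minimize the total.
\min_{\theta_1, \dots, \theta_K} \;
\sum_{n=1}^{N} \; \min_{k \in \{1, \dots, K\}}
\bigl\lVert x_n - \hat{x}_{\theta_k}(x_n) \bigr\rVert^2
```

The EM-like alternation follows naturally from this form: the inner min is the E-step (associate each patch with its best RBM) and the outer min is the M-step (retrain each RBM on its associated patches).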

10 Extensive empirical results over several popular image classification datasets show that representations based on our framework outperform the traditional feature representations such as the SIFT based Bag-of-Words (BoW) and convolutional deep belief networks. [sent-17, score-0.319]

11 Introduction. Feature extraction and modelling together dictate the overall complexity of any computer vision system. [sent-19, score-0.116]

12 Rich features that capture most of the complexity in the input space require simpler models while simpler features require more complex models. [sent-20, score-0.114]

13 This “law-of-conservation of complexity” in modelling has driven many efforts in feature engineering, especially, in complex domains such as computer vision where the raw input is not easily tamed by simple features. [sent-21, score-0.176]

14 Finding semantically rich features, that capture the inherent complexity of the input data, is a challenging and necessary pre-processing step in many machine learning applications. [sent-22, score-0.159]

15 We propose a feature learning framework motivated by the hypothesis that data lies in multiple non-linear subspaces (as opposed to a single subspace). [sent-23, score-0.446]

16 Finding these subspaces and clustering the right data points into the right subspaces will result in the kind of features we are looking for. [sent-24, score-0.787]

17 Our approach requires that we solve the coupled problem of non-linear projection and clustering of data points into those projections simultaneously. [sent-25, score-0.266]

18 Clustering cannot be done in the raw input space because the data really lies in certain non-linear subspaces and the right subspaces cannot be discovered without proper groupings of the data. [sent-26, score-0.845]

19 While most of the work in clustering and projection methods is done independently, attempts have been made to combine them [1, 17]. [sent-27, score-0.127]

20 In this paper, we take this coupling a step forward by learning clusters and projections simultaneously. [sent-28, score-0.18]

21 This is fundamentally different from an approach like Sparse Subspace Clustering (SSC) [5] that first learns a sparse representation (SR) of the data and then applies spectral clustering to a similarity matrix built from this SR. [sent-29, score-0.151]

22 We further hypothesize that a mere non-linear clustering is not the best way to understand the nature of data. [sent-30, score-0.217]

23 Further, simple clusters (concepts) might be present in each of the non-linear subspaces. [sent-31, score-0.038]

24 An overall solution should first find multiple non-linear sub-spaces within the data and then further cluster the data within each sub-space if necessary. [sent-32, score-0.041]

25 Once we discover the subspaces the data points (image patches) lie in, projections into these subspaces will give us the features that best represent the patches. [sent-33, score-0.835]

26 We propose a systematic framework for a two-level clustering of input data into meaningful clusters – the first level being clustering coupled with non-linear projection by Restricted Boltzmann Machines (RBMs), and the second level being simple K-means clustering in each non-linear subspace. [sent-34, score-0.485]

27 In other words, we use K-RBMs for the first-level clustering and K-means on the RBM projections for the second-level clustering. [sent-35, score-0.227]
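To make the two-level scheme concrete, here is a minimal numpy sketch. The weights `W` and hidden biases `b_h` are random stand-ins for learnt RBM parameters, and all helper names are ours, not the paper's: level one projects patches into one RBM's subspace, level two runs plain K-means on those projections.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def project(patches, W, b_h):
    """First level: non-linear projection of patches into one RBM's
    subspace -- the hidden-unit activations given each patch."""
    return sigmoid(patches @ W + b_h)

def kmeans(X, k, iters=20, seed=0):
    """Second level: plain K-means on the projected features."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center, then recompute centers
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# toy stand-ins: one "learnt" RBM over 16-dim patches with 8 hidden units
rng = np.random.default_rng(1)
W, b_h = rng.standard_normal((16, 8)), np.zeros(8)
patches = rng.standard_normal((50, 16))   # patches assigned to this RBM
features = project(patches, W, b_h)       # level 1: RBM projection
labels, centers = kmeans(features, k=4)   # level 2: K-means
```

In the full framework this second-level clustering would be repeated per RBM, over only the patches associated with that RBM.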

28 We apply our framework to clustering, improving BoW, and feature learning from raw image patches. [sent-36, score-0.089]

29 We demonstrate empirically that our clustering method is comparable to state-of-the-art methods in terms of accuracy, and much faster. [sent-37, score-0.099]

30 Representations based on K-RBM features outperform traditional deep learning and SIFT-based BoW representations on image classification tasks. [sent-38, score-0.217]

31 Figure 1: RBM weights (learnt by the model) representing 20 non-linear subspaces in the Pascal 2007 data. [sent-39, score-0.367]

32 Local K-RBM features are computed by projecting image patches to the subspace they belong to, and adding the biases. [sent-40, score-0.145]

33 Restricted Boltzmann Machines (RBMs) [22] are undirected, energy-based graphical models that learn a nonlinear subspace that the data fits to. [sent-41, score-0.159]

34 RBMs have been used successfully to learn features for image understanding and classification [12] and for speech representation [18], to analyze user ratings of movies [21], and to build better bag-of-words representations of text data [20]. [sent-42, score-0.102]

35 Moreover, RBMs have been stacked together to learn hierarchical representations such as deep belief networks [12, 3] and convolutional deep belief networks [16] for finding semantically deeper features in complex domains such as images. [sent-43, score-0.619]

36 Most nonlinear subspace learning algorithms [6, 2] make various assumptions about the nature of the subspaces they intend to discover. [sent-44, score-0.596]

37 Figure 1 shows 20 nonlinear subspaces in VOC PASCAL 2007 data. [sent-48, score-0.392]

38 It is evident from the figure that the huge diversity in the image patches cannot be captured by a single subspace. [sent-50, score-0.078]

39 The association of a data point to an RBM depends on the reconstruction error of each RBM for that data point. [sent-51, score-0.032]

40 Each RBM updates its weights based on all the data points associated with it. [sent-52, score-0.056]
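The association step described in sentences 39-40 can be sketched as follows. This assumes sigmoid units on both layers purely for simplicity, and the function and variable names are ours, not the paper's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(v, W, b_h, b_v):
    """One forward-backward pass through a single RBM: project the
    patch into the subspace, reconstruct it, and measure the error."""
    h = sigmoid(v @ W + b_h)         # visible -> hidden
    v_rec = sigmoid(h @ W.T + b_v)   # hidden -> visible
    return np.sum((v - v_rec) ** 2)

def assign_to_rbms(patches, rbms):
    """Associate each patch with the RBM that reconstructs it best."""
    errors = np.array([[reconstruction_error(p, *r) for r in rbms]
                       for p in patches])
    return errors.argmin(axis=1)

# toy example: 2 RBMs over 4-dim "patches", 3 hidden units each
rng = np.random.default_rng(0)
rbms = [(rng.standard_normal((4, 3)), np.zeros(3), np.zeros(4))
        for _ in range(2)]
patches = rng.random((5, 4))
labels = assign_to_rbms(patches, rbms)  # one subspace id per patch
```

After this E-like step, each RBM would be retrained on only the patches assigned to it, and the alternation repeats until the assignments stabilize.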

41 Through various learning tasks on synthetic and real data, we show the convergence properties, quality of subspaces learnt, and improvement in the accuracies of both descriptive and predictive tasks. [sent-53, score-0.426]

42 Firstly, while we employ traditional second-order (2-layer) RBMs, [19] describes an implicit mixture of RBMs formulated using third-order RBMs. [sent-56, score-0.083]

43 The authors of [19] introduce the cluster label (explicitly) as a hidden discrete variable in the RBM formulation, describing an energy function that captures 3-way interactions among visible units, hidden units, and the cluster label variable. [sent-57, score-0.403]

44 In our solution, the cluster label is implied by the RBM id, and the model parameters capture the usual 2-way interactions. [sent-58, score-0.085]

45 One reason for our choice of traditional RBMs as building blocks was the availability of a great deal of research on properly training RBMs [11]. [sent-59, score-0.076]

46 Secondly, the partition function of an RBM is intractable. [sent-60, score-0.04]

47 By introducing the third layer, [19] manages to fit the mixture of Boltzmann machines without explicitly computing the partition function. [sent-61, score-0.381]

48 We tackle the partition-function problem by associating samples with the RBMs that reconstruct them best (minimizing the reconstruction errors) in an EM algorithm. [sent-62, score-0.125]

49 Since the reconstruction error is not an inherent part of the traditional RBM formulation, our framework is not a mixture model. [sent-63, score-0.145]

50 Training RBMs. RBMs are two-layered, fully connected networks that have a layer of input/visible variables and a layer of hidden random variables. [sent-65, score-0.34]

51 RBMs model a distribution over visible variables by introducing a set of stochastic features. [sent-66, score-0.122]

52 In applications where RBMs are used for image analysis, the visible units correspond to the pixel values and the hidden units correspond to visual features. [sent-67, score-0.546]

53 There are three kinds of design choices in building an RBM: the objective function used, the frequency of parameter updates, and the type of visible and hidden units. [sent-68, score-0.263]

54 RBMs are usually trained by minimizing the contrastive divergence objective (CD-1) [10], which approximates the actual RBM objective. [sent-69, score-0.036]

55 An RBM has visible units vi, i = 0, . . . , I (v0 = 1 is the bias term), and hidden units hj, j = 1, . . . , J. [sent-73, score-0.523]

56 vi = σ( sum_{j=0..J} wij hj ) (2), where σ(·) is the sigmoid activation function. [sent-80, score-0.08]

57 In the CD-1 forward pass (visible to hidden), we activate the hidden units. [sent-81, score-0.068]

58 We compute the hidden unit activations hj+ from the visible (input) unit activations vi+ (Eq. 1). [sent-82, score-0.25]

59 In the backward pass (hidden to visible), we recompute the visible unit activations vi− from hj+ (Eq. 2). [sent-84, score-0.314]

60 Finally, we compute the hidden unit activations hj− again from vi−. [sent-86, score-0.256]
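The three passes of sentences 57-60 can be sketched in a few lines of numpy. This is a mean-field simplification of CD-1 (no stochastic sampling of the hidden states, sigmoid units on both layers, toy shapes), not the exact procedure of [10]:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v_plus, W, b_h, b_v, lr=0.1):
    """One mean-field CD-1 update for an RBM with sigmoid units.

    Forward pass:  hidden activations h+ from visible input v+.
    Backward pass: recomputed visible activations v- from h+.
    Finally:       hidden activations h- recomputed from v-.
    The update contrasts the (v+, h+) and (v-, h-) statistics.
    """
    h_plus = sigmoid(v_plus @ W + b_h)      # visible -> hidden
    v_minus = sigmoid(h_plus @ W.T + b_v)   # hidden  -> visible
    h_minus = sigmoid(v_minus @ W + b_h)    # visible -> hidden again
    n = len(v_plus)
    W += lr * (v_plus.T @ h_plus - v_minus.T @ h_minus) / n
    b_h += lr * (h_plus - h_minus).mean(axis=0)
    b_v += lr * (v_plus - v_minus).mean(axis=0)
    return float(np.sum((v_plus - v_minus) ** 2))  # reconstruction error

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 4))   # 6 visible x 4 hidden units
b_h, b_v = np.zeros(4), np.zeros(6)
data = (rng.random((20, 6)) > 0.5).astype(float)
errors = [cd1_step(data, W, b_h, b_v) for _ in range(50)]
```

The reconstruction error returned by each step is exactly the quantity the K-RBM framework uses to associate samples with RBMs.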


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rbms', 0.639), ('rbm', 0.418), ('subspaces', 0.344), ('hj', 0.19), ('units', 0.155), ('hidden', 0.139), ('boltzmann', 0.103), ('clustering', 0.099), ('visible', 0.097), ('subspace', 0.09), ('vi', 0.088), ('hyderabad', 0.086), ('projections', 0.084), ('activations', 0.083), ('layer', 0.077), ('deep', 0.07), ('machines', 0.069), ('bow', 0.063), ('domains', 0.058), ('raw', 0.056), ('coupled', 0.055), ('patches', 0.055), ('activation', 0.054), ('associating', 0.053), ('hypothesize', 0.05), ('traditional', 0.049), ('nonlinear', 0.048), ('networks', 0.047), ('belief', 0.046), ('convolutional', 0.045), ('representations', 0.044), ('really', 0.043), ('ible', 0.043), ('ckw', 0.043), ('weed', 0.043), ('cluster', 0.041), ('pass', 0.041), ('modelling', 0.04), ('dictate', 0.04), ('mere', 0.04), ('partition', 0.04), ('bias', 0.039), ('clusters', 0.038), ('discover', 0.036), ('complexity', 0.036), ('hei', 0.036), ('contrastive', 0.036), ('restricted', 0.036), ('semantically', 0.034), ('unit', 0.034), ('mixture', 0.034), ('learning', 0.033), ('learnt', 0.033), ('manages', 0.033), ('updates', 0.033), ('groupings', 0.032), ('ssc', 0.032), ('earch', 0.032), ('reconstruction', 0.032), ('pr', 0.03), ('recompute', 0.03), ('rating', 0.03), ('inherent', 0.03), ('backward', 0.029), ('projection', 0.028), ('simpler', 0.028), ('nature', 0.028), ('intend', 0.027), ('dto', 0.027), ('movies', 0.027), ('building', 0.027), ('pascal', 0.027), ('lie', 0.027), ('lies', 0.026), ('em', 0.026), ('sigmoid', 0.026), ('learns', 0.026), ('fundamentally', 0.026), ('assumptions', 0.026), ('rich', 0.026), ('layered', 0.025), ('predictive', 0.025), ('coupling', 0.025), ('finding', 0.025), ('introducing', 0.025), ('descriptive', 0.024), ('undirected', 0.024), ('speech', 0.024), ('stacked', 0.023), ('implied', 0.023), ('systematic', 0.023), ('evident', 0.023), ('weights', 0.023), ('wij', 0.022), ('level', 0.022), ('complex', 0.022), ('usual', 0.021), ('outperform', 0.021), ('deeper', 0.021), ('learn', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 253 cvpr-2013-Learning Multiple Non-linear Sub-spaces Using K-RBMs

Author: Siddhartha Chandra, Shailesh Kumar, C.V. Jawahar

Abstract: Understanding the nature of data is the key to building good representations. In domains such as natural images, the data comes from very complex distributions which are hard to capture. Feature learning intends to discover or best approximate these underlying distributions and use their knowledge to weed out irrelevant information, preserving most of the relevant information. Feature learning can thus be seen as a form of dimensionality reduction. In this paper, we describe a feature learning scheme for natural images. We hypothesize that image patches do not all come from the same distribution, they lie in multiple nonlinear subspaces. We propose a framework that uses K Restricted Boltzmann Machines (K-RBMS) to learn multiple non-linear subspaces in the raw image space. Projections of the image patches into these subspaces gives us features, which we use to build image representations. Our algorithm solves the coupled problem of finding the right non-linear subspaces in the input space and associating image patches with those subspaces in an iterative EM like algorithm to minimize the overall reconstruction error. Extensive empirical results over several popular image classification datasets show that representations based on our framework outperform the traditional feature representations such as the SIFT based Bag-of-Words (BoW) and convolutional deep belief networks.

2 0.41879737 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning

Author: Yujia Li, Daniel Tarlow, Richard Zemel

Abstract: When modeling structured outputs such as image segmentations, prediction can be improved by accurately modeling structure present in the labels. A key challenge is developing tractable models that are able to capture complex high level structure like shape. In this work, we study the learning of a general class of pattern-like high order potential, which we call Compositional High Order Pattern Potentials (CHOPPs). We show that CHOPPs include the linear deviation pattern potentials of Rother et al. [26] and also Restricted Boltzmann Machines (RBMs); we also establish the near equivalence of these two models. Experimentally, we show that performance is affected significantly by the degree of variability present in the datasets, and we define a quantitative variability measure to aid in studying this. We then improve CHOPPs performance in high variability datasets with two primary contributions: (a) developing a loss-sensitive joint learning procedure, so that internal pattern parameters can be learned in conjunction with other model potentials to minimize expected loss;and (b) learning an image-dependent mapping that encourages or inhibits patterns depending on image features. We also explore varying how multiple patterns are composed, and learning convolutional patterns. Quantitative results on challenging highly variable datasets show that the joint learning and image-dependent high order potentials can improve performance.

3 0.40228665 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

Author: Roni Mittelman, Honglak Lee, Benjamin Kuipers, Silvio Savarese

Abstract: The use of semantic attributes in computer vision problems has been gaining increased popularity in recent years. Attributes provide an intermediate feature representation in between low-level features and the class categories, leading to improved learning on novel categories from few examples. However, a major caveat is that learning semantic attributes is a laborious task, requiring a significant amount of time and human intervention to provide labels. In order to address this issue, we propose a weakly supervised approach to learn mid-level features, where only class-level supervision is provided during training. We develop a novel extension of the restricted Boltzmann machine (RBM) by incorporating a Beta-Bernoulli process factor potential for hidden units. Unlike the standard RBM, our model uses the class labels to promote category-dependent sharing of learned features, which tends to improve the generalization performance. By using semantic attributes for which annotations are available, we show that we can find correspondences between the learned mid-level features and the labeled attributes. Therefore, the mid-level features have distinct semantic characterization which is similar to that given by the semantic attributes, even though their labeling was not provided during training. Our experimental results on object recognition tasks show significant performance gains, outperforming existing methods which rely on manually labeled semantic attributes.

4 0.2703653 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling

Author: Andrew Kae, Kihyuk Sohn, Honglak Lee, Erik Learned-Miller

Abstract: Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types to build a state-of-the-art labeler. Although the CRF is a good baseline labeler, we show how an RBM can be added to the architecture to provide a global shape bias that complements the local modeling provided by the CRF. We demonstrate its labeling performance for the parts of complex face images from the Labeled Faces in the Wild data set. This hybrid model produces results that are both quantitatively and qualitatively better than the CRF alone. In addition to high-quality labeling results, we demonstrate that the hidden units in the RBM portion of our model can be interpreted as face attributes that have been learned without any attribute-level supervision.

5 0.18709667 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering

Author: Raghuraman Gopalan

Abstract: Estimating geographic location from images is a challenging problem that is receiving recent attention. In contrast to many existing methods that primarily model discriminative information corresponding to different locations, we propose joint learning of information that images across locations share and vary upon. Starting with generative and discriminative subspaces pertaining to domains, which are obtained by a hierarchical grouping of images from adjacent locations, we present a top-down approach that first models cross-domain information transfer by utilizing the geometry ofthese subspaces, and then encodes the model results onto individual images to infer their location. We report competitive results for location recognition and clustering on two public datasets, im2GPS and San Francisco, and empirically validate the utility of various design choices involved in the approach.

6 0.17606746 135 cvpr-2013-Discriminative Subspace Clustering

7 0.17061642 215 cvpr-2013-Improved Image Set Classification via Joint Sparse Approximated Nearest Subspaces

8 0.12891772 105 cvpr-2013-Deep Learning Shape Priors for Object Segmentation

9 0.1246195 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines

10 0.10343623 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds

11 0.10034758 379 cvpr-2013-Scalable Sparse Subspace Clustering

12 0.082008012 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

13 0.079862021 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

14 0.079666682 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections

15 0.074676536 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors

16 0.074111506 237 cvpr-2013-Kernel Learning for Extrinsic Classification of Manifold Features

17 0.067836173 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

18 0.058563914 32 cvpr-2013-Action Recognition by Hierarchical Sequence Summarization

19 0.058511585 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures

20 0.057104643 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.129), (1, -0.036), (2, -0.042), (3, 0.033), (4, 0.065), (5, 0.019), (6, -0.035), (7, -0.023), (8, -0.014), (9, -0.093), (10, 0.064), (11, -0.039), (12, -0.104), (13, -0.064), (14, -0.046), (15, 0.271), (16, -0.163), (17, 0.123), (18, 0.052), (19, -0.019), (20, 0.228), (21, -0.334), (22, 0.099), (23, -0.167), (24, -0.133), (25, -0.175), (26, 0.087), (27, -0.126), (28, 0.052), (29, 0.027), (30, 0.048), (31, 0.163), (32, 0.007), (33, 0.053), (34, 0.162), (35, -0.005), (36, 0.068), (37, -0.04), (38, -0.052), (39, 0.017), (40, 0.075), (41, -0.036), (42, -0.021), (43, -0.059), (44, 0.046), (45, -0.014), (46, 0.067), (47, -0.016), (48, -0.007), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94628686 253 cvpr-2013-Learning Multiple Non-linear Sub-spaces Using K-RBMs

Author: Siddhartha Chandra, Shailesh Kumar, C.V. Jawahar

Abstract: Understanding the nature of data is the key to building good representations. In domains such as natural images, the data comes from very complex distributions which are hard to capture. Feature learning intends to discover or best approximate these underlying distributions and use their knowledge to weed out irrelevant information, preserving most of the relevant information. Feature learning can thus be seen as a form of dimensionality reduction. In this paper, we describe a feature learning scheme for natural images. We hypothesize that image patches do not all come from the same distribution, they lie in multiple nonlinear subspaces. We propose a framework that uses K Restricted Boltzmann Machines (K-RBMS) to learn multiple non-linear subspaces in the raw image space. Projections of the image patches into these subspaces gives us features, which we use to build image representations. Our algorithm solves the coupled problem of finding the right non-linear subspaces in the input space and associating image patches with those subspaces in an iterative EM like algorithm to minimize the overall reconstruction error. Extensive empirical results over several popular image classification datasets show that representations based on our framework outperform the traditional feature representations such as the SIFT based Bag-of-Words (BoW) and convolutional deep belief networks.

2 0.66354251 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

Author: Roni Mittelman, Honglak Lee, Benjamin Kuipers, Silvio Savarese

Abstract: The use of semantic attributes in computer vision problems has been gaining increased popularity in recent years. Attributes provide an intermediate feature representation in between low-level features and the class categories, leading to improved learning on novel categories from few examples. However, a major caveat is that learning semantic attributes is a laborious task, requiring a significant amount of time and human intervention to provide labels. In order to address this issue, we propose a weakly supervised approach to learn mid-level features, where only class-level supervision is provided during training. We develop a novel extension of the restricted Boltzmann machine (RBM) by incorporating a Beta-Bernoulli process factor potential for hidden units. Unlike the standard RBM, our model uses the class labels to promote category-dependent sharing of learned features, which tends to improve the generalization performance. By using semantic attributes for which annotations are available, we show that we can find correspondences between the learned mid-level features and the labeled attributes. Therefore, the mid-level features have distinct semantic characterization which is similar to that given by the semantic attributes, even though their labeling was not provided during training. Our experimental results on object recognition tasks show significant performance gains, outperforming existing methods which rely on manually labeled semantic attributes.

3 0.64953613 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning

Author: Yujia Li, Daniel Tarlow, Richard Zemel

Abstract: When modeling structured outputs such as image segmentations, prediction can be improved by accurately modeling structure present in the labels. A key challenge is developing tractable models that are able to capture complex high level structure like shape. In this work, we study the learning of a general class of pattern-like high order potential, which we call Compositional High Order Pattern Potentials (CHOPPs). We show that CHOPPs include the linear deviation pattern potentials of Rother et al. [26] and also Restricted Boltzmann Machines (RBMs); we also establish the near equivalence of these two models. Experimentally, we show that performance is affected significantly by the degree of variability present in the datasets, and we define a quantitative variability measure to aid in studying this. We then improve CHOPPs performance in high variability datasets with two primary contributions: (a) developing a loss-sensitive joint learning procedure, so that internal pattern parameters can be learned in conjunction with other model potentials to minimize expected loss;and (b) learning an image-dependent mapping that encourages or inhibits patterns depending on image features. We also explore varying how multiple patterns are composed, and learning convolutional patterns. Quantitative results on challenging highly variable datasets show that the joint learning and image-dependent high order potentials can improve performance.

4 0.60872144 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling

Author: Andrew Kae, Kihyuk Sohn, Honglak Lee, Erik Learned-Miller

Abstract: Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types to build a state-of-the-art labeler. Although the CRF is a good baseline labeler, we show how an RBM can be added to the architecture to provide a global shape bias that complements the local modeling provided by the CRF. We demonstrate its labeling performance for the parts of complex face images from the Labeled Faces in the Wild data set. This hybrid model produces results that are both quantitatively and qualitatively better than the CRF alone. In addition to high-quality labeling results, we demonstrate that the hidden units in the RBM portion of our model can be interpreted as face attributes that have been learned without any attribute-level supervision.

5 0.47638425 135 cvpr-2013-Discriminative Subspace Clustering

Author: Vasileios Zografos, Liam Ellis, Rudolf Mester

Abstract: We present a novel method for clustering data drawn from a union of arbitrary dimensional subspaces, called Discriminative Subspace Clustering (DiSC). DiSC solves the subspace clustering problem by using a quadratic classifier trained from unlabeled data (clustering by classification). We generate labels by exploiting the locality of points from the same subspace and a basic affinity criterion. A number of classifiers are then diversely trained from different partitions of the data, and their results are combined together in an ensemble, in order to obtain the final clustering result. We have tested our method with 4 challenging datasets and compared against 8 state-of-the-art methods from literature. Our results show that DiSC is a very strong performer in both accuracy and robustness, and also of low computational complexity.

6 0.46340677 105 cvpr-2013-Deep Learning Shape Priors for Object Segmentation

7 0.44383654 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering

8 0.42895043 215 cvpr-2013-Improved Image Set Classification via Joint Sparse Approximated Nearest Subspaces

9 0.41679266 467 cvpr-2013-Wide-Baseline Hair Capture Using Strand-Based Refinement

10 0.40244198 32 cvpr-2013-Action Recognition by Hierarchical Sequence Summarization

11 0.36489967 379 cvpr-2013-Scalable Sparse Subspace Clustering

12 0.34417245 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds

13 0.33215553 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections

14 0.30692768 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors

15 0.29206103 42 cvpr-2013-Analytic Bilinear Appearance Subspace Construction for Modeling Image Irradiance under Natural Illumination and Non-Lambertian Reflectance

16 0.27983338 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit

17 0.26860237 218 cvpr-2013-Improving the Visual Comprehension of Point Sets

18 0.25977814 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

19 0.25778353 134 cvpr-2013-Discriminative Sub-categorization

20 0.25323024 191 cvpr-2013-Graph-Laplacian PCA: Closed-Form Solution and Robustness


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.218), (16, 0.011), (26, 0.041), (28, 0.019), (33, 0.259), (67, 0.06), (69, 0.11), (78, 0.137), (87, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90411413 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations

Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler

Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.

2 0.89919358 90 cvpr-2013-Computing Diffeomorphic Paths for Large Motion Interpolation

Author: Dohyung Seo, Jeffrey Ho, Baba C. Vemuri

Abstract: In this paper, we introduce a novel framework for computing a path of diffeomorphisms between a pair of input diffeomorphisms. Direct computation of a geodesic path on the space of diffeomorphisms Diff(Ω) is difficult, and it can be attributed mainly to the infinite dimensionality of Diff(Ω). Our proposed framework, to some degree, bypasses this difficulty using the quotient map of Diff(Ω) to the quotient space Diff(M)/Diff(M)μ obtained by quotienting out the subgroup of volume-preserving diffeomorphisms Diff(M)μ. This quotient space was recently identified as the unit sphere in a Hilbert space in the mathematics literature, a space with well-known geometric properties. Our framework leverages this recent result by computing the diffeomorphic path in two stages. First, we project the given diffeomorphism pair onto this sphere and then compute the geodesic path between these projected points. Second, we lift the geodesic on the sphere back to the space of diffeomorphisms, by solving a quadratic programming problem with bilinear constraints using the augmented Lagrangian technique with penalty terms. In this way, we can estimate the path of diffeomorphisms, first, staying in the space of diffeomorphisms, and second, preserving shapes/volumes in the deformed images along the path as much as possible. We have applied our framework to interpolate intermediate frames of frame-sub-sampled video sequences. In the reported experiments, our approach compares favorably with the popular Large Deformation Diffeomorphic Metric Mapping framework (LDDMM).
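The sphere geodesic used in the first stage has a closed form. As an illustrative aside (in finite dimensions, not the Hilbert-space setting of the paper), the geodesic between two unit vectors is spherical linear interpolation:

```python
import numpy as np

def slerp(p, q, t):
    """Geodesic (great-circle) interpolation between unit vectors p and q:
        gamma(t) = (sin((1-t)*theta)*p + sin(t*theta)*q) / sin(theta),
    where theta is the angle between p and q."""
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    if theta < 1e-8:  # nearly identical points: linear blend is fine
        return (1 - t) * p + t * q
    return (np.sin((1 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)

# two unit vectors in R^3
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
mid = slerp(p, q, 0.5)
print(mid)  # midpoint of the great circle; its norm is 1
```

Unlike straight-line interpolation, every point `slerp(p, q, t)` stays on the unit sphere, which is exactly the property the projected geodesic needs before being lifted back to the space of diffeomorphisms.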

3 0.89849585 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds

Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter

Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three-dimensional geometric relationships between observed data points, which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three-dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents labels from flowing across semantic object boundaries. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human-annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.

same-paper 4 0.89846092 253 cvpr-2013-Learning Multiple Non-linear Sub-spaces Using K-RBMs

Author: Siddhartha Chandra, Shailesh Kumar, C.V. Jawahar

Abstract: Understanding the nature of data is the key to building good representations. In domains such as natural images, the data comes from very complex distributions which are hard to capture. Feature learning intends to discover or best approximate these underlying distributions and use their knowledge to weed out irrelevant information, preserving most of the relevant information. Feature learning can thus be seen as a form of dimensionality reduction. In this paper, we describe a feature learning scheme for natural images. We hypothesize that image patches do not all come from the same distribution; rather, they lie in multiple non-linear subspaces. We propose a framework that uses K Restricted Boltzmann Machines (K-RBMs) to learn multiple non-linear subspaces in the raw image space. Projections of the image patches into these subspaces give us features, which we use to build image representations. Our algorithm solves the coupled problem of finding the right non-linear subspaces in the input space and associating image patches with those subspaces in an iterative EM-like algorithm to minimize the overall reconstruction error. Extensive empirical results over several popular image classification datasets show that representations based on our framework outperform traditional feature representations such as the SIFT-based Bag-of-Words (BoW) and convolutional deep belief networks.
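The EM-like alternation described above (assign each patch to the RBM that reconstructs it best, then update each RBM on its assigned patches) can be sketched in a few lines. This is a toy illustration only: it uses tiny mean-field Bernoulli RBMs trained with one contrastive-divergence step, and the architecture, learning rate, and data are illustrative assumptions rather than the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal Bernoulli RBM trained with one mean-field CD-1 step."""
    def __init__(self, n_vis, n_hid, lr=0.1):
        self.W = rng.normal(0, 0.01, (n_vis, n_hid))
        self.b = np.zeros(n_vis)   # visible bias
        self.c = np.zeros(n_hid)   # hidden bias
        self.lr = lr

    def recon_error(self, V):
        H = sigmoid(V @ self.W + self.c)
        Vr = sigmoid(H @ self.W.T + self.b)
        return np.sum((V - Vr) ** 2, axis=1)

    def cd1_update(self, V):
        H0 = sigmoid(V @ self.W + self.c)
        Vk = sigmoid(H0 @ self.W.T + self.b)
        Hk = sigmoid(Vk @ self.W + self.c)
        self.W += self.lr * (V.T @ H0 - Vk.T @ Hk) / len(V)
        self.b += self.lr * (V - Vk).mean(axis=0)
        self.c += self.lr * (H0 - Hk).mean(axis=0)

def k_rbm(X, K=2, n_hid=8, n_iters=10):
    """Alternate: update each RBM on its points (M-like step), then
    reassign each point to the RBM with lowest reconstruction error
    (E-like step)."""
    rbms = [RBM(X.shape[1], n_hid) for _ in range(K)]
    assign = rng.integers(0, K, len(X))
    for _ in range(n_iters):
        for k, r in enumerate(rbms):
            if np.any(assign == k):
                r.cd1_update(X[assign == k])
        errs = np.stack([r.recon_error(X) for r in rbms], axis=1)
        assign = errs.argmin(axis=1)
    return rbms, assign

# toy data: two groups of binary vectors with different active dimensions
X = (rng.random((100, 16)) < 0.2).astype(float)
X[50:, :8] = 1.0
rbms, assign = k_rbm(X)
print(assign.shape)
```

The two steps play the roles of the E and M steps in EM: the assignment minimizes reconstruction error given fixed RBMs, and the CD updates reduce it given fixed assignments.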

5 0.89735615 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors

Author: Peter Kontschieder, Pushmeet Kohli, Jamie Shotton, Antonio Criminisi

Abstract: Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently [3, 5, 8]. This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. Random field models, instead, encourage spatial consistency of labels at increased computational expense. This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. Our model can be thought of as a generalization of the successful Semantic Texton Forest, Auto-Context, and Entangled Forest models. A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. Our GeoF model is validated quantitatively on the task of semantic image segmentation, on four challenging and very diverse image datasets. GeoF outperforms both state-of-the-art forest models and the conventional pairwise CRF.
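The long-range connectivity features above rest on generalized geodesic distance transforms, where distance from a seed region grows with both spatial steps and image-intensity changes. A slow but readable raster-scan approximation is sketched below (in the spirit of Toivanen's two-pass algorithm; this is an assumption, since the abstract does not name the implementation):

```python
import numpy as np

def generalized_geodesic_distance(seed_mask, image, lam=1.0, n_passes=2):
    """Two-pass raster-scan approximation of the generalized geodesic
    distance transform: each step costs its Euclidean length combined
    with lam times the intensity difference it crosses."""
    H, W = image.shape
    d = np.where(seed_mask, 0.0, np.inf)
    offsets = [(-1, 0), (0, -1), (-1, -1), (-1, 1)]  # causal neighbours
    for _ in range(n_passes):
        # forward pass: top-left to bottom-right
        for y in range(H):
            for x in range(W):
                for dy, dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        step = np.hypot(np.hypot(dy, dx),
                                        lam * (image[y, x] - image[yy, xx]))
                        d[y, x] = min(d[y, x], d[yy, xx] + step)
        # backward pass: bottom-right to top-left, mirrored neighbours
        for y in range(H - 1, -1, -1):
            for x in range(W - 1, -1, -1):
                for dy, dx in offsets:
                    yy, xx = y - dy, x - dx
                    if 0 <= yy < H and 0 <= xx < W:
                        step = np.hypot(np.hypot(dy, dx),
                                        lam * (image[y, x] - image[yy, xx]))
                        d[y, x] = min(d[y, x], d[yy, xx] + step)
    return d

# on a flat image the transform reduces to plain spatial distance
img = np.zeros((5, 5))
seeds = np.zeros((5, 5), dtype=bool)
seeds[0, 0] = True
d = generalized_geodesic_distance(seeds, img)
print(d[0, 2], d[1, 1])  # 2.0 and sqrt(2): chamfer distances from the seed
```

With `lam > 0` on a non-flat image, distances grow quickly across strong edges, which is what lets a per-pixel feature encode soft, long-range connectivity to semantic regions.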

6 0.89524049 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking

7 0.89433956 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

8 0.89376366 76 cvpr-2013-Can a Fully Unconstrained Imaging Model Be Applied Effectively to Central Cameras?

9 0.89242899 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning

10 0.89140081 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition

11 0.89085126 414 cvpr-2013-Structure Preserving Object Tracking

12 0.88891828 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

13 0.8879059 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking

14 0.88786626 314 cvpr-2013-Online Object Tracking: A Benchmark

15 0.88703138 198 cvpr-2013-Handling Noise in Single Image Deblurring Using Directional Filters

16 0.88369149 307 cvpr-2013-Non-uniform Motion Deblurring for Bilayer Scenes

17 0.8833279 131 cvpr-2013-Discriminative Non-blind Deblurring

18 0.88326931 44 cvpr-2013-Area Preserving Brain Mapping

19 0.88210022 295 cvpr-2013-Multi-image Blind Deblurring Using a Coupled Adaptive Sparse Prior

20 0.88122034 193 cvpr-2013-Graph Transduction Learning with Connectivity Constraints with Application to Multiple Foreground Cosegmentation