
104 nips-2009-Group Sparse Coding


Source: pdf

Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow

Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 com Abstract Bag-of-words document representations are often used in text, image and video processing. [sent-5, score-0.258]

2 While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. [sent-6, score-1.29]

3 The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. [sent-7, score-1.652]

4 More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. [sent-8, score-0.936]

5 While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. [sent-9, score-0.339]

6 In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. [sent-10, score-0.295]

7 This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. [sent-11, score-0.871]

8 Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. [sent-12, score-1.074]

9 Those representations abstract away from spatial and temporal order, encoding a document as a vector of occurrence counts of dictionary descriptors in the document. [sent-14, score-0.555]

10 For text documents, the dictionary might consist of all the words or of all the n-grams of a certain minimum frequency in the document collection [1]. [sent-15, score-0.686]

11 For images or videos, however, there is no simple mapping from the raw document to descriptor counts. [sent-16, score-0.293]

12 Instead, visual descriptors must be first extracted and then represented in terms of a carefully constructed dictionary. [sent-17, score-0.403]

13 For dictionary construction, the standard approach in computer vision is to use some unsupervised vector quantization (VQ) technique, often k-means clustering [14], to create the dictionary. [sent-19, score-0.628]

14 A new image is then represented by a vector indexed by dictionary elements (codewords), which for element d counts the number of visual descriptors in the image whose closest codeword is d. [sent-20, score-1.426]
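
As a concrete illustration of the VQ encoding just described, here is a minimal sketch (NumPy assumed; the function and variable names are ours, not from the paper):

```python
import numpy as np

def vq_bag_of_words(descriptors, codewords):
    """Encode an image as a histogram of nearest-codeword counts.

    descriptors: (num_descriptors, dim) array of local visual descriptors.
    codewords:   (num_codewords, dim) array holding the VQ dictionary.
    Returns a (num_codewords,) vector of occurrence counts.
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((descriptors[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest codeword for each descriptor
    return np.bincount(nearest, minlength=len(codewords))
```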

15 VQ representations are maximally sparse per descriptor occurrence since they pick a single codeword for each occurrence, but they may not be sparse for the image as a whole; furthermore, such representations are not that robust with respect to descriptor variability. [sent-21, score-0.913]

16 Sparse representations have obvious computational benefits, by saving both processing time in handling visual descriptors and space in storing encoded images. [sent-22, score-0.495]

17 These techniques promote sparsity in determining a small set of codewords from the dictionary that can be used to efficiently represent each visual descriptor of each image [13]. [sent-24, score-1.144]

18 However, those approaches consider each visual descriptor in the image as a separate coding problem and do not take into account the fact that descriptor coding is just an intermediate step in creating a bag of codewords representation for the whole image. [sent-25, score-1.038]

19 Thus, sparse coding of each visual descriptor does not guarantee sparse coding of the whole image. [sent-26, score-0.714]

20 In this study, we propose and evaluate mixed-norm regularizers [12, 10, 2] that take into account the structure of bags of visual descriptors present in images. [sent-28, score-0.431]

21 This form of regularization promotes the use of a small subset of codewords for each category, possibly different from category to category, thereby including an indirect discriminative signal in code construction. [sent-32, score-0.382]

22 Mixed regularization can be applied at two levels: for image encoding, which can be expressed as a convex optimization problem, and for dictionary learning, using an alternating minimization procedure. [sent-33, score-0.867]

23 Dictionary regularization promotes a small dictionary size directly, instead of indirectly through the sparse encoding step. [sent-34, score-0.897]

24 Section 4 extends the technique to learn the dictionary by alternating optimization. [sent-40, score-0.596]

25 Section 5 presents experimental results on a well-known image database. [sent-42, score-0.171]

26 Our main goal is to encode effectively groups of instances in terms of a set of dictionary codewords D = {d_j}_{j=1}^{|D|}. [sent-48, score-0.809]

27 For example, if instances are image patches, each group may be the set of patches in a particular image, and each codeword may represent some kind of average patch. [sent-49, score-0.45]

28 Given D and G, our first subgoal, encoding, is to minimize a tradeoff between the reconstruction error for G in terms of D, and a suitable mixed norm for the matrix of reconstruction weights that express each xi as a positive linear combination of dj ∈ D. [sent-52, score-0.631]

29 The tradeoff between accurate reconstruction and compact encoding is governed by a regularization parameter λ. [sent-53, score-0.424]

30 Our second subgoal, learning, is to estimate a good dictionary D given a set of training groups {G_m}_{m=1}^{n}. [sent-54, score-0.601]

31 3 Group Coding. To encode jointly all the instances in a group G with dictionary D, we solve the following convex optimization problem: A⋆ = argmin_A Q(A, G, D), where Q(A, G, D) = (1/2) Σ_{i∈G} ‖x_i − Σ_{j=1}^{|D|} α_j^i d_j‖² + λ Σ_{j=1}^{|D|} ‖α_j‖_p, subject to α_j^i ≥ 0 for all i, j. (1) [sent-56, score-0.797]

32 The reconstruction matrix A = {α_j}_{j=1}^{|D|} consists of non-negative vectors α_j = (α_j^1, . . . , α_j^{|G|}). [sent-57, score-0.299]
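
The objective in Eq. (1) is easy to state in code; the following sketch assumes NumPy, stores the group as a matrix X whose rows are the x_i, and uses illustrative names only:

```python
import numpy as np

def group_coding_objective(A, X, D, lam, p=2):
    """Q(A, G, D): 0.5 * sum_i ||x_i - sum_j A[j, i] d_j||^2 + lam * sum_j ||A[j, :]||_p.

    A: (num_codewords, group_size) non-negative reconstruction weights (columns index instances).
    X: (group_size, dim) instances of the group, one x_i per row.
    D: (num_codewords, dim) dictionary, one d_j per row.
    """
    reconstruction = A.T @ D  # row i is sum_j A[j, i] * d_j
    fit = 0.5 * np.sum((X - reconstruction) ** 2)
    penalty = lam * np.sum(np.linalg.norm(A, ord=p, axis=1))
    return fit + penalty
```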

33 The second term of the objective is the mixed ℓ1 /ℓp norm of A, which measures reconstruction complexity; the regularization parameter λ balances reconstruction quality (the first term) against reconstruction complexity. [sent-61, score-0.727]

34 Its partial derivatives with respect to each α_r^i are ∂Q̃/∂α_r^i = Σ_{j≠r} α_j^i (d_j · d_r) − x_i · d_r + α_r^i ‖d_r‖². (3) [sent-67, score-1.329]

35 Let us make the following abbreviation for a given index r: µ_i = x_i · d_r − Σ_{j≠r} α_j^i (d_j · d_r). (4) [sent-68, score-0.436]

36 It is clear that if µ_i ≤ 0 then the optimum for α_r^i is zero. [sent-69, score-0.493]

37 For p = 1 the objective function is separable and we get the following sub-gradient condition for optimality: 0 ∈ −µ_i^+ + α_r^i ‖d_r‖² + λ ∂|α_r^i|, which implies α_r^i ∈ (µ_i^+ − [0, λ]) / ‖d_r‖². (5) [sent-72, score-0.484]

38 Since α_r^i ≥ 0, the above subgradient condition for optimality implies that α_r^i = 0 when µ_i^+ ≤ λ and otherwise α_r^i = (µ_i^+ − λ)/‖d_r‖². [sent-73, score-0.872]
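
A minimal sketch of this p = 1 coordinate update (Eq. (5)); mu_plus collects the µ_i^+ values for the current codeword, and the names are illustrative:

```python
import numpy as np

def update_alpha_l1(mu_plus, d_r_sq_norm, lam):
    """Closed-form coordinate update for p = 1: zero when mu_i^+ <= lam,
    otherwise (mu_i^+ - lam) / ||d_r||^2, applied to every instance in the group."""
    return np.maximum(mu_plus - lam, 0.0) / d_r_sq_norm
```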

39 In this case, the gradient of Q(α_r) with an ℓ2 regularization term is α_r ‖d_r‖² − µ^+ + λ α_r/‖α_r‖. [sent-78, score-0.536]

40 At the optimum this vector must be zero, so after rearranging terms we obtain α_r = (‖d_r‖² + λ/‖α_r‖)^{−1} µ^+. (6) [sent-79, score-0.472]

41 Writing the solution of Eq. (6) solely as a function of the scaling parameter s, α_r = s µ^+, we get s µ^+ = (‖d_r‖² + λ/(s ‖µ^+‖))^{−1} µ^+, which implies that s = (1/‖d_r‖²)(1 − λ/‖µ^+‖). (7) [sent-82, score-0.436]

42 We now revisit the assumption that the norm of the optimal solution is greater than zero. [sent-83, score-0.479]
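
Combining Eqs. (6) and (7), the whole weight vector for codeword r is zero whenever ‖µ^+‖ ≤ λ and otherwise a rescaled copy of µ^+; a hedged sketch of that block update (illustrative names, NumPy assumed):

```python
import numpy as np

def update_alpha_group(mu_plus, d_r_sq_norm, lam):
    """Block (p = 2) update of alpha_r, the vector of weights of codeword r across the group.

    mu_plus: (group_size,) vector of mu_i^+ values for codeword r.
    Returns the all-zero vector when ||mu_plus|| <= lam, removing the codeword for this group.
    """
    norm = np.linalg.norm(mu_plus)
    if norm <= lam:
        return np.zeros_like(mu_plus)
    s = (1.0 - lam / norm) / d_r_sq_norm  # Eq. (7)
    return s * mu_plus                    # Eq. (6): alpha_r = s * mu^+
```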

43 Once the optimal group reconstruction matrix A is found, we compress the matrix into a single vector. [sent-87, score-0.247]
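
The extract does not say exactly how the matrix is compressed; purely as an assumption, one plausible pooling is a single statistic per codeword row, for example its ℓ2 norm:

```python
import numpy as np

def compress_group_code(A, stat="norm"):
    """Collapse the (num_codewords, group_size) weight matrix A into one value per codeword.

    The pooling rule below (per-row l2 norm, or mean) is an assumption for illustration,
    not the compression specified in the paper.
    """
    if stat == "norm":
        return np.linalg.norm(A, axis=1)
    return A.mean(axis=1)
```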

44 Since visual descriptors and dictionary elements are only accessed through inner products in the above method, it could be easily generalized to work with Mercer kernels instead. [sent-91, score-0.992]

45 4 Dictionary Learning Now that we know how to achieve optimal reconstruction for a given dictionary, we examine how to learn a good dictionary, that is, a dictionary that balances reconstruction error, reconstruction complexity, and overall complexity relative to the given training set. [sent-92, score-1.098]

46 In particular, we seek a learning method that facilitates both induction of new dictionary words and the removal of dictionary words with low predictive power. [sent-93, score-1.248]

47 To achieve this goal, we apply ℓ1 /ℓ2 regularization, controlled by a new hyperparameter γ, to dictionary words. [sent-94, score-0.7]

48 For this approach to work, we assume that instances have been mean-subtracted so that the zero vector 0 is the (uninformative) mean of the data and regularization towards 0 is equivalent to removing words that do not contribute much to compact representation of groups. [sent-95, score-0.243]

49 Denote by {A1 , . . . , An } the corresponding reconstruction coefficients relative to dictionary D. [sent-102, score-0.729]

50 In our application we set p = 2 as the norm penalty of the dictionary words. [sent-107, score-0.613]

51 Moreover, the same coordinate descent technique described above for finding the optimum reconstruction weights can be used again here after simple algebraic manipulations. [sent-109, score-0.195]

52 Define the auxiliary variables ν_{j,r} = Σ_m Σ_{i∈G_m} α_{m,j}^i α_{m,r}^i and v_r = Σ_m Σ_{i∈G_m} α_{m,r}^i x_{m,i}. (9) Then, we can express d_r compactly as follows. [sent-111, score-0.436]

53 Calculating the gradient with respect to each d_r and equating it to zero, we obtain Σ_m Σ_{i∈G_m} [ Σ_{j≠r} α_{m,j}^i α_{m,r}^i d_j + (α_{m,r}^i)² d_r − α_{m,r}^i x_{m,i} ] + γ d_r/‖d_r‖ = 0. [sent-113, score-1.448]

54 Swapping the sums over m and i with the sum over j, using the auxiliary variables, and noting that d_j depends neither on m nor on i, we obtain Σ_{j≠r} ν_{j,r} d_j + ν_{r,r} d_r − v_r + γ d_r/‖d_r‖ = 0. [sent-114, score-1.588]

55 Similarly to the way we solved for α_r, we now define the vector u_r = v_r − Σ_{j≠r} ν_{j,r} d_j (10) to get the following iterate for d_r: d_r = ν_{r,r}^{−1} [1 − γ/‖u_r‖]_+ u_r, (11) where, as above, we incorporated the case d_r = 0 by applying the operator [·]_+ to the term 1 − γ/‖u_r‖. [sent-115, score-2.152]
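
A sketch of the dictionary-word iterate in Eqs. (10)-(11), assuming NumPy and illustrative names (nu holds the ν_{j,r} values, v the v_r vectors):

```python
import numpy as np

def update_dictionary_word(r, D, nu, v, gamma):
    """Return the updated codeword d_r using Eq. (11).

    D:  (num_codewords, dim) current dictionary, one d_j per row.
    nu: (num_codewords, num_codewords) with nu[j, r] = sum_m sum_i alpha_{m,j}^i * alpha_{m,r}^i.
    v:  (num_codewords, dim) with row r equal to v_r = sum_m sum_i alpha_{m,r}^i * x_{m,i}.
    """
    # u_r = v_r - sum_{j != r} nu[j, r] * d_j   (Eq. 10)
    u_r = v[r] - nu[:, r] @ D + nu[r, r] * D[r]
    norm = np.linalg.norm(u_r)
    if norm <= gamma:
        # Residual too weak: the codeword is driven to zero and can be removed.
        return np.zeros_like(u_r)
    return (1.0 - gamma / norm) / nu[r, r] * u_r  # Eq. (11)
```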

56 The form of the solution implies that we can eliminate dr , as it becomes 0, whenever the norm of the residual vector ur is smaller than γ. [sent-116, score-0.546]

57 Thus, the dictionary learning procedure naturally facilitates the ability to remove dictionary words whose predictive power falls below the regularization parameter. [sent-117, score-1.305]

58 5 Experimental Setting We compare our approach to image coding with previous sparse coding methods by measuring their impact on classification performance on the PASCAL VOC (Visual Object Classes) 2007 dataset [4]. [sent-118, score-0.539]

59 In the past, many features have been used for VOC classification, with bag-of-words histograms of local descriptors like SIFT [6] being most popular. [sent-122, score-0.312]

60 In our experiments, we extract local descriptors based on a regular grid for each image. [sent-123, score-0.312]

61 The grid points are located at every seventh pixel horizontally and vertically, which produces an average of 3234 descriptors per image. [sent-124, score-0.369]
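
The dense grid described here is straightforward to generate; a small sketch (the seven-pixel spacing is from the text, everything else is illustrative):

```python
import numpy as np

def grid_points(height, width, step=7):
    """Interest-point locations on a regular grid, every `step` pixels horizontally and vertically."""
    rows, cols = np.mgrid[0:height:step, 0:width:step]
    return np.stack([rows.ravel(), cols.ravel()], axis=1)  # (num_points, 2) array of (row, col)
```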

62 We used a custom local descriptor that collects Gabor wavelet responses at different orientations, spatial scales, and spatial offsets from the interest point. [sent-125, score-0.249]

63 The 27 (scale, offset) pairs were chosen by optimizing a previous image recognition task, unrelated to this paper, using a genetic algorithm. [sent-127, score-0.196]

64 [15] independently described a descriptor that similarly uses responses at different orientations, scales, and offsets (see their Figure 2). [sent-129, score-0.207]

65 Overall, this descriptor is generally comparable to SIFT and results in similar performance. [sent-130, score-0.175]

66 To build an image feature vector from the descriptors, we thus investigate the following methods: [sent-131, score-0.171]

67 Build a bag-of-words histogram over hierarchical k-means codewords by looking up each descriptor in a hierarchical k-means tree [11]. [sent-132, score-0.362]

68 Jointly train a dictionary and encode each descriptor using an ℓ1 sparse coding approach with γ = 0, which was studied previously [5, 7, 9]. [sent-136, score-1.03]

69 Jointly train a dictionary and encode sets of descriptors where each set corresponds to a single image, using ℓ1 /ℓ2 group sparse coding, varying both γ and λ. [sent-138, score-1.137]

70 Jointly train a dictionary and encode sets of descriptors where each set corresponds to all descriptors or all images of a single class, using ℓ1 /ℓ2 sparse coding, varying both γ and λ. [sent-140, score-1.428]

71 Then, use ℓ1 /ℓ2 sparse coding to encode the descriptors in individual images and obtain a single α vector per image. [sent-141, score-0.691]

72 As explained before, we normalized all descriptors to have zero mean so that regularizing dictionary words towards the zero vector implies dictionary sparsity. [sent-142, score-1.516]

73 In all cases, the initial dictionary used during training was obtained from the same hierarchical k-means tree, with a branching factor of 10 and depth 4 rather than 3 as used in the baseline method. [sent-143, score-0.635]

74 This scheme yielded an initial dictionary of size 7873. [sent-144, score-0.57]

75 Figure 1: Mean Average Precision on the 2007 PASCAL VOC database as a function of the size of the dictionary obtained by both ℓ1 and ℓ1 /ℓ2 regularization approaches when varying λ or γ. [sent-152, score-0.696]

76 We show results where descriptors are grouped either by image or by class. [sent-153, score-0.518]

77 To evaluate the impact of different coding methods on an important end-to-end task, image classification, we selected the VOC 2007 training set for classifier training, the VOC 2007 validation set for hyperparameter selection, and the VOC 2007 test set for evaluation. [sent-155, score-0.345]

78 Class average precisions on the encoded test set are then averaged across the 20 classes to produce the mean average precision shown in our graphs. [sent-157, score-0.237]
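
For reference, a minimal sketch of per-class average precision and its mean over the 20 classes (a simplified stand-in, not the official VOC evaluation code):

```python
import numpy as np

def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean precision at the rank of each positive example.

    scores: (num_images,) classifier scores; labels: (num_images,) 0/1 ground truth.
    """
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    precision_at_k = np.cumsum(labels) / (np.arange(len(labels)) + 1)
    return precision_at_k[labels == 1].mean()

def mean_average_precision(score_matrix, label_matrix):
    """Columns index classes; returns the mean of the per-class average precisions."""
    return float(np.mean([average_precision(score_matrix[:, c], label_matrix[:, c])
                          for c in range(score_matrix.shape[1])]))
```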

79 6 Results and Discussion In Figure 1 we compare the mean average precisions of the competing approaches as encoding hyperparameters are varied to control the overall dictionary size. [sent-158, score-0.778]

80 For the ℓ1 approach, different dictionary sizes were obtained by tuning λ while setting γ = 0. [sent-159, score-0.57]

81 As can be seen in Figure 1, when the dictionary is allowed to be very large, the pure ℓ1 approach yields the best performance. [sent-162, score-0.607]

82 On the other hand, when the size of the dictionary matters, all the approaches based on ℓ1 /ℓ2 regularization performed better than the ℓ1 counterpart. [sent-163, score-0.67]

83 The version of ℓ1 /ℓ2 in which we allowed γ to vary provided the best tradeoff between dictionary size and classification performance when descriptors were grouped per image, which was to be expected as γ directly promotes sparse dictionaries. [sent-165, score-1.229]

84 More interestingly, when grouping descriptors per class instead of per image, we get even better performance for small dictionary sizes by varying λ. [sent-166, score-1.004]

85 In Figure 2 we compare the mean average precisions of ℓ1 and ℓ1 /ℓ2 regularization as average image size varies. [sent-167, score-0.405]

86 When image size is constrained, which is often the case in large-scale applications, all … [sent-168, score-0.192]

87 Figure 2: Mean Average Precision on the 2007 PASCAL VOC database as a function of the average size of each image as encoded using the trained dictionary obtained by both ℓ1 and ℓ1 /ℓ2 regularization approaches when varying λ and γ. [sent-175, score-0.953]

88 We show results where descriptors are grouped either by image or by class. [sent-176, score-0.518]

89 Figure 3: Comparison of the dictionary words used to reconstruct the same image. [sent-178, score-0.613]

90 A pure ℓ1 coding was used on the left, while a mixed ℓ1 /ℓ2 encoding was used on the right plot. [sent-179, score-0.313]

91 Each row represents the number of times each dictionary word was used in the reconstruction of the image. [sent-180, score-0.765]

92 Once again ℓ1 regularization performed even worse than hierarchical k-means for small image sizes. Figure 3 compares the usage of dictionary words to encode the same image, either using ℓ1 (on the left) or ℓ1 /ℓ2 (on the right) regularization. [sent-182, score-0.98]

93 Each graph shows the number of times a dictionary word (a row in the plot) was used in the reconstruction of the image. [sent-183, score-0.765]

94 Clearly, ℓ1 regularization yields an overall sparser representation in terms of total number of dictionary coefficients that are used. [sent-184, score-0.715]

95 However, almost all of the resulting dictionary vectors are non-zero and used at least once in the coding process. [sent-185, score-0.714]

96 As expected, with ℓ1 /ℓ2 regularization, a dictionary word is either always used or never used, yielding a much more compact representation in terms of the total number of dictionary words that are used. [sent-186, score-1.292]

97 7 Overall, mixed-norm regularization yields better performance when the problem to solve includes resource constraints, either time (a smaller dictionary yields faster image encoding) or space (one can store or convey more images when they take less space). [sent-187, score-0.908]

98 Finally, grouping descriptors per class instead of per image during dictionary learning promotes the use of the same dictionary words for all images of the same class, hence yielding some form of weak discrimination which appears to help under space or time constraints. [sent-189, score-1.92]

99 Discriminative sparse image models for class-specific edge detection and image interpretation. [sent-251, score-0.422]

100 Linear spatial pyramid matching using sparse coding for image classification. [sent-299, score-0.416]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dictionary', 0.57), ('dr', 0.436), ('descriptors', 0.312), ('descriptor', 0.175), ('image', 0.171), ('voc', 0.16), ('reconstruction', 0.159), ('coding', 0.144), ('dj', 0.14), ('codewords', 0.117), ('codeword', 0.111), ('regularization', 0.1), ('gm', 0.097), ('visual', 0.091), ('group', 0.088), ('vary', 0.081), ('encoding', 0.08), ('sparse', 0.08), ('images', 0.067), ('ur', 0.067), ('promotes', 0.067), ('encode', 0.061), ('mairal', 0.058), ('vq', 0.058), ('tradeoff', 0.057), ('encoded', 0.056), ('precisions', 0.053), ('mountain', 0.052), ('mixed', 0.052), ('document', 0.051), ('pascal', 0.049), ('precision', 0.047), ('hkmeans', 0.044), ('strelow', 0.044), ('tola', 0.044), ('words', 0.043), ('norm', 0.043), ('category', 0.039), ('subgoal', 0.039), ('google', 0.038), ('pure', 0.037), ('optimum', 0.036), ('representations', 0.036), ('word', 0.036), ('hierarchical', 0.035), ('grouped', 0.035), ('orientations', 0.034), ('vision', 0.033), ('offsets', 0.032), ('groups', 0.031), ('average', 0.03), ('hyperparameter', 0.03), ('duchi', 0.03), ('branching', 0.03), ('instances', 0.03), ('transaction', 0.029), ('compact', 0.028), ('objective', 0.028), ('bags', 0.028), ('per', 0.027), ('balances', 0.027), ('jointly', 0.027), ('varying', 0.026), ('alternating', 0.026), ('recognition', 0.025), ('videos', 0.025), ('classi', 0.025), ('ca', 0.025), ('documents', 0.025), ('objectives', 0.025), ('quantization', 0.025), ('overall', 0.024), ('yielding', 0.024), ('elad', 0.024), ('offset', 0.023), ('sift', 0.023), ('pereira', 0.023), ('occurrences', 0.023), ('text', 0.022), ('occurrence', 0.022), ('grouping', 0.022), ('facilitates', 0.022), ('cvpr', 0.022), ('bach', 0.021), ('norms', 0.021), ('spatial', 0.021), ('representation', 0.021), ('vs', 0.021), ('mean', 0.021), ('xi', 0.021), ('th', 0.021), ('iccv', 0.02), ('represent', 0.02), ('class', 0.02), ('separable', 0.02), ('discriminative', 0.02), ('negahban', 0.019), ('samy', 0.019), ('swapping', 0.019), ('accessed', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 104 nips-2009-Group Sparse Coding

Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow

Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1

2 0.26893124 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley

Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1

3 0.20696828 67 nips-2009-Directed Regression

Author: Yi-hao Kao, Benjamin V. Roy, Xiang Yan

Abstract: When used to guide decisions, linear regression analysis typically involves estimation of regression coefficients via ordinary least squares and their subsequent use to make decisions. When there are multiple response variables and features do not perfectly capture their relationships, it is beneficial to account for the decision objective when computing regression coefficients. Empirical optimization does so but sacrifices performance when features are well-chosen or training data are insufficient. We propose directed regression, an efficient algorithm that combines merits of ordinary least squares and empirical optimization. We demonstrate through a computational study that directed regression can generate significant performance gains over either alternative. We also develop a theory that motivates the algorithm. 1

4 0.20123905 96 nips-2009-Filtering Abstract Senses From Image Search Results

Author: Kate Saenko, Trevor Darrell

Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects. 1

5 0.13837533 236 nips-2009-Structured output regression for detection with partial truncation

Author: Andrea Vedaldi, Andrew Zisserman

Abstract: We develop a structured output model for object category detection that explicitly accounts for alignment, multiple aspects and partial truncation in both training and inference. The model is formulated as large margin learning with latent variables and slack rescaling, and both training and inference are computationally efficient. We make the following contributions: (i) we note that extending the Structured Output Regression formulation of Blaschko and Lampert [1] to include a bias term significantly improves performance; (ii) that alignment (to account for small rotations and anisotropic scalings) can be included as a latent variable and efficiently determined and implemented; (iii) that the latent variable extends to multiple aspects (e.g. left facing, right facing, front) with the same formulation; and (iv), most significantly for performance, that truncated and truncated instances can be included in both training and inference with an explicit truncation mask. We demonstrate the method by training and testing on the PASCAL VOC 2007 data set – training includes the truncated examples, and in testing object instances are detected at multiple scales, alignments, and with significant truncations. 1

6 0.13798127 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

7 0.12867709 137 nips-2009-Learning transport operators for image manifolds

8 0.11907933 190 nips-2009-Polynomial Semantic Indexing

9 0.1133047 211 nips-2009-Segmenting Scenes by Matching Image Composites

10 0.10796107 157 nips-2009-Multi-Label Prediction via Compressed Sensing

11 0.089517914 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

12 0.089184508 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

13 0.083800487 163 nips-2009-Neurometric function analysis of population codes

14 0.08311817 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

15 0.079896197 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation

16 0.079625838 164 nips-2009-No evidence for active sparsification in the visual cortex

17 0.077786945 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

18 0.076691464 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

19 0.076547489 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

20 0.07565368 133 nips-2009-Learning models of object structure


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.21), (1, -0.121), (2, -0.119), (3, 0.024), (4, -0.004), (5, 0.083), (6, -0.017), (7, 0.0), (8, 0.255), (9, 0.2), (10, -0.018), (11, 0.039), (12, -0.048), (13, 0.093), (14, -0.079), (15, 0.051), (16, -0.045), (17, 0.15), (18, 0.067), (19, -0.052), (20, -0.044), (21, -0.065), (22, -0.032), (23, 0.177), (24, 0.103), (25, 0.167), (26, 0.021), (27, 0.042), (28, -0.007), (29, 0.131), (30, 0.237), (31, -0.171), (32, -0.047), (33, -0.062), (34, -0.087), (35, 0.02), (36, -0.011), (37, -0.006), (38, -0.021), (39, -0.118), (40, 0.034), (41, 0.03), (42, -0.067), (43, 0.04), (44, 0.096), (45, 0.048), (46, 0.034), (47, 0.104), (48, 0.094), (49, -0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96210426 104 nips-2009-Group Sparse Coding

Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow

Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1

2 0.74654067 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley

Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1

3 0.59215796 96 nips-2009-Filtering Abstract Senses From Image Search Results

Author: Kate Saenko, Trevor Darrell

Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects. 1

4 0.57293773 67 nips-2009-Directed Regression

Author: Yi-hao Kao, Benjamin V. Roy, Xiang Yan

Abstract: When used to guide decisions, linear regression analysis typically involves estimation of regression coefficients via ordinary least squares and their subsequent use to make decisions. When there are multiple response variables and features do not perfectly capture their relationships, it is beneficial to account for the decision objective when computing regression coefficients. Empirical optimization does so but sacrifices performance when features are well-chosen or training data are insufficient. We propose directed regression, an efficient algorithm that combines merits of ordinary least squares and empirical optimization. We demonstrate through a computational study that directed regression can generate significant performance gains over either alternative. We also develop a theory that motivates the algorithm. 1

5 0.53543437 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

Author: Kai Yu, Tong Zhang, Yihong Gong

Abstract: This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. 1

6 0.52008754 137 nips-2009-Learning transport operators for image manifolds

7 0.49995917 138 nips-2009-Learning with Compressible Priors

8 0.45660415 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors

9 0.44320899 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

10 0.43680567 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

11 0.4162446 164 nips-2009-No evidence for active sparsification in the visual cortex

12 0.3842572 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

13 0.38188878 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

14 0.3746295 211 nips-2009-Segmenting Scenes by Matching Image Composites

15 0.37327084 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

16 0.36706147 157 nips-2009-Multi-Label Prediction via Compressed Sensing

17 0.36661521 236 nips-2009-Structured output regression for detection with partial truncation

18 0.36035013 190 nips-2009-Polynomial Semantic Indexing

19 0.35143429 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models

20 0.34931087 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.013), (24, 0.048), (25, 0.061), (27, 0.122), (35, 0.083), (36, 0.131), (39, 0.048), (58, 0.147), (61, 0.013), (71, 0.041), (81, 0.029), (86, 0.153), (91, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93498111 104 nips-2009-Group Sparse Coding

Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow

Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1

2 0.89648652 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. 1

3 0.88491774 137 nips-2009-Learning transport operators for image manifolds

Author: Benjamin Culpepper, Bruno A. Olshausen

Abstract: We describe an unsupervised manifold learning algorithm that represents a surface through a compact description of operators that traverse it. The operators are based on matrix exponentials, which are the solution to a system of first-order linear differential equations. The matrix exponents are represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a trajectory along the underlying manifold can be produced by linearly composing a few elements. The method is applied to recover topological structure from low dimensional synthetic data, and to model local structure in how natural images change over time and scale. 1

4 0.88007808 191 nips-2009-Positive Semidefinite Metric Learning with Boosting

Author: Chunhua Shen, Junae Kim, Lei Wang, Anton Hengel

Abstract: The learning of appropriate distance metrics is a critical problem in image classification and retrieval. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a Mahalanobis distance metric. One of the primary difficulties in learning such a metric is to ensure that the Mahalanobis matrix remains positive semidefinite. Semidefinite programming is sometimes used to enforce this constraint, but does not scale well. B OOST M ETRIC is instead based on a key observation that any positive semidefinite matrix can be decomposed into a linear positive combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting method is easy to implement, does not require tuning, and can accommodate various types of constraints. Experiments on various datasets show that the proposed algorithm compares favorably to those state-of-the-art methods in terms of classification accuracy and running time. 1

5 0.87607288 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

Author: Piyush Rai, Hal Daume

Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1

6 0.87606221 119 nips-2009-Kernel Methods for Deep Learning

7 0.87368673 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

8 0.87336385 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

9 0.87298185 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

10 0.87248057 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

11 0.87235981 70 nips-2009-Discriminative Network Models of Schizophrenia

12 0.87064964 3 nips-2009-AUC optimization and the two-sample problem

13 0.86917323 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

14 0.86882257 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

15 0.86855412 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

16 0.86760706 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

17 0.86617321 100 nips-2009-Gaussian process regression with Student-t likelihood

18 0.86494434 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

19 0.86264569 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

20 0.86237687 87 nips-2009-Exponential Family Graph Matching and Ranking