jmlr jmlr2011 jmlr2011-31 knowledge-graph by maker-knowledge-mining

31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

Source: pdf

Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg

Abstract: Common visual codebook generation methods used in a bag of visual words model, for example, k-means or Gaussian Mixture Model, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that with histogram features, the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks. In this paper, we demonstrate that HIK can be used in an unsupervised manner to signiﬁcantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. The HIK codebooks have consistently higher recognition accuracy over k-means codebooks by 2–4% in several benchmark object and scene recognition data sets. The algorithm is also generalized to arbitrary additive kernels. Its speed is thousands of times faster than a naive implementation of the kernel k-means algorithm. In addition, we propose a one-class SVM formulation to create more effective visual code words. Finally, we show that the standard kmedian clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches. Keywords: visual codebook, additive kernel, histogram intersection kernel

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 However, most popular visual descriptors are histograms of image measurements. [sent-8, score-0.604]

2 In this paper, we demonstrate that HIK can be used in an unsupervised manner to signiﬁcantly improve the generation of visual codebooks. [sent-10, score-0.386]

3 We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. [sent-11, score-0.395]

4 The HIK codebooks have consistently higher recognition accuracy over k-means codebooks by 2–4% in several benchmark object and scene recognition data sets. [sent-12, score-1.084]

5 In addition, we propose a one-class SVM formulation to create more effective visual code words. [sent-15, score-0.409]

6 Finally, we show that the standard kmedian clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches. [sent-16, score-0.963]

7 Keywords: visual codebook, additive kernel, histogram intersection kernel 1. [sent-17, score-0.942]

8 Introduction Bag of visual words (BOV) is currently a popular approach to object and scene recognition in computer vision. [sent-18, score-0.616]

9 Two typical visual descriptors are SIFT by Lowe (2004) and HOG by Dalal and Triggs (2005). [sent-23, score-0.393]

10 • Generate a codebook and map features to visual code words. [sent-31, score-0.769]

11 A visual codebook is a method that divides the space of visual descriptors into several regions. [sent-32, score-1.033]

12 Features in one region correspond to the same visual code word, which is represented by an integer between 1 and the size of the codebook. [sent-33, score-0.409]

13 An image is then encoded as a histogram of visual code words. [sent-34, score-0.778]

14 The quality of the visual codebook has a signiﬁcant impact on the success of BOV-based methods. [sent-38, score-0.64]

15 Popular and successful methods for object and scene categorization typically employ unsupervised learning methods (for example, k-means clustering or Gaussian Mixture Model) to obtain a visual codebook. [sent-39, score-0.546]

16 First, most of the popular visual descriptors are based on histograms of various image measurements such as spatial gradients, optical ﬂow, or color. [sent-43, score-0.63]

17 In this paper we demonstrate that HIK and other additive kernels can be used to generate better visual codebooks with unsupervised learning, in comparison to the popular k-means clustering method. [sent-50, score-0.863]

18 Speciﬁcally, this paper makes four contributions: First, we show that HIK generates better codebooks and thus improves recognition accuracy. [sent-53, score-0.426]

19 (2008), such that the generation and application of HIK codebook has the same theoretical complexity as the standard k-means. [sent-55, score-0.414]

20 Second, we show that all additive kernels can be used to efﬁciently generate visual codebooks. [sent-58, score-0.483]

21 Section 3 introduces the histogram intersection kernel and various other additive kernels, and presents a general kernel k-means based method for visual codebook generation. [sent-70, score-1.354]

22 In Section 4 we propose methods to efﬁciently generate visual codebooks for HIK and other additive kernels. [sent-71, score-0.793]

23 Related Works In this section we will brieﬂy review two categories of related works: different kernels for comparing histograms, and various visual codebook generation methods. [sent-74, score-0.764]

24 The main point of this paper is that when histogram features are employed, the histogram intersection kernel or another additive kernel should be used to compare them. [sent-75, score-1.057]

25 (2008) in two ways: First, we demonstrate that the speedup of HIK can be extended to codebook generation (and unsupervised learning in general). [sent-85, score-0.441]

26 On the visual codebook side, k-means is the most widely used method for visual codebook generation (Sivic and Zisserman, 2003). [sent-87, score-1.36]

27 3099 W U , TAN AND R EHG method was presented in Tuytelaars and Schmid (2007), which divided the space of visual descriptors into regular lattice instead of learning a division of the space from training data. [sent-94, score-0.393]

28 In this work, we propose a new alternative to k-means, based on the histogram intersection kernel and other additive kernels. [sent-100, score-0.636]

29 In k-means based methods, a visual code word is usually represented by the cluster center (that is, the average of all features that belong to this code word), which is simple and fast to compute. [sent-101, score-0.621]

30 It was discovered that assigning a feature to multiple code words (which is also termed as softassignment) may improve the codebook quality (Philbin et al. [sent-102, score-0.491]

31 Another interesting representation is the hyperfeature in Agarwal and Triggs (2008), which considers the mapped code word indexes as a type of image feature and repeatedly generates new codebooks and code words into a hierarchy. [sent-109, score-0.728]

32 Nist´ r e and Stew´ nius (2006) used a tree structure to divide the space of visual descriptors hierarchically e and Moosmann et al. [sent-111, score-0.393]

33 Both methods map visual features to code words much faster than k-means. [sent-113, score-0.459]

34 (2008) uniﬁed the codebook generation step with the classiﬁer learning step seamlessly. [sent-116, score-0.414]

35 , xd ) ∈ Rd be a histogram of non-negative real values with d histogram bins, where + R+ is the set of non-negative real numbers. [sent-129, score-0.634]

36 x could represent an image (for example, a histogram of visual code words in the bag of visual words model) or an image patch (for example, a SIFT visual descriptor). [sent-130, score-1.514]

37 The histogram intersection kernel κHI is deﬁned as follows (Swain and Ballard, 1991): κHI (x1 , x2 ) = d ∑ min(x1, j , x2, j ) . [sent-131, score-0.521]

38 2 Additive Kernels Algorithm 1 is not restricted to work only with the histogram intersection kernel. [sent-150, score-0.443]

39 It is obvious that the histogram intersection kernel is an instance of the additive kernels. [sent-157, score-0.636]

40 If g(·) is a non-negative and non-decreasing function, then the generalized histogram intersection kernel, κ(x1 , x2 ) = d ∑ g(min(x1, j , x2, j )) , j=1 is a valid additive kernel. [sent-159, score-0.558]

41 + 2: {The output is a function that maps a histogram to its visual code word index, w1 (x∗ ) : Rd → + {1, . [sent-172, score-0.788]

42 5: repeat 6: For every input histogram xi , ﬁnd the visual code word that xi belongs to, and denote the index of this visual code word as li : li ← arg min φ(xi ) − m j 2 , 1 ≤ i ≤ n. [sent-193, score-1.259]

43 1≤ j≤m For every visual code word mi , ﬁnd the set of the input histograms that belong to this visual code word, and denote this set as πi : 7: πi = { j|l j = i, 1 ≤ j ≤ n}, 1 ≤ i ≤ m. [sent-194, score-1.08]

44 8: For every visual code word mi , update it to be the centroid of input histograms that belong to this visual code word: ∑ j∈πi φ(x j ) mi ← , 1 ≤ i ≤ m. [sent-195, score-1.121]

45 11: Output: For any histogram x∗ ∈ Rd , its corresponding visual code word index is: + 10: w1 (x∗ ) = arg min φ(x∗ ) − mi 2 . [sent-198, score-0.829]

46 3 K-median Codebook Generation Although k-means (or equivalently, using the ℓ2 distance) is the most popular codebook generation method, the histogram intersection kernel has a closer connection to the ℓ1 distance. [sent-205, score-0.935]

47 An online k-median algorithm has been used by Larlus and Jurie to create visual codebooks in the Pascal challenge (Everingham et al. [sent-216, score-0.66]

48 In Section 5, we empirically compare visual codebooks generated by the k-median algorithm to those generated by both the k-means algorithm and the proposed additive kernel k-means method. [sent-218, score-0.853]

49 In this section, we ﬁrst propose an efﬁcient kernel k-means algorithm for the histogram intersection kernel, and then generalize the algorithm to all additive kernels. [sent-222, score-0.636]

50 Thus, we need to compute this term only once for every visual code word in each iteration. [sent-231, score-0.471]

51 For example, if we use the histogram intersection kernel and compute this term literally using Equation 1, the complexity is O(|πi |d). [sent-234, score-0.521]

52 (2008) proposed fast methods to compute Equation 4 for the histogram intersection kernel to improve the testing speed of HIK SVM classiﬁers, which achieved an exact answer of Equation 4 in O(d log2 |π|) steps and an approximate answer in O(d) steps. [sent-247, score-0.521]

53 A histogram of visual code word indexes has the property that every histogram component is a non-negative integer, that is, it is a vector in Nd . [sent-249, score-1.105]

54 Similarly, a visual descriptor histogram can usually be transformed into the space Nd . [sent-250, score-0.652]

55 The naive implementation does not need additional storage during the visual codebook generation step. [sent-265, score-0.744]

56 4 One Class SVM Codebook Representation A codebook generated by the k-means algorithm ﬁrst divides the space Rd into m regions, and then represents each code word (or, region) by the centroid of the examples (histogram, feature vectors, etc. [sent-275, score-0.529]

57 For example, if we model regions as Gaussian distributions with distinct covariance matrices, the generation of codebooks and mapping from visual features to code words will require much more storage and computational resources than we can afford. [sent-282, score-0.893]

58 o The one-class SVM summarizes the distribution of histograms inside a visual code word. [sent-294, score-0.568]

59 We believe that this compact hypersphere better summarizes a visual code word. [sent-297, score-0.409]

60 We propose Algorithm 4 to use one-class SVM to generate visual code words. [sent-300, score-0.427]

61 In other words, a histogram is considered as belonging to the i-th visual word if it is inside the sphere (in the feature space Φ) centered at mi with radius ri . [sent-319, score-0.756]

62 A sphere in Φ is different from a usual k-means sphere because it respects the similarity measure κ, and its radius ri automatically adapts to the distribution of histograms in a visual word. [sent-320, score-0.465]

63 In each train/test splitting, a visual codebook is generated using the training images, and both training and testing images are transformed into histograms of code words. [sent-329, score-0.902]

64 The proposed algorithms can efﬁciently process a huge number of histogram features, for example, approximately 200k to 320k histograms are clustered across the ﬁrst three data sets in less than 6 minute. [sent-331, score-0.476]

65 We use two types of visual descriptors: SIFT for Caltech 101, CENTRIST (CENsus TRansform hISTogram, refer to Wu and Rehg (2011) for more details) for the scene, event, and indoor data sets. [sent-334, score-0.396]

66 ) ¯ The ﬁrst step is to use visual descriptors from the training images to form a visual codebook, in which we use m = 200 to generate 200 visual code words. [sent-336, score-1.126]

67 Thus an image or image sub-window is represented by a histogram of code words in the speciﬁed image region. [sent-338, score-0.6]

68 Since we are classifying histograms, we modiﬁed both LIBSVM and BSVM so that they are able to use the histogram intersection kernel. [sent-348, score-0.443]

69 Mean / standard deviation values and paired t-tests are used to show the beneﬁt of histogram kernel codebooks (Algorithm 1), while the Wilcoxon test is used for evaluating the one-class SVM code word generation method (Algorithm 4). [sent-369, score-0.994]

70 κHI and κLIN means that a histogram intersection or a linear kernel visual codebook is used, respectively. [sent-372, score-1.161]

71 HIK codebooks also have advantages over k-median codebooks in most cases. [sent-383, score-0.708]

72 One-class SVM improves histogram intersection kernel code words (Algorithm 4). [sent-388, score-0.648]

73 In summary, using HIK codebooks and one-class SVM together generated the best results in almost all cases (best results are shown in boldface within each column of Table 2). [sent-394, score-0.354]

74 As shown in Table 2, HIK codebooks outperformed k-median codebooks in most cases. [sent-404, score-0.708]

75 3 Experimental Results for Additive Kernels Experiments with codebooks generated using the other two additive kernels (χ2 and exponential HIK) are shown in Table 4. [sent-409, score-0.513]

76 For ease of comparison, results of HIK (without one-class SVM) codebooks are also shown in Table 4. [sent-410, score-0.354]

77 However, all three additive kernel based codebooks generally have higher accuracies than the standard k-means codebook generation method. [sent-412, score-0.996]

78 Since the time complexity of additive kernel based codebooks is the same as that of the k-means method, it is advantageous to apply such kernels in generating visual codebooks. [sent-413, score-0.897]

79 For example, the χ2 kernel in some cases leads to higher accuracies than the histogram intersection kernel. [sent-414, score-0.556]

80 There is not an obvious kernel for the ℓ1 distance, so we did not use one-class SVM for codebooks generated by k-median. [sent-418, score-0.432]

81 3112 V ISUAL C ODEBOOK G ENERATION U SING A DDITIVE K ERNELS κHI , ocsvm κHI , ¬ocsvm k-median κLIN , ocsvm κLIN , ¬ocsvm B, s = 4 67. [sent-419, score-0.44]

82 70% (a) Caltech 101, 15 train, 20 test κHI , ocsvm κHI , ¬ocsvm k-median κLIN , ocsvm κLIN , ¬ocsvm B, s = 4 84. [sent-449, score-0.44]

83 58% (b) 15 class scene, 100 train, rest test κHI , ocsvm κHI , ¬ocsvm k-median κLIN , ocsvm κLIN , ¬ocsvm B, s = 4 84. [sent-479, score-0.44]

84 76% (c) 8 class sports, 70 train, 60 test κHI , ocsvm κHI , ¬ocsvm k-median κLIN , ocsvm κLIN , ¬ocsvm B, s = 4 43. [sent-509, score-0.44]

85 72% (d) 67 class indoor, 80 train, 20 test Table 2: Results of HIK, k-median and k-means codebooks and one-class SVM code words. [sent-539, score-0.457]

86 Computation time Codebook storage size HIK 2 v ¯ k-median 2 1 k-means 1 1 Table 3: Comparison of three codebook generation methods. [sent-541, score-0.414]

87 HIK codebooks are used, with ocsvm , ¬B and s = 8. [sent-635, score-0.574]

88 As shown in Table 6, if we use SIFT for scene recognition and CENTRIST for object recognition, the recognition accuracies are reduced. [sent-672, score-0.393]

89 The proposed method is signiﬁcantly better than standard k-means codebooks with more visual code words. [sent-682, score-0.763]

90 Scene recognition requires different type of features (CENTRIST instead of SIFT) and less information (performance almost stabilized when step size is 8 and codebook size is 200). [sent-685, score-0.432]

91 Conclusion In this article, we show that when the histogram intersection kernel is used as the similarity measure in clustering visual descriptors that are histograms, the generated visual codebooks produce better code words and as a consequence, improve the bag of visual words model. [sent-688, score-2.081]

92 We propose a HIK based codebook generation method which runs almost as fast as k-means and has consistently higher accuracies than k-means codebooks by 2–4% in several benchmark object and scene recognition data sets. [sent-689, score-1.089]

93 As an alternative to k-means, in which cluster centroids are used to represent code words, we proposed a one-class SVM formulation to generate better visual code words. [sent-690, score-0.551]

94 We also generalize the proposed visual codebook generation method to arbitrary additive kernels. [sent-691, score-0.835]

95 Although k-median is rarely used to generate codebooks, we empirically evaluated k-median codebooks and recommend it as a compromise between the proposed method and k-means. [sent-694, score-0.396]

96 K3115 W U , TAN AND R EHG median codebooks have lower accuracies than HIK codebooks but usually have higher accuracy than k-means codebooks. [sent-695, score-0.761]

97 The PASCAL visual object classes challenge 2006 (VOC 2006) results, 2006. [sent-740, score-0.358]

98 Learning generative visual models from few training example: an incremental Bayesian approach tested on 101 object categories. [sent-742, score-0.358]

99 Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel. [sent-877, score-1.103]

100 Unifying discriminative visual codebook generation with classiﬁer training for object category recognition. [sent-891, score-0.772]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hik', 0.431), ('codebooks', 0.354), ('codebook', 0.334), ('histogram', 0.317), ('visual', 0.306), ('ocsvm', 0.22), ('scene', 0.162), ('histograms', 0.159), ('hi', 0.134), ('intersection', 0.126), ('additive', 0.115), ('caltech', 0.106), ('code', 0.103), ('maji', 0.096), ('indoor', 0.09), ('svm', 0.088), ('descriptors', 0.087), ('dditive', 0.086), ('ehg', 0.086), ('isual', 0.086), ('odebook', 0.086), ('rehg', 0.086), ('sift', 0.081), ('generation', 0.08), ('kernel', 0.078), ('centrist', 0.077), ('recognition', 0.072), ('vision', 0.071), ('eneration', 0.066), ('word', 0.062), ('sports', 0.06), ('ernels', 0.06), ('jianxin', 0.057), ('wu', 0.056), ('tan', 0.053), ('object', 0.052), ('image', 0.052), ('sing', 0.05), ('ehi', 0.048), ('lin', 0.046), ('kernels', 0.044), ('zisserman', 0.044), ('lazebnik', 0.041), ('mi', 0.041), ('libsvm', 0.04), ('bsvm', 0.04), ('bov', 0.038), ('rd', 0.037), ('vedaldi', 0.037), ('accuracies', 0.035), ('triggs', 0.033), ('feature', 0.03), ('descriptor', 0.029), ('antonio', 0.029), ('nmd', 0.029), ('odone', 0.029), ('sobel', 0.029), ('swain', 0.029), ('berg', 0.027), ('speedup', 0.027), ('features', 0.026), ('clustering', 0.026), ('spatial', 0.026), ('james', 0.025), ('bill', 0.024), ('dalal', 0.024), ('densely', 0.024), ('jurie', 0.024), ('sivic', 0.024), ('bag', 0.024), ('naive', 0.024), ('words', 0.024), ('compromise', 0.024), ('distance', 0.023), ('ci', 0.023), ('gao', 0.022), ('accelerate', 0.022), ('cluster', 0.021), ('train', 0.019), ('ballard', 0.019), ('boiman', 0.019), ('boughorbel', 0.019), ('cordelia', 0.019), ('frederic', 0.019), ('gemert', 0.019), ('hog', 0.019), ('libhik', 0.019), ('moosmann', 0.019), ('philbin', 0.019), ('stew', 0.019), ('subhransu', 0.019), ('svetlana', 0.019), ('tuytelaars', 0.019), ('recognizing', 0.019), ('classi', 0.019), ('gradients', 0.019), ('euclidean', 0.019), ('accuracy', 0.018), ('generate', 0.018), ('ieee', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg

2 0.070056267 101 jmlr-2011-Variable Sparsity Kernel Learning

Author: Jonathan Aflalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath, Sankaran Raman

Abstract: paper1 This presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and ls , s ≥ 2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels—hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efﬁcient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over product of simplices and has a computational complexity of O m2 ntot log nmax /ε2 where m is no. training data points, nmax , ntot are the maximum no. kernels in any group, total no. kernels respectively and ε is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efﬁciency. Keywords: multiple kernel learning, mirror descent, mixed-norm, object categorization, scalability 1. All authors contributed equally. The author names appear in alphabetical order. c 2011 Jonathan Aﬂalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath and Sankaran Raman. A FLALO , B EN -TAL , B HATTACHARYYA , NATH AND R AMAN

3 0.045771465 55 jmlr-2011-Learning Multi-modal Similarity

Author: Brian McFee, Gert Lanckriet

Abstract: In many applications involving multi-media data, the deﬁnition of similarity between items is integral to several key tasks, including nearest-neighbor retrieval, classiﬁcation, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, uniﬁed similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multimedia similarity, we develop graph-based techniques to ﬁlter similarity measurements, resulting in a simpliﬁed and robust training procedure. Keywords: multiple kernel learning, metric learning, similarity

4 0.04353033 66 jmlr-2011-Multiple Kernel Learning Algorithms

Author: Mehmet Gönen, Ethem Alpaydın

Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there is difference between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels. Keywords: support vector machines, kernel machines, multiple kernel learning

5 0.04345401 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels

Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt

Abstract: In this article, we propose a family of efﬁcient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be deﬁned based on this Weisfeiler-Lehman sequence of graphs, including a highly efﬁcient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classiﬁcation benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. Keywords: graph kernels, graph classiﬁcation, similarity measures for graphs, Weisfeiler-Lehman algorithm c 2011 Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn and Karsten M. Borgwardt. S HERVASHIDZE , S CHWEITZER , VAN L EEUWEN , M EHLHORN AND B ORGWARDT

6 0.043260984 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

7 0.038026966 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods

8 0.036915813 98 jmlr-2011-Universality, Characteristic Kernels and RKHS Embedding of Measures

9 0.036677845 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

10 0.036253642 58 jmlr-2011-Learning from Partial Labels

11 0.035054453 105 jmlr-2011-lp-Norm Multiple Kernel Learning

12 0.03313458 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

13 0.032974675 61 jmlr-2011-Logistic Stick-Breaking Process

14 0.032324269 48 jmlr-2011-Kernel Analysis of Deep Networks

15 0.029932898 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination

16 0.027501648 68 jmlr-2011-Natural Language Processing (Almost) from Scratch

17 0.027203318 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models

18 0.026611034 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

19 0.023371406 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding

20 0.023331176 16 jmlr-2011-Clustering Algorithms for Chains

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.129), (1, -0.064), (2, 0.053), (3, -0.128), (4, -0.013), (5, 0.006), (6, -0.006), (7, 0.001), (8, -0.035), (9, -0.067), (10, -0.014), (11, 0.029), (12, 0.03), (13, -0.017), (14, -0.064), (15, -0.061), (16, -0.009), (17, -0.018), (18, -0.021), (19, -0.076), (20, -0.026), (21, 0.007), (22, -0.095), (23, 0.049), (24, -0.062), (25, -0.005), (26, -0.115), (27, 0.123), (28, -0.176), (29, -0.119), (30, 0.002), (31, -0.257), (32, 0.08), (33, -0.114), (34, 0.146), (35, 0.273), (36, -0.111), (37, -0.065), (38, -0.197), (39, 0.219), (40, -0.096), (41, -0.131), (42, 0.213), (43, 0.139), (44, -0.003), (45, -0.225), (46, 0.17), (47, 0.048), (48, -0.057), (49, -0.107)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95734602 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg

2 0.32587889 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

Author: Paramveer S. Dhillon, Dean Foster, Lyle H. Ungar

Abstract: We propose a framework MIC (Multiple Inclusion Criterion) for learning sparse models based on the information theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space by using two-part MDL coding schemes. We present MIC based models for the problems of grouped feature selection (MICG ROUP) and multi-task feature selection (MIC-M ULTI). MIC-G ROUP assumes that the features are divided into groups and induces two level sparsity, selecting a subset of the feature groups, and also selecting features within each selected group. MIC-M ULTI applies when there are multiple related tasks that share the same set of potentially predictive features. It also induces two level sparsity, selecting a subset of the features, and then selecting which of the tasks each feature should be added to. Lastly, we propose a model, T RANS F EAT, that can be used to transfer knowledge from a set of previously learned tasks to a new task that is expected to share similar features. All three methods are designed for selecting a small set of predictive features from a large pool of candidate features. We demonstrate the effectiveness of our approach with experimental results on data from genomics and from word sense disambiguation problems.1 Keywords: feature selection, minimum description length principle, multi-task learning

3 0.31488168 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels

Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt

4 0.30945244 58 jmlr-2011-Learning from Partial Labels

Author: Timothee Cour, Ben Sapp, Ben Taskar

Abstract: We address the problem of partially-labeled multiclass classiﬁcation, where instead of a single label per instance, the algorithm is given a candidate set of labels, only one of which is correct. Our setting is motivated by a common scenario in many image and video collections, where only partial access to labels is available. The goal is to learn a classiﬁer that can disambiguate the partiallylabeled training instances, and generalize to unseen data. We deﬁne an intuitive property of the data distribution that sharply characterizes the ability to learn in this setting and show that effective learning is possible even when all the data is only partially labeled. Exploiting this property of the data, we propose a convex learning formulation based on minimization of a loss function appropriate for the partial label setting. We analyze the conditions under which our loss function is asymptotically consistent, as well as its generalization and transductive performance. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies; in particular, we annotated and experimented on a very large video data set and achieve 6% error for character naming on 16 episodes of the TV series Lost. Keywords: weakly supervised learning, multiclass classiﬁcation, convex learning, generalization bounds, names and faces

5 0.27964967 61 jmlr-2011-Logistic Stick-Breaking Process

Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson

Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efﬁcient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation

6 0.25802574 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

7 0.24927698 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

8 0.20211282 98 jmlr-2011-Universality, Characteristic Kernels and RKHS Embedding of Measures

9 0.20030805 66 jmlr-2011-Multiple Kernel Learning Algorithms

10 0.19851746 55 jmlr-2011-Learning Multi-modal Similarity

11 0.19286706 101 jmlr-2011-Variable Sparsity Kernel Learning

12 0.1907997 83 jmlr-2011-Scikit-learn: Machine Learning in Python

13 0.18412198 68 jmlr-2011-Natural Language Processing (Almost) from Scratch

14 0.17807548 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods

15 0.17536648 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

16 0.16870846 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

17 0.15783481 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination

18 0.15724744 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals

19 0.15328395 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package

20 0.15202205 105 jmlr-2011-lp-Norm Multiple Kernel Learning

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.023), (9, 0.016), (10, 0.027), (24, 0.044), (31, 0.103), (32, 0.013), (41, 0.014), (60, 0.022), (71, 0.529), (73, 0.042), (78, 0.031), (90, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84140903 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg

2 0.75195348 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints

Author: Cassio P. de Campos, Qiang Ji

Abstract: This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-andbound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the beneﬁts of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. Keywords: Bayesian networks, structure learning, properties of decomposable scores, structural constraints, branch-and-bound technique

3 0.2845583 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar

Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve signiﬁcant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.

4 0.27253011 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation

Author: Amarnag Subramanya, Jeff Bilmes

Abstract: We describe a new objective for graph-based semi-supervised learning based on minimizing the Kullback-Leibler divergence between discrete probability measures that encode class membership probabilities. We show how the proposed objective can be efﬁciently optimized using alternating minimization. We prove that the alternating minimization procedure converges to the correct optimum and derive a simple test for convergence. In addition, we show how this approach can be scaled to solve the semi-supervised learning problem on very large data sets, for example, in one instance we use a data set with over 108 samples. In this context, we propose a graph node ordering algorithm that is also applicable to other graph-based semi-supervised learning approaches. We compare the proposed approach against other standard semi-supervised learning algorithms on the semi-supervised learning benchmark data sets (Chapelle et al., 2007), and other real-world tasks such as text classiﬁcation on Reuters and WebKB, speech phone classiﬁcation on TIMIT and Switchboard, and linguistic dialog-act tagging on Dihana and Switchboard. In each case, the proposed approach outperforms the state-of-the-art. Lastly, we show that our objective can be generalized into a form that includes the standard squared-error loss, and we prove a geometric rate of convergence in that case. Keywords: graph-based semi-supervised learning, transductive inference, large-scale semi-supervised learning, non-parametric models

5 0.27143189 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis

Abstract: In this paper we develop a class of nonlinear generative models for high-dimensional time series. We ﬁrst propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efﬁcient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line ﬁlling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/˜gwtaylor/publications/ jmlr2011. Keywords: unsupervised learning, restricted Boltzmann machines, time series, generative models, motion capture

6 0.26747945 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

7 0.26714769 48 jmlr-2011-Kernel Analysis of Deep Networks

8 0.26676193 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

9 0.26579887 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models

10 0.2643964 95 jmlr-2011-Training SVMs Without Offset

11 0.26155162 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms

12 0.25847036 16 jmlr-2011-Clustering Algorithms for Chains

13 0.25794801 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

14 0.25762045 7 jmlr-2011-Adaptive Exact Inference in Graphical Models

15 0.25445139 101 jmlr-2011-Variable Sparsity Kernel Learning

16 0.25210327 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

17 0.25100926 99 jmlr-2011-Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data

18 0.24837716 66 jmlr-2011-Multiple Kernel Learning Algorithms

19 0.24752928 102 jmlr-2011-Waffles: A Machine Learning Toolkit

20 0.24582909 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals