jmlr jmlr2007 jmlr2007-84 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kristen Grauman, Trevor Darrell
Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches. Keywords: kernel, sets of features, histogram intersection, multi-resolution histogram pyramid, approximate matching, object recognition
Reference: text
sentIndex sentText sentNum sentScore
1 We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. [sent-8, score-1.406]
2 The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. [sent-9, score-1.707]
3 We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. [sent-10, score-1.21]
4 Figure 1: The pyramid match intersects histogram pyramids formed over sets of features, approximating the optimal correspondences between the sets' features. [sent-21, score-1.297]
5 For example, vectors describing the appearance or shape within local image patches can be used to form a feature set for each image; the pyramid match approximates the similarity according to a partial matching in that feature space. [sent-22, score-1.423]
6 In this work we present the pyramid match kernel—a new kernel function over unordered feature sets that allows them to be used effectively and efficiently in kernel-based learning methods. [sent-26, score-1.093]
7 The histogram pyramids are then compared using a weighted histogram intersection computation, which we show defines an implicit correspondence based on the finest resolution histogram cell where a matched pair first appears (see Figure 1). [sent-28, score-0.789]
8 The similarity measured by the pyramid match approximates the similarity measured by the optimal correspondences between feature sets of unequal cardinality (i.e., the optimal partial matching). [sent-29, score-1.131]
9 We also provide theoretical approximation bounds for the pyramid match cost relative to the optimal partial matching cost. [sent-36, score-1.21]
10 In our experiments we find that the pyramid match outperforms the bag-of-words approach for object category recognition on a challenging data set (see Section 9). [sent-121, score-1.132]
11 We show how the pyramid match kernel may be used in conjunction with existing kernel-based learning algorithms to successfully learn decision boundaries or real-valued functions from the multi-set representation. [sent-142, score-1.034]
12 Ours is the first work to show the histogram pyramid’s connection to the optimal partial matching when used with a hierarchical weighted histogram intersection similarity measure. [sent-143, score-0.725]
13 We first introduced the pyramid match kernel for the purpose of discriminative object recognition in a recent conference paper (Grauman and Darrell, 2005). [sent-148, score-1.222]
14 The basic idea of our method is to map sets of features to multi-resolution histograms, and then compare the histograms with a weighted histogram intersection measure in order to approximate the similarity of the best partial matching between the feature sets. [sent-151, score-0.725]
15 We call the proposed matching kernel the pyramid match kernel because input sets are converted to multi-resolution histograms. [sent-152, score-1.333]
16 The Pyramid Match Algorithm: The pyramid match approximation uses a multi-dimensional, multi-resolution histogram pyramid to partition the feature space into increasingly larger regions. [sent-177, score-1.796]
17 The pyramid allows us to extract a matching score without computing distances between any of the points in the input sets—when points sharing a bin are counted as matched, the size of that bin indicates the farthest distance any two points in it could be from one another. [sent-180, score-1.156]
18 The histogram pyramids are then compared using a weighted histogram intersection computation, which we show defines an implicit partial correspondence based on the finest resolution histogram cell where a matched pair first appears. [sent-182, score-0.84]
19 The pyramid match P∆ measures similarity (or dissimilarity) between point sets based on implicit correspondences found within this multi-resolution histogram space. [sent-190, score-1.256]
20 The matching approximation implicitly finds correspondences between point sets, if we consider two points matched once they fall into the same histogram bin, starting at the finest resolution level where each unique point is guaranteed to be in its own bin. [sent-193, score-0.611]
21 To enable accurate pyramid matching even with high-dimensional feature spaces, we have developed a variant where the pyramid bin structure depends on the distribution of the data (Grauman and Darrell, 2007). [sent-197, score-1.682]
22 The sum over Ni, weighted by wi = 1, 1/2, 1/4, gives the pyramid match similarity. [sent-209, score-0.947]
23 To calculate Ni, the pyramid match makes use of a histogram intersection function I, which measures the "overlap" between two histograms' bin counts: I(A, B) = Σ_{j=1}^{r} min(A^(j), B^(j)), where A and B are histograms with r bins, and A^(j) denotes the count of the jth bin of A. [sent-212, score-1.516]
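A minimal sketch of this intersection measure, assuming the two histograms are given as bin-aligned NumPy count arrays (names are illustrative, not from the paper's released code):

```python
import numpy as np

def histogram_intersection(a, b):
    """I(A, B) = sum over j of min(A^(j), B^(j)) across r aligned bins."""
    a, b = np.asarray(a), np.asarray(b)
    assert a.shape == b.shape, "histograms must share the same r bins"
    return np.minimum(a, b).sum()
```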
24 The number of new matches found at each level in the pyramid is weighted according to the size of that histogram’s bins: to measure similarity, matches made within larger bins are weighted less than those found in smaller bins. [sent-224, score-0.842]
25 Moreover, as we show in Section 6, this particular setting of the weights enables us to prove theoretical error bounds for the pyramid match cost. [sent-229, score-0.947]
26 Combining Eqns. 3 and 4, we define the (un-normalized) pyramid match: P̃∆(Ψ(Y), Ψ(Z)) = Σ_{i=0}^{L−1} wi ( I(Hi(Y), Hi(Z)) − I(Hi−1(Y), Hi−1(Z)) ), (5) where Y, Z ∈ S, and Hi(Y) and Hi(Z) refer to the ith histogram in Ψ(Y) and Ψ(Z), respectively. [sent-231, score-0.849]
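A sketch of Eqn. 5, assuming each pyramid Ψ is a list of bin-aligned histograms ordered finest (i = 0) to coarsest, that I(H−1(Y), H−1(Z)) = 0, and that the similarity weights are wi = 1/2^i (a natural geometric choice consistent with the weighting discussed below; the paper's exact normalization may differ):

```python
def pyramid_match(psi_y, psi_z):
    """Un-normalized pyramid match of Eqn. 5 over dense histogram lists.

    Uses histogram_intersection from the sketch above; new matches found
    at coarser levels (larger bins) contribute with smaller weight.
    """
    score, prev = 0.0, 0.0  # I(H_{-1}(Y), H_{-1}(Z)) is taken as 0
    for i, (hy, hz) in enumerate(zip(psi_y, psi_z)):
        curr = histogram_intersection(hy, hz)
        score += (curr - prev) / 2 ** i  # w_i = 1 / 2**i, decreasing in i
        prev = curr
    return score
```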
27 In order to alleviate quantization effects that may arise due to the discrete histogram bins, we combine the kernel values resulting from multiple (T ) pyramid matches formed under different multi-resolution histograms with randomly shifted bins. [sent-233, score-1.101]
28 To further refine a pyramid match, multiple pyramids with unique initial (finest level) bin sizes may be used. [sent-244, score-0.885]
29 The set of unique bin sizes produced throughout a pyramid determines the range of distances over which matches are considered. [sent-245, score-0.795]
30 This weighting applies if the pyramid match is a kernel measuring similarity; an inverse weighting scheme is applied if the pyramid match is used to measure cost. [sent-246, score-0.981]
31 Then the set of distances considered with f = 1 is {1, 2, 4, 8, 16, 32}; adding a second pyramid with f = 3 increases this set to also consider the distances {3, 6, 12, 24}. [sent-253, score-0.733]
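A hypothetical helper illustrating the distance sets in the example above, assuming each pyramid's bin side lengths double per level starting from a finest side length f, capped at a common maximum diameter:

```python
def considered_distances(factors, max_diameter):
    """Union of bin diameters f * 2**i contributed by each pyramid."""
    diameters = set()
    for f in factors:
        d = f
        while d <= max_diameter:
            diameters.add(d)
            d *= 2
    return sorted(diameters)

print(considered_distances([1], 32))     # [1, 2, 4, 8, 16, 32]
print(considered_distances([1, 3], 32))  # adds 3, 6, 12, 24
```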
32 Partial Match Correspondences: The pyramid match allows input sets to have unequal cardinalities, and therefore it enables partial matchings, where the points of the smaller set are mapped to some subset of the points in the larger set. [sent-257, score-0.998]
33 Since the pyramid match defines correspondences across entire sets simultaneously, it inherently accounts for the distribution of features occurring in one set. [sent-261, score-1.117]
34 Instead, we exploit the fact that the points’ placement in the pyramid reflects their distance from one another in the feature space. [sent-267, score-0.675]
35 The time required to compute the L-level histogram pyramid Ψ(X) for an input set with m = |X| d-dimensional features is O(dzL), where z = max(m, k) and k is the maximum histogram index value in a single dimension. [sent-268, score-1.107]
36 The histogram pyramid that results is high-dimensional, but very sparse, with only O(mL) nonzero entries that need to be stored. [sent-274, score-0.849]
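A sketch of this sparse construction, assuming feature coordinates are non-negative and below 2**num_levels; each level stores only its nonzero bins in a Counter, and an optional shift implements the randomly shifted grids mentioned above:

```python
from collections import Counter
import numpy as np

def build_pyramid(points, num_levels, shift=None):
    """Sparse multi-resolution histogram pyramid over a d-dim point set.

    points: (m, d) array; shift: optional (d,) grid translation.
    Returns a list of Counters (finest level first) keyed by bin
    coordinates, so only the O(mL) nonzero entries are stored.
    """
    pts = np.asarray(points, dtype=float)
    if shift is not None:
        pts = pts + shift
    pyramid = []
    for i in range(num_levels):
        side = 2 ** i  # bin side lengths double at each level
        coords = np.floor(pts / side).astype(int)
        pyramid.append(Counter(map(tuple, coords)))
    return pyramid

def sparse_intersection(a, b):
    """Histogram intersection restricted to the shared nonzero bins."""
    return sum(min(a[k], b[k]) for k in a.keys() & b.keys())
```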
37 Generating multiple pyramid matches with randomly shifted grids scales the complexity by T , the constant number of shifts. [sent-280, score-0.721]
38 Proposition 1 The pyramid match yields a Mercer kernel. [sent-293, score-0.947]
39 Using this proof and the closure properties of valid kernel functions, we can show that the pyramid match is a Mercer kernel. [sent-318, score-1.034]
40 Given that Mercer kernels are closed under both addition and scaling by a positive constant (Shawe-Taylor and Cristianini, 2004), the above form shows that the pyramid match kernel is positive semi-definite for any weighting scheme where wi ≥ wi+1. [sent-321, score-1.069]
41 In other words, if we can ensure that the weights decrease for coarser pyramid levels, then the pyramid match will sum over positively weighted positive-definite kernels, yielding another positive-definite kernel. [sent-322, score-1.622]
42 The sum combining the outputs from multiple pyramid matches under different random bin translations likewise remains Mercer. [sent-324, score-0.841]
43 Therefore, the pyramid match is valid for use as a kernel in any existing learning methods that require Mercer kernels. [sent-325, score-1.034]
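As a numerical complement to Proposition 1, a quick sanity check (not a proof) that a small pyramid match kernel matrix built with decreasing weights is positive semi-definite, reusing build_pyramid and sparse_intersection from the sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
sets = [rng.integers(0, 64, size=(rng.integers(5, 15), 2)) for _ in range(20)]
pyramids = [build_pyramid(s, num_levels=7) for s in sets]

def pmk(psi_y, psi_z):
    """Mirrors Eqn. 5 over the sparse histograms."""
    score, prev = 0.0, 0.0
    for i, (hy, hz) in enumerate(zip(psi_y, psi_z)):
        curr = sparse_intersection(hy, hz)
        score += (curr - prev) / 2 ** i  # w_i >= w_{i+1} keeps it Mercer
        prev = curr
    return score

K = np.array([[pmk(a, b) for b in pyramids] for a in pyramids])
print(np.linalg.eigvalsh(K).min())  # expected >= 0 up to round-off
```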
44 Approximation Error Bounds: In this section we show theoretical approximation bounds on the cost measured by the pyramid match relative to the cost measured by the optimal partial matching defined in Section 3. [sent-327, score-0.938]
45 In terms of the directed distance, the pyramid match cost is P∆(X, Y) = Σ_{i=0}^{L−1} d2^i [ D(Hi−1(X), Hi−1(Y)) − D(Hi(X), Hi(Y)) ] = d D(H−1(X), H−1(Y)) + Σ_{i=0}^{L−2} d2^i D(Hi(X), Hi(Y)) = d|X| + Σ_{i=0}^{L−2} d2^i D(Hi(X), Hi(Y)). [sent-348, score-0.947]
46 The expected value of the pyramid match cost is then E[P∆(X, Y)] = d|X| + Σ_{i=0}^{L−2} d2^i E[D(Hi(X), Hi(Y))]. [sent-349, score-0.947]
47 Relating this to the expected value of the pyramid match cost in Eqn. [sent-377, score-0.947]
48 9, we have the following bound on the expected pyramid match cost error: E[P∆(X, Y)] ≤ d C(M(X, Y; π∗)) + 2d log D C(M(X, Y; π∗)) = (2d log D + d) C(M(X, Y; π∗)) ≤ (C · d log D + d) C(M(X, Y; π∗)), that is, the expected cost is within a factor of O(d log D) of the optimal partial matching cost. [sent-385, score-0.947]
49 Classification and Regression with the Pyramid Match: We train support vector machines and support vector regressors (SVRs) to perform classification and regression with the pyramid match kernel. [sent-387, score-0.947]
50 Because the pyramid match kernel is positive-definite we are guaranteed to find a unique optimal solution. [sent-390, score-1.034]
51 We have found that the pyramid match kernel can produce kernel matrices with dominant diagonals, particularly as the dimension of the features in the sets increases. [sent-391, score-1.205]
52 The reason for this is that as the dimension of the points increases, there are a greater number of finer-resolution histogram levels in the pyramid where two input sets will have few shared bins. [sent-392, score-0.849]
53 In this section, we empirically evaluate the approximation quality of the pyramid match, and make a direct comparison between our partial matching approximation and the L1 embedding of Indyk and Thaper (2003). [sent-410, score-0.938]
54 We conducted experiments to evaluate how close the correspondences implicitly measured by the pyramid match are to the true optimal correspondences—the matching that results in the minimal summed cost between corresponding points. [sent-411, score-1.245]
55 We compared the pyramid match’s outputs to those produced by the optimal partial matching obtained via a linear programming solution to the transportation problem (Rubner et al. [sent-421, score-0.938]
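A hedged empirical version of this comparison for equal-sized sets, where the optimal matching reduces to an assignment problem (scipy's linear_sum_assignment stands in for the transportation-problem solver used in the paper; build_pyramid and sparse_intersection come from the sketches above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
Y = rng.integers(0, 32, size=(30, 2))
Z = rng.integers(0, 32, size=(30, 2))

cost = cdist(Y, Z, metric="cityblock")  # L1 ground distances
rows, cols = linear_sum_assignment(cost)
optimal = cost[rows, cols].sum()        # optimal matching cost

def pmk_cost(psi_y, psi_z, d):
    """Pyramid match *cost*: new matches at level i charged ~ d * 2**i."""
    total, prev = 0.0, 0.0
    for i, (hy, hz) in enumerate(zip(psi_y, psi_z)):
        curr = sparse_intersection(hy, hz)
        total += d * 2 ** i * (curr - prev)
        prev = curr
    return total

approx = pmk_cost(build_pyramid(Y, 6), build_pyramid(Z, 6), d=2)
print(optimal, approx)  # approx should track optimal up to the bounds
```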
56 The two plots in the left-hand column show the normalized output costs from 10,000 pairwise set-to-set comparisons computed by the optimal matching (black), the pyramid match with the number of random shifts T = 3 (red circles), and the L1 approximation, also with T = 3 (green x's). [sent-429, score-1.159]
57 Note that in these figures we plot cost (so the pyramid match weights are set to wi = d2^i), and for display purposes the costs have been normalized by the maximum cost produced for each measure to produce values in [0, 1]. [sent-430, score-0.976]
58 This is an expected outcome, since the L1 distance over histograms is directly related to histogram intersection if those histograms have equal masses (Swain and Ballard, 1991), as they do for the bijective matching test: I(H(Y), H(Z)) = m − (1/2) ||H(Y) − H(Z)||L1, if m = |Y| = |Z|. [sent-437, score-0.664]
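The identity can be checked directly for two equal-mass histograms (a quick numeric confirmation, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
h_y = rng.multinomial(100, np.ones(16) / 16)  # equal masses m = 100
h_z = rng.multinomial(100, np.ones(16) / 16)

m = h_y.sum()
lhs = np.minimum(h_y, h_z).sum()         # I(H(Y), H(Z))
rhs = m - 0.5 * np.abs(h_y - h_z).sum()  # m - (1/2) ||.||_L1
assert lhs == rhs
```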
59 Figure 4: Pyramid match and L1 embedding comparison on (a) bijective matchings with equally-sized sets and (b) partial matchings with variably-sized sets. [sent-462, score-0.65]
60 When all input sets are the same size, the pyramid match and the L1 embedding give equivalent results. [sent-463, score-0.947]
61 However, the pyramid match also approximates the optimal correspondences for input sets of unequal cardinalities and allows partial matches. [sent-464, score-1.025]
62 This is a good context in which to test the pyramid match kernel, since such local features have no inherent ordering, and it is expected that the number of features will vary across examples. [sent-478, score-1.141]
63 Given two sets of local image features, the pyramid match kernel (PMK) value reflects how well the image parts match under a one-to-one correspondence. [sent-479, score-1.464]
64 All pyramid match run-times reported below include the time needed to compute both the pyramids and the weighted intersections, and were measured on 2. [sent-482, score-1.037]
65 Thus the pyramid match kernel (PMK) performs comparably to the others at their best for this data set, but is much more efficient than those tested above, requiring time only linear in the number of features. [sent-497, score-1.034]
66 The plots in Figure 6 depict the run-time versus recognition accuracy of the pyramid match kernel as compared to the match kernel (Wallraven et al. [sent-500, score-1.466]
67 Computing a kernel matrix for the same data with the two kernels yields nearly identical accuracy (left plot), but the pyramid match is significantly faster (center plot). [sent-510, score-1.069]
68 Allowing the same amount of training time for both methods, the pyramid match produces much better recognition results (right plot). [sent-511, score-1.02]
69 Figure 6 (recognition accuracy vs. mean number of features per set, and vs. training time in seconds): Both the pyramid match kernel and the match kernel of Wallraven et al. [sent-523, score-0.725]
70 However, while the time required to learn object categories with the quadratic-time match kernel grows quickly with the input size, the time required by the linear-time pyramid match remains very efficient (center plot). [sent-526, score-1.389]
71 Allowing the same run-time, the pyramid match kernel produces better recognition rates than the match kernel (right plot). [sent-527, score-1.466]
72 For this data set, the pyramid match operated on sets of SIFT features projected to d = 10 dimensions using PCA, with each appearance descriptor concatenated with its corresponding positional feature (image position normalized by image dimensions). [sent-532, score-1.129]
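A hedged sketch of this feature preparation (function and variable names are illustrative; assumes SIFT descriptors and keypoint locations have already been extracted per image):

```python
import numpy as np
from sklearn.decomposition import PCA

# pca = PCA(n_components=10).fit(descriptors_pooled_from_training_images)

def appearance_position_features(descriptors, keypoints_xy, image_wh, pca):
    """One feature set per image: each PCA-projected SIFT descriptor
    (d = 10) is concatenated with its keypoint's image position,
    normalized by the image dimensions."""
    appearance = pca.transform(descriptors)                      # (m, 10)
    position = keypoints_xy / np.asarray(image_wh, dtype=float)  # (m, 2)
    return np.hstack([appearance, position])   # (m, 12), an unordered set
```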
73 Since the pyramid match seeks a strong correspondence with some subset of the images’ features, it explicitly accounts for unsegmented, cluttered data. [sent-536, score-0.977]
74 For each training set size, the mean and standard deviation of the pyramid match accuracy over 10 runs is shown, where for each run we randomly select the training examples and use all the remaining database images as test examples. [sent-548, score-1.003]
75 Using only one training example per class, the pyramid match achieves an average recognition rate per class of 18%. [sent-550, score-1.078]
76 Using 15 examples (a standard point of comparison), the pyramid match achieves an average recognition rate per class of 50%. [sent-552, score-1.049]
77 Experiments comparing the recognition accuracy of the pyramid match kernel to an optimal partial matching kernel reveal that very little loss in accuracy is traded for speedup factors of several orders of magnitude (Grauman, 2006, pages 118–121). [sent-553, score-1.457]
78 Over all the parameter configurations we tested, the best performance we could achieve with the bag-of-words on this data set is substantially poorer than that of the pyramid match, as seen in Figure 8 (a). [sent-554, score-0.675]
79 This indicates the difficulty of establishing an optimal flat quantization of the feature space for recognition, and also suggests that the partial matching ability of the pyramid match is beneficial for tolerating outlier features without skewing the matching cost. [sent-556, score-1.535]
80 From this comparison we can see that even with its extreme computational efficiency, the pyramid match kernel achieves results that are competitive with the state-of-the-art. [sent-567, score-1.034]
81 These results are based on a special case of the pyramid match, where multiple pyramid match kernels computed on spatial features are summed over a number of quantized appearance feature channels. [sent-573, score-1.798]
82 Plot (a) shows the mean and standard deviation for the recognition accuracy using the pyramid match kernel (red solid line) and a bag-of-words baseline (blue dotted line) when given varying numbers of training examples per class, over 10 runs with randomly selected training examples. [sent-611, score-1.136]
83 Learning a Function over Sets of Features: In the following experiments we demonstrate the pyramid match applied to two regression problems: time of publication inference from a collection of research papers, and articulated pose estimation from monocular silhouettes. [sent-620, score-1.045]
84 Each document is then a bag of local semantic features, and two documents are compared with the partial matching implied by the pyramid match kernel, that is, in terms of how well (some subset) of the LSI term-space projections can be put into correspondence. [sent-645, score-1.291]
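A sketch of this "bag of word meanings" construction under stated assumptions (placeholder corpus; LSI realized via truncated SVD of the term-document matrix, one standard implementation):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["..."]                       # placeholder: one string per paper;
vec = TfidfVectorizer()              # a real corpus needs > 50 documents
X = vec.fit_transform(docs)          # (n_docs, n_terms)
svd = TruncatedSVD(n_components=50)
term_vectors = svd.fit_transform(X.T)  # one LSI vector per vocabulary term

def doc_feature_set(doc):
    """A document becomes the unordered set of its words' LSI vectors."""
    analyze = vec.build_analyzer()
    idx = [vec.vocabulary_[w] for w in analyze(doc) if w in vec.vocabulary_]
    return term_vectors[idx]
```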
85 Boxplots (left) compare errors produced by three methods with a support vector regressor: bag-of-words (BOW) and latent semantic document-vectors (LSI DOC) with linear kernels, and “bag of word meanings” with the pyramid match kernel (BOWM PMK). [sent-653, score-1.115]
86 The plot on the right shows the true targets and corresponding predictions made by the pyramid match method (BOWM PMK). [sent-655, score-0.976]
87 Inferring 3-D Pose from Shape Features: In this set of experiments, we use regression with the pyramid match kernel to directly learn the mapping between monocular silhouette images of humans and the corresponding articulated 3-D body pose. [sent-679, score-1.138]
88 The boxplots compare the error distributions for the pyramid match kernel (PMK) and an RBF kernel over prototype frequency vectors (VQ-RBF). [sent-705, score-1.121]
89 The multi-resolution histograms used by the pyramid match contained ten resolution levels, as determined by the diameter of the features in the training examples, and each contained on average 2644 non-empty bins. [sent-715, score-1.168]
90 For each dimension of the pose targets, we trained an ε-insensitive SVR using the pyramid match kernel matrix between the training examples. [sent-718, score-1.096]
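A sketch of this regression setup, assuming the pyramid match kernel matrix has been precomputed between training examples (K_train) and between test and training examples (K_test); hyperparameter values are placeholders:

```python
import numpy as np
from sklearn.svm import SVR

def train_pose_svrs(K_train, pose_targets, epsilon=0.1, C=10.0):
    """One epsilon-insensitive SVR per pose dimension, trained on a
    precomputed pyramid match kernel matrix."""
    return [SVR(kernel="precomputed", epsilon=epsilon, C=C)
                .fit(K_train, pose_targets[:, j])
            for j in range(pose_targets.shape[1])]

# models = train_pose_svrs(K_train, pose_targets)
# preds = np.column_stack([m.predict(K_test) for m in models])
```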
91 Figure 12: Pose inference on real images. [sent-720, score-1.041]
92 With 99% confidence, the pyramid match yields average errors that are smaller than those of VQ-RBF by amounts between 4. [sent-741, score-0.947]
93 This experiment demonstrates the pyramid match kernel’s robustness to superfluous features in an input. [sent-744, score-1.031]
94 In contrast, the pyramid match’s partial matching rewards similarity only for the features placed into correspondence, and ignores extraneous clutter features. [sent-747, score-1.144]
95 The pyramid match kernel approximates the optimal partial matching by computing a weighted intersection over multi-resolution histograms, and it requires time only linear in the number of features. [sent-754, score-1.362]
96 Source code for the pyramid match kernel is available at http://people. [sent-757, score-1.034]
97 Recently we have developed a method to generate data-dependent pyramid partitions, which yield more accurate matching results in high-dimensional feature spaces (Grauman and Darrell, 2007). [sent-761, score-0.887]
98 In ongoing work we are considering alternative weighting schemes on the bins for the pyramid match, and developing methods for sub-linear time approximate similarity search over partial correspondences. [sent-762, score-0.823]
99 The pyramid match kernel: Discriminative classification with sets of image features. [sent-899, score-1.013]
100 Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. [sent-981, score-0.947]
wordName wordTfidf (topN-words)
[('pyramid', 0.675), ('match', 0.272), ('matching', 0.212), ('histogram', 0.174), ('hi', 0.164), ('darrell', 0.147), ('pmk', 0.138), ('grauman', 0.126), ('bin', 0.12), ('rauman', 0.108), ('atch', 0.102), ('yramid', 0.102), ('histograms', 0.09), ('pyramids', 0.09), ('kernel', 0.087), ('correspondences', 0.086), ('features', 0.084), ('object', 0.083), ('wallraven', 0.078), ('berg', 0.076), ('recognition', 0.073), ('clutter', 0.073), ('image', 0.066), ('nest', 0.066), ('ernel', 0.066), ('vision', 0.066), ('intersection', 0.065), ('matched', 0.065), ('pose', 0.062), ('wolf', 0.061), ('bags', 0.06), ('unordered', 0.059), ('images', 0.056), ('semantic', 0.055), ('partial', 0.051), ('similarity', 0.049), ('boughorbel', 0.048), ('lw', 0.048), ('silhouette', 0.048), ('bins', 0.048), ('resolution', 0.047), ('prototypes', 0.046), ('lsi', 0.046), ('matches', 0.046), ('holub', 0.042), ('ltoe', 0.042), ('rtoe', 0.042), ('schmid', 0.042), ('silhouettes', 0.042), ('shashua', 0.041), ('lowe', 0.041), ('matchings', 0.041), ('bh', 0.041), ('shape', 0.04), ('publication', 0.036), ('frome', 0.036), ('lazebnik', 0.036), ('serre', 0.036), ('indyk', 0.036), ('kernels', 0.035), ('malik', 0.033), ('bijective', 0.033), ('rw', 0.033), ('contour', 0.033), ('discriminative', 0.032), ('june', 0.032), ('appearance', 0.032), ('mercer', 0.032), ('lk', 0.031), ('boughhorbel', 0.03), ('bowm', 0.03), ('cluttered', 0.03), ('lyu', 0.03), ('moreno', 0.03), ('mutch', 0.03), ('thaper', 0.03), ('unsegmented', 0.03), ('distances', 0.029), ('plot', 0.029), ('quantization', 0.029), ('kondor', 0.029), ('poses', 0.029), ('rs', 0.029), ('per', 0.029), ('category', 0.029), ('inputs', 0.027), ('level', 0.027), ('ls', 0.027), ('lt', 0.027), ('ra', 0.027), ('cardinalities', 0.027), ('jebara', 0.027), ('cut', 0.027), ('word', 0.026), ('local', 0.026), ('quantized', 0.025), ('sift', 0.025), ('occlusions', 0.025), ('svr', 0.025), ('vancouver', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features
Author: Kristen Grauman, Trevor Darrell
Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches. Keywords: kernel, sets of features, histogram intersection, multi-resolution histogram pyramid, approximate matching, object recognition
2 0.066137187 80 jmlr-2007-Synergistic Face Detection and Pose Estimation with Energy-Based Models
Author: Margarita Osadchy, Yann Le Cun, Matthew L. Miller
Abstract: We describe a novel method for simultaneously detecting faces and estimating their pose in real time. The method employs a convolutional network to map images of faces to points on a low-dimensional manifold parametrized by pose, and images of non-faces to points far away from that manifold. Given an image, detecting a face and estimating its pose is viewed as minimizing an energy function with respect to the face/non-face binary variable and the continuous pose parameters. The system is trained to minimize a loss function that drives correct combinations of labels and pose to be associated with lower energy values than incorrect ones. The system is designed to handle a very large range of poses without retraining. The performance of the system, tested on three standard data sets—for frontal views, rotated faces, and profiles—is comparable to previous systems that are designed to handle a single one of these data sets. We show that a system trained simultaneously for detection and pose estimation is more accurate on both tasks than similar systems trained for each task separately. Keywords: face detection, pose estimation, convolutional networks, energy based models, object recognition
3 0.057288095 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data
Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby
Abstract: Embedding algorithms search for a low dimensional continuous representation of data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space, based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of the embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text data sets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling, IsoMap and correspondence analysis. Keywords: embedding algorithms, manifold learning, exponential families, multidimensional scaling, matrix factorization, semidefinite programming
4 0.055814441 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
Author: Evgeniy Gabrilovich, Shaul Markovitch
Abstract: Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowledge, including semi-supervised learning and transfer learning. In this work, we present a new framework for automatic acquisition of world knowledge and methods for incorporating it into the text categorization process. Our approach enhances machine learning algorithms with features generated from domain-specific and common-sense knowledge. This knowledge is represented by ontologies that contain hundreds of thousands of concepts, further enriched through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts that augment the bag of words used in simple supervised learning. Feature generation is accomplished through contextual analysis of document text, thus implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses two significant problems in natural language processing—synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the training documents alone. We applied our methodology using the Open Directory Project, the largest existing Web directory built by over 70,000 human editors. Experimental results over a range of data sets confirm improved performance compared to the bag of words document representation. Keywords: feature generation, text classification, background knowledge
5 0.048240211 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels
Author: Marco Reisert, Hans Burkhardt
Abstract: This paper presents a new class of matrix valued kernels that are ideally suited to learn vector valued equivariant functions. Matrix valued kernels are a natural generalization of the common notion of a kernel. We set the theoretical foundations of so-called equivariant matrix valued kernels. We work out several properties of equivariant kernels, we give an interpretation of their behavior and show relations to scalar kernels. The notion of (ir)reducibility of group representations is transferred into the framework of matrix valued kernels. At the end, two exemplary applications are demonstrated. We design a non-linear rotation and translation equivariant filter for 2D-images and propose an invariant object detector based on the generalized Hough transform. Keywords: kernel methods, matrix kernels, equivariance, group integration, representation theory, Hough transform, signal processing, Volterra series
6 0.047233973 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation
7 0.044242263 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
8 0.043427773 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR
9 0.038565505 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition
10 0.036651779 18 jmlr-2007-Characterizing the Function Space for Bayesian Kernel Models
11 0.033775441 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification
12 0.03342237 30 jmlr-2007-Dynamics and Generalization Ability of LVQ Algorithms
13 0.032802071 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
14 0.031844076 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
15 0.030665789 23 jmlr-2007-Concave Learners for Rankboost
16 0.029991167 56 jmlr-2007-Multi-Task Learning for Classification with Dirichlet Process Priors
17 0.029854739 50 jmlr-2007-Local Discriminant Wavelet Packet Coordinates for Face Recognition
19 0.027718544 4 jmlr-2007-A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning (Special Topic on Model Selection)
20 0.026594412 47 jmlr-2007-Learning Horn Expressions with LOGAN-H
topicId topicWeight
[(0, 0.167), (1, 0.06), (2, 0.042), (3, 0.087), (4, -0.101), (5, 0.027), (6, -0.036), (7, -0.125), (8, 0.041), (9, 0.004), (10, -0.093), (11, 0.064), (12, -0.215), (13, 0.033), (14, -0.006), (15, -0.139), (16, 0.054), (17, -0.101), (18, 0.068), (19, -0.057), (20, -0.104), (21, 0.116), (22, 0.123), (23, 0.059), (24, 0.044), (25, 0.088), (26, 0.027), (27, 0.054), (28, -0.089), (29, -0.26), (30, -0.25), (31, 0.102), (32, 0.069), (33, -0.016), (34, -0.05), (35, -0.098), (36, -0.145), (37, -0.126), (38, -0.07), (39, 0.049), (40, 0.109), (41, -0.01), (42, 0.014), (43, 0.031), (44, -0.216), (45, 0.089), (46, -0.065), (47, -0.091), (48, 0.18), (49, -0.251)]
simIndex simValue paperId paperTitle
same-paper 1 0.95455933 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features
Author: Kristen Grauman, Trevor Darrell
Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches. Keywords: kernel, sets of features, histogram intersection, multi-resolution histogram pyramid, approximate matching, object recognition
2 0.33695415 80 jmlr-2007-Synergistic Face Detection and Pose Estimation with Energy-Based Models
Author: Margarita Osadchy, Yann Le Cun, Matthew L. Miller
Abstract: We describe a novel method for simultaneously detecting faces and estimating their pose in real time. The method employs a convolutional network to map images of faces to points on a low-dimensional manifold parametrized by pose, and images of non-faces to points far away from that manifold. Given an image, detecting a face and estimating its pose is viewed as minimizing an energy function with respect to the face/non-face binary variable and the continuous pose parameters. The system is trained to minimize a loss function that drives correct combinations of labels and pose to be associated with lower energy values than incorrect ones. The system is designed to handle a very large range of poses without retraining. The performance of the system, tested on three standard data sets—for frontal views, rotated faces, and profiles—is comparable to previous systems that are designed to handle a single one of these data sets. We show that a system trained simultaneously for detection and pose estimation is more accurate on both tasks than similar systems trained for each task separately. Keywords: face detection, pose estimation, convolutional networks, energy based models, object recognition
3 0.33183345 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels
Author: Marco Reisert, Hans Burkhardt
Abstract: This paper presents a new class of matrix valued kernels that are ideally suited to learn vector valued equivariant functions. Matrix valued kernels are a natural generalization of the common notion of a kernel. We set the theoretical foundations of so-called equivariant matrix valued kernels. We work out several properties of equivariant kernels, we give an interpretation of their behavior and show relations to scalar kernels. The notion of (ir)reducibility of group representations is transferred into the framework of matrix valued kernels. At the end, two exemplary applications are demonstrated. We design a non-linear rotation and translation equivariant filter for 2D-images and propose an invariant object detector based on the generalized Hough transform. Keywords: kernel methods, matrix kernels, equivariance, group integration, representation theory, Hough transform, signal processing, Volterra series
4 0.29460022 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR
Author: Francesco Dinuzzo, Marta Neve, Giuseppe De Nicolao, Ugo Pietro Gianazza
Abstract: Support Vector Regression (SVR) for discrete data is considered. An alternative formulation of the representer theorem is derived. This result is based on the newly introduced notion of pseudoresidual and the use of subdifferential calculus. The representer theorem is exploited to analyze the sensitivity properties of ε-insensitive SVR and introduce the notion of approximate degrees of freedom. The degrees of freedom are shown to play a key role in the evaluation of the optimism, that is the difference between the expected in-sample error and the expected empirical risk. In this way, it is possible to define a Cp-like statistic that can be used for tuning the parameters of SVR. The proposed tuning procedure is tested on a simulated benchmark problem and on a real world problem (Boston Housing data set). Keywords: statistical learning, reproducing kernel Hilbert spaces, support vector machines, representer theorem, regularization theory
Author: Evgeniy Gabrilovich, Shaul Markovitch
Abstract: Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowledge, including semi-supervised learning and transfer learning. In this work, we present a new framework for automatic acquisition of world knowledge and methods for incorporating it into the text categorization process. Our approach enhances machine learning algorithms with features generated from domain-specific and common-sense knowledge. This knowledge is represented by ontologies that contain hundreds of thousands of concepts, further enriched through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts that augment the bag of words used in simple supervised learning. Feature generation is accomplished through contextual analysis of document text, thus implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses two significant problems in natural language processing—synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the training documents alone. We applied our methodology using the Open Directory Project, the largest existing Web directory built by over 70,000 human editors. Experimental results over a range of data sets confirm improved performance compared to the bag of words document representation. Keywords: feature generation, text classification, background knowledge
6 0.27387959 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data
7 0.24780689 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation
8 0.23650132 30 jmlr-2007-Dynamics and Generalization Ability of LVQ Algorithms
9 0.22387375 13 jmlr-2007-Bayesian Quadratic Discriminant Analysis
10 0.20224676 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
12 0.18832915 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification
13 0.18636942 16 jmlr-2007-Boosted Classification Trees and Class Probability Quantile Estimation
14 0.17922755 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
15 0.17229727 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method
16 0.171057 56 jmlr-2007-Multi-Task Learning for Classification with Dirichlet Process Priors
17 0.16955136 18 jmlr-2007-Characterizing the Function Space for Bayesian Kernel Models
18 0.16504139 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
19 0.16096573 23 jmlr-2007-Concave Learners for Rankboost
20 0.15826273 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition
topicId topicWeight
[(4, 0.011), (8, 0.032), (10, 0.013), (12, 0.036), (15, 0.045), (22, 0.016), (28, 0.053), (40, 0.026), (41, 0.4), (45, 0.016), (48, 0.068), (60, 0.038), (80, 0.015), (85, 0.046), (98, 0.092)]
simIndex simValue paperId paperTitle
same-paper 1 0.72725999 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features
Author: Kristen Grauman, Trevor Darrell
Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches. Keywords: kernel, sets of features, histogram intersection, multi-resolution histogram pyramid, approximate matching, object recognition
2 0.66344577 71 jmlr-2007-Refinable Kernels
Author: Yuesheng Xu, Haizhang Zhang
Abstract: Motivated by mathematical learning from training data, we introduce the notion of refinable kernels. Various characterizations of refinable kernels are presented. The concept of refinable kernels leads to the introduction of wavelet-like reproducing kernels. We also investigate a refinable kernel that forms a Riesz basis. In particular, we characterize refinable translation invariant kernels, and refinable kernels defined by refinable functions. This study leads to multiresolution analysis of reproducing kernel Hilbert spaces. Keywords: refinable kernels, refinable feature maps, wavelet-like reproducing kernels, dual kernels, learning with kernels, reproducing kernel Hilbert spaces, Riesz bases
3 0.33644214 26 jmlr-2007-Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis
Author: Masashi Sugiyama
Abstract: Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called localitypreserving projection (LPP) can work well with multimodal data due to its locality preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick. Keywords: dimensionality reduction, supervised learning, Fisher discriminant analysis, locality preserving projection, affinity matrix
4 0.32876194 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation
Author: Guy Lebanon, Yi Mao, Joshua Dillon
Abstract: The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document representation that goes beyond the bag of words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation is able to robustly capture medium and long range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability for various text processing tasks. Keywords: text processing, local smoothing
5 0.32740706 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
Author: Jia Li, Surajit Ray, Bruce G. Lindsay
Abstract: A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density
6 0.32559904 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
7 0.32356405 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data
8 0.31789541 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition
9 0.31678766 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
10 0.31563872 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification
11 0.31547207 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time
12 0.31454328 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels
13 0.31436586 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method
14 0.31341285 39 jmlr-2007-Handling Missing Values when Applying Classification Models
15 0.31320751 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption
16 0.31216806 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR
17 0.31203258 37 jmlr-2007-GiniSupport Vector Machine: Quadratic Entropy Based Robust Multi-Class Probability Regression
18 0.31067583 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
19 0.30889988 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm
20 0.30626518 55 jmlr-2007-Minimax Regret Classifier for Imprecise Class Distributions