jmlr jmlr2006 jmlr2006-47 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael W. Spratling
Abstract: In order to perform object recognition it is necessary to learn representations of the underlying components of images. Such components correspond to objects, object-parts, or features. Nonnegative matrix factorisation is a generative model that has been specifically proposed for finding such meaningful representations of image data, through the use of non-negativity constraints on the factors. This article reports on an empirical investigation of the performance of non-negative matrix factorisation algorithms. It is found that such algorithms need to impose additional constraints on the sparseness of the factors in order to successfully deal with occlusion. However, these constraints can themselves result in these algorithms failing to identify image components under certain conditions. In contrast, a recognition model (a competitive learning neural network algorithm) reliably and accurately learns representations of elementary image features without such constraints. Keywords: non-negative matrix factorisation, competitive learning, dendritic inhibition, object recognition
Reference: text
sentIndex sentText sentNum sentScore
1 It has been proposed that this method is particularly suitable for finding the components of images, since from the physical properties of image formation it is known that image components are non-negative and that these components are combined additively (i. [sent-49, score-0.572]
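All of the NMF variants compared in this article share the same additive, non-negative generative model: the data matrix X (one image per column) is approximated as X ≈ AY with non-negative factors A (the basis vectors) and Y (the activations). As a point of reference, the following is a minimal Python sketch of the standard multiplicative-update scheme of Lee and Seung for the squared-error objective, roughly what nmfmse implements; the matrix shapes, iteration count, and initialisation are illustrative assumptions rather than the exact settings used in the experiments.

```python
import numpy as np

def nmf_mse(X, n_components, n_iter=500, eps=1e-9, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates (squared-error objective).

    X : non-negative (pixels x images) data matrix.
    Returns A (pixels x components) and Y (components x images) with X ~ A @ Y.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.random((n, n_components)) + eps
    Y = rng.random((n_components, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative throughout.
        Y *= (A.T @ X) / (A.T @ A @ Y + eps)
        A *= (X @ Y.T) / (A @ Y @ Y.T + eps)
    return A, Y
```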
2 Both algorithms snmf and nnsc impose constraints on the sparseness of Y. [sent-58, score-0.476]
3 Algorithm nmfsc allows optional constraints to be imposed on the sparseness of either the basis vectors, the activations, or both. [sent-59, score-0.73]
4 For nmfsc (A) a constraint on the sparseness of the basis vectors was applied. [sent-61, score-0.716]
5 For nmfsc (Y) a constraint on the sparseness of the activations was applied. [sent-65, score-0.768]
6 For nmfsc (A&Y) these same constraints were imposed on both the sparseness of basis vectors and the activations. [sent-77, score-0.752]
7 For the nmfsc algorithm, fixed values for the parameters controlling sparseness were used in order to provide a fairer comparison with the other algorithms, all of which used constant parameter settings. [sent-78, score-0.674]
8 This form of inhibition has been termed dendritic inhibition or pre-integration lateral inhibition. [sent-104, score-0.626]
9 The full dendritic inhibition model (Spratling and Johnson, 2002, 2003) allows negative synaptic weight values. [sent-105, score-0.474]
10 However, in order to make a fairer comparison with NMF, a version of the dendritic inhibition algorithm in which all synaptic weights are restricted to be non-negative (by clipping the weights at zero) is also used in the experiments described here. [sent-106, score-0.61]
11 In a neural network model, an input image (xk ) generates activity (yk ) in the nodes of a neural network such that yk = f (W, xk ). [sent-108, score-0.457]
12 In the dendritic inhibition model, the activation of each individual node is calculated as y_jk = W_j x'_jk, where x'_jk is an inhibited version of the input activations (x_k) that can differ for each node. [sent-109, score-0.591]
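To make the per-node computation concrete, here is a small Python sketch of settling the activations with pre-integration (dendritic) lateral inhibition. The extract above does not reproduce the exact inhibition function of Spratling and Johnson, so the subtractive, clipped-at-zero form below (with strength alpha) is only an illustrative assumption; the key property it preserves is that each node integrates its own inhibited copy of the input.

```python
import numpy as np

def dendritic_inhibition_activations(W, x, n_steps=10, alpha=1.0):
    """Illustrative activations y_j = W_j . x'_j with pre-integration inhibition.

    W : (nodes x inputs) non-negative afferent weight matrix.
    x : (inputs,) non-negative input image, flattened.
    Each node j receives its own inhibited input x'_j in which input lines
    already claimed by other active nodes are suppressed *before* being
    summed.  The subtractive, clipped-at-zero inhibition used here is an
    assumption, not the exact rule of the di/nndi algorithms.
    """
    n_nodes, n_inputs = W.shape
    y = np.zeros(n_nodes)
    for _ in range(n_steps):                      # iterate until activations settle
        x_inhib = np.empty((n_nodes, n_inputs))
        for j in range(n_nodes):
            others = np.delete(np.arange(n_nodes), j)
            inhibition = alpha * (W[others].T @ y[others])   # inhibition from other nodes
            x_inhib[j] = np.clip(x - inhibition, 0.0, None)
        y = np.einsum('ji,ji->j', W, x_inhib)     # y_j = W_j . x'_j
    return y
```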
13 The negative weights are used to ensure that each image component is represented by a distinct node rather than by the partial activation of multiple nodes each of which represents an overlapping image component. [sent-149, score-0.602]
14 1 Standard Bars Problem The bars problem (and its variations) is a benchmark task for the learning of independent image features (Földiák, 1990; Saund, 1995; Dayan and Zemel, 1995; Hinton et al. [sent-163, score-0.564]
15 In the standard version of the bars problem, as defined by Földiák (1990), training data consists of 8 by 8 pixel images in which each of the 16 possible (one-pixel wide) horizontal and vertical bars can be present with a probability of 1/8. [sent-166, score-1.071]
16 Figure 2: Typical training patterns for the standard bars problem. [sent-168, score-0.447]
17 Horizontal and vertical bars in 8x8 pixel images are independently selected to be present with a probability of 1/8. [sent-169, score-0.627]
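A minimal sketch of a generator for such training patterns follows; the function and argument names are my own, but the statistics match the description above (each bar independently present with probability 1/8, overlapping bars saturating at a pixel value of 1, so that occlusion occurs). The size and bar_width arguments also cover the double-width 9x9 variant used later.

```python
import numpy as np

def make_bars_image(size=8, bar_width=1, prob=1.0 / 8.0, rng=None):
    """One training image for the bars problem: horizontal and vertical bars,
    each independently present with probability prob; overlaps saturate at 1."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.zeros((size, size))
    n_positions = size - bar_width + 1      # 8 bars per orientation in both variants
    for b in range(n_positions):
        span = slice(b, b + bar_width)
        if rng.random() < prob:             # horizontal bar
            img[span, :] = 1.0
        if rng.random() < prob:             # vertical bar
            img[:, span] = 1.0
    return img

# e.g. p = 400 flattened 8x8 training images, one image per column
rng = np.random.default_rng(0)
X = np.column_stack([make_bars_image(rng=rng).ravel() for _ in range(400)])
```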
18 In the first experiment the algorithms were trained on the standard bars problem. [sent-171, score-0.455]
19 It can also be seen that many of the NMF algorithms learnt representations of bars in which pixels were missing. [sent-181, score-0.632]
20 Hence, these algorithms learnt random pieces of image components rather than the image components themselves. [sent-182, score-0.622]
21 In contrast, the neural network algorithms (and algorithms nnsc, nmfsc (A), and nmfsc (A&Y)) learnt to represent complete image components. [sent-183, score-1.343]
22 To quantify the results, the following procedure was used to determine the number of bars represented by each algorithm in each trial. [sent-184, score-0.438]
23 The number of unique bars represented by at least one basis vector, or one node of the network, was calculated, and this value was averaged over the 25 trials. [sent-187, score-0.547]
24 However, algorithms nmfsc (A) and nmfsc (A&Y) produced good results, representing an average of 15. [sent-194, score-0.97]
25 For p = 400, every image component in every trial was found by algorithms nnsc, nmfsc (A), nmfsc (A&Y) and di. [sent-200, score-1.124]
26 Results for different training set sizes are shown: the lighter, foreground, bars show results for p = 100, and the darker, background, bars show results for p = 400. [sent-208, score-0.82]
27 Error bars show best and worst performance, across the 25 trials, when p = 400. [sent-209, score-0.44]
28 Only NMF algorithms nmfsc (A) and nmfsc (A&Y) reliably find nearly all the bars with both 16 and 32 basis vectors. [sent-213, score-1.479]
29 2 Bars Problems Without Occlusion To determine the effect of occlusion on the performance of each algorithm, further experiments were performed using versions of the bars problem in which no occlusion occurs. [sent-219, score-0.636]
30 As with the standard bars problem, training data consisted of 8 by 8 pixel images in which each of the 16 possible (one-pixel wide) horizontal and vertical bars could be present with a probability of 1/8. [sent-223, score-1.071]
31 The number of unique bars represented by at least one basis vector, or one node of the network, was calculated, and this value was averaged over 25 trials. [sent-225, score-0.547]
32 Another version of the bars problem in which occlusion is avoided is one in which horizontal and vertical bars do not co-occur. [sent-227, score-1.003]
33 The number of unique bars represented by at least one basis vector, or one node of the network, was calculated, and this value was averaged over 25 trials. [sent-234, score-0.547]
34 For both experiments using training images in which occlusion does not occur, the performance of most of the NMF algorithms is improved considerably in comparison to the standard bars problem. [sent-236, score-0.641]
35 The neural network algorithms also reliably learn the image components as with the standard bars problem. [sent-237, score-0.812]
36 3 Bars Problem with More Occlusion To further explore the effects of occlusion on the learning of image components, a version of the bars problem with double-width bars was used. [sent-239, score-1.175]
37 Training data consisted of 9 by 9 pixel images in which each of the 16 possible (two-pixel wide) horizontal and vertical bars could be present with a probability of 1/8. [sent-240, score-0.661]
38 The image size was increased by one pixel to keep the number of image components equal to 16 (as in the previous experiments). [sent-241, score-0.459]
39 In this task, as in the standard bars problem, perpendicular bars overlap; however, the proportion of overlap is increased. [sent-242, score-0.884]
40 Each algorithm was trained on this double-width bars problem with p = 400. [sent-245, score-0.455]
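Assuming the generator sketched earlier for the standard problem, the double-width training set differs only in its arguments:

```python
# 9x9 images, 8 two-pixel-wide bars per orientation, p = 400 training patterns
X_dw = np.column_stack(
    [make_bars_image(size=9, bar_width=2, rng=rng).ravel() for _ in range(400)]
)
```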
41 Typical examples of the (generative) weights learnt by each NMF algorithm and of the (recognition) weights learnt by the neural network algorithms are shown in Figure 7. [sent-247, score-0.497]
42 It can be seen that the majority of the NMF algorithms learnt redundant encodings in which the basis vectors represent parts of image components rather than complete components. [sent-248, score-0.447]
43 In contrast, the neural network algorithms (and the nnsc algorithm) learnt to represent complete image components. [sent-249, score-0.521]
44 Results for different training set sizes are shown: the lighter, foreground, bars show results for p = 100, and the darker, background, bars show results for p = 400. [sent-252, score-0.82]
45 Error bars show best and worst performance, across the 25 trials, when p = 400. [sent-253, score-0.44]
5 times that of the sum of the weights for any other bar and if the minimum weight of any pixel forming part of that bar was greater than the mean of all the (positive) weights. Figure 6: Typical training patterns for the double-width bars problem. [sent-257, score-1.005]
47 Two-pixel wide horizontal and vertical bars in a 9x9 pixel image are independently selected to be present with a probability of 1/8. [sent-258, score-0.781]
48 The number of unique bars represented by at least one basis vector, or one node of the network, was calculated. [sent-261, score-0.547]
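A sketch of this counting procedure is below. The leading clause of the criterion sentence above is cut off in this extract (only "...5 times that of the sum of the weights for any other bar" survives), so the threshold ratio is exposed as a parameter with an assumed value of 1.5; the bar_masks argument (one boolean pixel mask per bar) and all names are illustrative.

```python
import numpy as np

def count_unique_bars(W, bar_masks, ratio=1.5):
    """Number of distinct bars represented by at least one node / basis vector.

    W         : (nodes x pixels) non-negative weight or basis matrix.
    bar_masks : (bars x pixels) boolean masks, one per image component.
    A node is taken to represent a bar if (i) the summed weight on that bar's
    pixels is at least `ratio` times the summed weight on any other bar and
    (ii) the minimum weight on the bar's pixels exceeds the mean of the node's
    positive weights.  The value ratio=1.5 is an assumption.
    """
    masks = bar_masks.astype(float)
    represented = set()
    for w in W:
        bar_sums = masks @ w                       # summed weight per candidate bar
        best = int(np.argmax(bar_sums))
        rest = np.delete(bar_sums, best)
        dominant = bar_sums[best] >= ratio * rest.max()
        pos_mean = w[w > 0].mean() if np.any(w > 0) else 0.0
        complete = w[bar_masks[best]].min() > pos_mean
        if dominant and complete:
            represented.add(best)
    return len(represented)
```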
49 However, algorithms nnsc, nmfsc (A) and nmfsc (A&Y) do succeed in learning the image components. [sent-269, score-1.124]
50 The dendritic inhibition algorithm, with negative weights, also succeeds in identifying all the double-width bars in most trials. [sent-270, score-0.781]
51 Assume that two nodes in the output layer (nodes 1 and 3) have learnt weights that are selective to two neighbouring, but non-overlapping, double-width bars (bars 1 and 3). [sent-275, score-0.689]
52 When the double-width bar (bar 2) that overlaps these two represented bars is presented to the input, then the network will respond by partially activating nodes 1 and 3. [sent-276, score-0.731]
53 Error bars show best and worst performance across the 25 trials. [sent-292, score-0.44]
54 Image components consisted of seven one-pixel wide bars and one nine-pixel wide bar in both the horizontal and vertical directions. [sent-310, score-0.773]
55 Parallel bars did not overlap; however, the proportion of overlap between the nine-pixel wide bars and all other perpendicular bars was large, while the proportion of overlap between perpendicular one-pixel wide bars was less than in the standard bars problem. [sent-312, score-2.22]
56 Each horizontal bar was selected to be present in an image with a probability of 1/8 while vertical bars occurred with a probability of 1/32. [sent-313, score-0.797]
57 Each algorithm was trained on this ‘unequal’ bars problem with p = 400. [sent-315, score-0.455]
58 Error bars show best and worst performance across the 25 trials. [sent-321, score-0.44]
59 , nmfsc (A) and nmfsc (A&Y)) an alternative cause may be the imposition of a constraint that requires learnt image components to be of similar size, which is not the case for the components used in this task. [sent-326, score-1.465]
60 In contrast, the dendritic inhibition neural network algorithm with negative weights (di) succeeded in identifying all the bars in most trials. [sent-329, score-0.932]
61 When these training images are taken from the CBCL face database, algorithm nmfdiv learns basis vectors that correspond to localised image parts (Lee and Seung, 1999; Hoyer, 2004; Feng et al. [sent-337, score-0.555]
62 Figure 11: (a) nndi and (b) di. Typical recognition weights learnt by 32 nodes when both versions of the dendritic inhibition algorithm were trained on the CBCL face database. [sent-344, score-0.925]
63 Algorithm nmfsc can find either local or global representations of either set of face images with appropriate values for the constraints on the sparseness of the basis vectors and activations. [sent-352, score-0.945]
64 Specifically, nmfsc learns localised image parts when constrained to produce highly sparse basis images, but learns global image features when constrained to produce basis images with low sparseness or if constrained to produce highly sparse activations (Hoyer, 2004). [sent-353, score-1.381]
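The sparseness being constrained here is Hoyer's (2004) measure, which maps any non-zero vector to [0, 1] using the ratio of its L1 and L2 norms (0 for a uniform vector, 1 for a vector with a single non-zero entry). A short sketch, with the illustrative helper name hoyer_sparseness:

```python
import numpy as np

def hoyer_sparseness(x):
    """Hoyer's (2004) sparseness: (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# e.g. sparseness of each basis image (each column of the basis matrix A)
# basis_sparseness = [hoyer_sparseness(A[:, j]) for j in range(A.shape[1])]
```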
65 The same behaviour is observed in the bars problems, reported above, where a basis vector often corresponds to a random subset of pixels along a row or column of the image rather than representing an entire bar. [sent-356, score-0.658]
66 In contrast, when the dendritic inhibition neural network is trained on face images, it learns global representations. [sent-358, score-0.565]
67 Parameters were identical to those used for the bars problems, except the weights were initialised to random values chosen from a Gaussian distribution with a larger standard deviation (0. [sent-360, score-0.48]
68 This means that image components may not overlap, which results in this algorithm's failure across all versions of the bars task. [sent-375, score-0.725]
69 In all these tasks the underlying image components are overlapping bars patterns (even if they do not overlap in any single training image, as is the case with the one-orientation bars problem). [sent-376, score-1.166]
70 , nmfdiv and nmfmse) generally succeed in identifying the underlying components of images when trained using images in which there is no occlusion (i. [sent-379, score-0.589]
71 Hence, these algorithms fail to learn many image features in the standard bars problem and produce even worse performance when tested on the double-width bars problem. [sent-383, score-0.999]
72 , snmf, nnsc, and nmfsc (Y)) require that the rows of Y have a particular sparseness. [sent-386, score-0.485]
73 For example, nnsc produced good results on the double-width bars problem. [sent-391, score-0.558]
74 Given sufficient training data, nnsc also reliably finds nearly all the image components in all experiments except for the standard bars test when n = 16 and the unequal bars problem. [sent-392, score-1.322]
75 However, nmfsc (Y) fails to produce consistently good results across experiments. [sent-393, score-0.515]
76 This algorithm only reliably found all the image components for the standard bars problem when n = 16 and for the linear bars problem. [sent-394, score-1.116]
77 Despite constraining the sparseness of the activations, algorithm snmf produced poor results in all experiments except for the linear bars problem. [sent-395, score-0.681]
78 , nmfsc (A)) requires that the columns of A have a particular sparseness. [sent-398, score-0.485]
79 This algorithm produces good results across all the experiments except the unequal bars problem. [sent-402, score-0.498]
80 Constraining the sparseness of the basis vectors thus appears to overcome the problems caused by occlusion and enable NMF to identify components in training images where occlusion occurs. [sent-403, score-0.663]
81 The NMF algorithm that imposes constraints on both the sparseness of the activations and the sparseness of the basis vectors (i. [sent-405, score-0.55]
82 , nmfsc (A&Y)) produces results similar to those produced by nmfsc (A). [sent-407, score-0.97]
83 The performance of the nmfsc algorithm depends critically on the particular sparseness parameters that are chosen. [sent-408, score-0.649]
84 This is particularly a problem when the frequency and size of different image components varies within a … Figure 12: Performance of the nmfsc algorithm across the range of possible combinations of sparseness parameters. [sent-411, score-0.921]
85 For (a) the standard bars problem, (b) the one-orientation bars problem, (c) the double-width bars problem, and (d) the unequal bars problem. [sent-412, score-1.698]
86 The length of the edge of each filled box is proportional to the mean number of bars learnt over 25 trials for that combination of parameters. [sent-416, score-0.572]
87 Hence, the nmfsc algorithm was unable to identify all the components in the unequal bars problem with any combination of sparseness parameters (figure 12d). [sent-420, score-1.205]
88 In fact, no NMF algorithm succeeded in this task: either because the linear NMF model could not deal with occlusion or because the algorithm imposed sparseness constraints that could not be satisfied for all image components. [sent-421, score-0.467]
89 The parameter search shown in figure 12 was performed in order to select the parameter values that would produce the best overall results across all the tasks used in this paper for algorithms nmfsc (A), nmfsc (Y), and nmfsc (A&Y). [sent-422, score-1.485]
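A sketch of such a grid search is given below; the two callables stand in for the actual training and evaluation code (e.g. an nmfsc run followed by the bar-counting procedure sketched earlier) and are assumptions, as is the 0.1-step grid.

```python
import itertools
import numpy as np

def sparseness_grid_search(train_fn, eval_fn,
                           grid=np.arange(0.1, 1.0, 0.1), n_trials=25):
    """Mean number of bars learnt for every (basis, activation) sparseness pair.

    train_fn(s_basis, s_act, trial) -> learnt basis matrix (one nmfsc run)
    eval_fn(basis)                  -> number of unique bars represented
    Returns the best-performing parameter pair and the full results grid.
    """
    results = {}
    for s_basis, s_act in itertools.product(grid, repeat=2):
        counts = [eval_fn(train_fn(s_basis, s_act, trial))
                  for trial in range(n_trials)]
        results[(s_basis, s_act)] = float(np.mean(counts))
    best = max(results, key=results.get)
    return best, results
```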
90 However, in such recognition models, sparseness constraints usually limit the number of nodes that are simultaneously active in response to an input image (Földiák and Young, 1995; Olshausen and Field, 2004). [sent-426, score-0.478]
91 The dendritic inhibition model succeeds in learning representations of elementary image features without such constraints. [sent-430, score-0.59]
92 The dendritic inhibition algorithm thus provides an efficient, on-line algorithm for finding image components. [sent-434, score-0.525]
93 When trained on images that are composed of elementary features, such as those used in the bars problems, algorithm di reliably and accurately learns representations of the underlying image features. [sent-435, score-0.919]
94 These re-occurring, holistic patterns are learnt by the dendritic inhibition algorithm. [sent-438, score-0.509]
95 Rather than relying on a subjective assessment of the quality of the components that are learnt, the bars problems that are the main focus of this paper provide a quantitative test of the accuracy and reliability with which elementary image features are discovered. [sent-445, score-0.682]
96 Since the underlying image components are known, it is possible to compare the components learnt with the known features from which the training images were created. [sent-446, score-0.586]
97 These results demonstrate that when the training images are actually composed of elementary features, NMF algorithms can fail to learn the underlying image components, whereas the dendritic inhibition algorithm reliably and accurately does so. [sent-447, score-0.752]
98 Intuitively, the dendritic inhibition algorithm works because the learning rule causes nodes to learn re-occurring patterns of pre-synaptic activity. [sent-448, score-0.504]
99 However, because the inhibition is specific to a particular set of inputs, nodes do not interfere with the learning of distinct image components by other nodes. [sent-451, score-0.519]
100 Conclusions Non-negative matrix factorisation employs non-negativity constraints in order to model the physics of image formation, and it has been claimed that this makes NMF particularly suited to learning meaningful representations of image data (Lee and Seung, 1999; Feng et al. [sent-455, score-0.442]
wordName wordTfidf (topN-words)
[('nmfsc', 0.485), ('bars', 0.41), ('nmf', 0.321), ('inhibition', 0.206), ('dendritic', 0.165), ('sparseness', 0.164), ('bar', 0.163), ('image', 0.154), ('nnsc', 0.148), ('learnt', 0.138), ('nndi', 0.123), ('activations', 0.119), ('images', 0.118), ('occlusion', 0.113), ('lnmf', 0.107), ('nmfdiv', 0.107), ('nmfmse', 0.107), ('snmf', 0.107), ('bject', 0.09), ('mage', 0.09), ('pratling', 0.09), ('spratling', 0.09), ('components', 0.088), ('omponents', 0.077), ('ecognition', 0.077), ('hoyer', 0.075), ('synaptic', 0.074), ('nodes', 0.071), ('weights', 0.07), ('node', 0.064), ('pixel', 0.063), ('factorisation', 0.063), ('network', 0.059), ('ldi', 0.058), ('unequal', 0.058), ('feng', 0.056), ('reliably', 0.054), ('inhibit', 0.049), ('lateral', 0.049), ('pixels', 0.049), ('basis', 0.045), ('di', 0.045), ('trained', 0.045), ('overlap', 0.043), ('afferent', 0.041), ('excitatory', 0.041), ('fyfe', 0.041), ('localised', 0.041), ('face', 0.04), ('yk', 0.038), ('activation', 0.037), ('seung', 0.037), ('patterns', 0.037), ('constraints', 0.036), ('vertical', 0.036), ('representations', 0.035), ('responding', 0.035), ('dayan', 0.034), ('generative', 0.034), ('horizontal', 0.034), ('inhibitory', 0.033), ('synapses', 0.033), ('ji', 0.033), ('xk', 0.032), ('active', 0.031), ('across', 0.03), ('elementary', 0.03), ('hinton', 0.03), ('weight', 0.029), ('learns', 0.028), ('represented', 0.028), ('cause', 0.027), ('earning', 0.027), ('factors', 0.027), ('xik', 0.027), ('li', 0.026), ('epochs', 0.025), ('cbcl', 0.025), ('learn', 0.025), ('fairer', 0.025), ('neighbouring', 0.025), ('uncommitted', 0.025), ('object', 0.025), ('trials', 0.024), ('johnson', 0.024), ('overlapping', 0.024), ('liu', 0.023), ('lee', 0.023), ('vectors', 0.022), ('neural', 0.022), ('recognition', 0.022), ('coding', 0.021), ('inputs', 0.021), ('perceptual', 0.021), ('feedforward', 0.021), ('meila', 0.021), ('perpendicular', 0.021), ('zheng', 0.021), ('impose', 0.021), ('wide', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 47 jmlr-2006-Learning Image Components for Object Recognition
Author: Michael W. Spratling
Abstract: In order to perform object recognition it is necessary to learn representations of the underlying components of images. Such components correspond to objects, object-parts, or features. Nonnegative matrix factorisation is a generative model that has been specifically proposed for finding such meaningful representations of image data, through the use of non-negativity constraints on the factors. This article reports on an empirical investigation of the performance of non-negative matrix factorisation algorithms. It is found that such algorithms need to impose additional constraints on the sparseness of the factors in order to successfully deal with occlusion. However, these constraints can themselves result in these algorithms failing to identify image components under certain conditions. In contrast, a recognition model (a competitive learning neural network algorithm) reliably and accurately learns representations of elementary image features without such constraints. Keywords: non-negative matrix factorisation, competitive learning, dendritic inhibition, object recognition
Author: Matthias Heiler, Christoph Schnörr
Abstract: We exploit the biconvex nature of the Euclidean non-negative matrix factorization (NMF) optimization problem to derive optimization schemes based on sequential quadratic and second order cone programming. We show that for ordinary NMF, our approach performs as well as existing stateof-the-art algorithms, while for sparsity-constrained NMF, as recently proposed by P. O. Hoyer in JMLR 5 (2004), it outperforms previous methods. In addition, we show how to extend NMF learning within the same optimization framework in order to make use of class membership information in supervised learning problems. Keywords: non-negative matrix factorization, second-order cone programming, sequential convex optimization, reverse-convex programming, sparsity
3 0.08939112 49 jmlr-2006-Learning Parts-Based Representations of Data
Author: David A. Ross, Richard S. Zemel
Abstract: Many perceptual models and theories hinge on treating objects as a collection of constituent parts. When applying these approaches to data, a fundamental problem arises: how can we determine what are the parts? We attack this problem using learning, proposing a form of generative latent factor model, in which each data dimension is allowed to select a different factor or part as its explanation. This approach permits a range of variations that posit different models for the appearance of a part. Here we provide the details for two such models: a discrete and a continuous one. Further, we show that this latent factor model can be extended hierarchically to account for correlations between the appearances of different parts. This permits modeling of data consisting of multiple categories, and learning these categories simultaneously with the parts when they are unobserved. Experiments demonstrate the ability to learn parts-based representations, and categories, of facial images and user-preference data. Keywords: parts, unsupervised learning, latent factor models, collaborative filtering, hierarchical learning
4 0.057171915 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection
Author: Hichem Sahbi, Donald Geman
Abstract: We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection. Our objective is to design and build a network which balances overall error and computation. Initially, SVMs are constructed for each cell with no constraints. This “free network” is then perturbed, cell by cell, into another network, which is “graded” in two ways: first, the number of support vectors of each SVM is reduced (by clustering) in order to adjust to a pre-determined, increasing function of cell depth; second, the decision boundaries are shifted to preserve all positive responses from the original set of training data. The limits on the numbers of clusters (virtual support vectors) result from minimizing the mean computational cost of collecting all detections subject to a bound on the expected number of false positives. When applied to detecting faces in cluttered scenes, the patterns correspond to poses and the free network is already faster and more accurate than applying a single pose-specific SVM many times. The graded network promotes very rapid processing of background regions while maintaining the discriminatory power of the free network. Keywords: statistical learning, hierarchy of classifiers, coarse-to-fine computation, support vector machines, face detection
5 0.041058559 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research (Special Topic on Machine Learning and Optimization)
Author: Kristin P. Bennett, Emilio Parrado-Hernández
Abstract: The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done off-line, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques. In turn, machine learning presents new challenges to mathematical programming. The special issue include papers from two primary themes: novel machine learning models and novel optimization approaches for existing models. Many papers blend both themes, making small changes in the underlying core mathematical program that enable the develop of effective new algorithms. Keywords: machine learning, mathematical programming, convex optimization
6 0.035253301 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints (Special Topic on Machine Learning and Optimization)
7 0.02998065 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
8 0.028430769 12 jmlr-2006-Active Learning with Feedback on Features and Instances
9 0.027025739 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
10 0.026555218 89 jmlr-2006-Structured Prediction, Dual Extragradient and Bregman Projections (Special Topic on Machine Learning and Optimization)
11 0.025941696 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
12 0.025016934 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
14 0.022809254 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
15 0.022028523 18 jmlr-2006-Building Support Vector Machines with Reduced Classifier Complexity (Special Topic on Machine Learning and Optimization)
17 0.021163752 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
18 0.020848922 88 jmlr-2006-Streamwise Feature Selection
20 0.019731861 4 jmlr-2006-A Linear Non-Gaussian Acyclic Model for Causal Discovery
topicId topicWeight
[(0, 0.127), (1, -0.078), (2, -0.047), (3, 0.147), (4, -0.072), (5, -0.088), (6, -0.11), (7, 0.01), (8, 0.259), (9, 0.161), (10, 0.107), (11, 0.28), (12, 0.002), (13, 0.106), (14, -0.21), (15, -0.166), (16, 0.37), (17, -0.036), (18, -0.056), (19, 0.224), (20, 0.011), (21, 0.067), (22, 0.044), (23, -0.094), (24, 0.009), (25, -0.007), (26, 0.053), (27, -0.13), (28, -0.063), (29, 0.03), (30, -0.042), (31, -0.065), (32, -0.009), (33, -0.033), (34, -0.079), (35, -0.008), (36, -0.015), (37, 0.115), (38, -0.011), (39, -0.01), (40, 0.006), (41, -0.0), (42, -0.005), (43, -0.021), (44, -0.025), (45, 0.02), (46, -0.043), (47, 0.008), (48, -0.055), (49, -0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.97007394 47 jmlr-2006-Learning Image Components for Object Recognition
Author: Michael W. Spratling
Abstract: In order to perform object recognition it is necessary to learn representations of the underlying components of images. Such components correspond to objects, object-parts, or features. Nonnegative matrix factorisation is a generative model that has been specifically proposed for finding such meaningful representations of image data, through the use of non-negativity constraints on the factors. This article reports on an empirical investigation of the performance of non-negative matrix factorisation algorithms. It is found that such algorithms need to impose additional constraints on the sparseness of the factors in order to successfully deal with occlusion. However, these constraints can themselves result in these algorithms failing to identify image components under certain conditions. In contrast, a recognition model (a competitive learning neural network algorithm) reliably and accurately learns representations of elementary image features without such constraints. Keywords: non-negative matrix factorisation, competitive learning, dendritic inhibition, object recognition
Author: Matthias Heiler, Christoph Schnörr
Abstract: We exploit the biconvex nature of the Euclidean non-negative matrix factorization (NMF) optimization problem to derive optimization schemes based on sequential quadratic and second order cone programming. We show that for ordinary NMF, our approach performs as well as existing stateof-the-art algorithms, while for sparsity-constrained NMF, as recently proposed by P. O. Hoyer in JMLR 5 (2004), it outperforms previous methods. In addition, we show how to extend NMF learning within the same optimization framework in order to make use of class membership information in supervised learning problems. Keywords: non-negative matrix factorization, second-order cone programming, sequential convex optimization, reverse-convex programming, sparsity
3 0.44424823 49 jmlr-2006-Learning Parts-Based Representations of Data
Author: David A. Ross, Richard S. Zemel
Abstract: Many perceptual models and theories hinge on treating objects as a collection of constituent parts. When applying these approaches to data, a fundamental problem arises: how can we determine what are the parts? We attack this problem using learning, proposing a form of generative latent factor model, in which each data dimension is allowed to select a different factor or part as its explanation. This approach permits a range of variations that posit different models for the appearance of a part. Here we provide the details for two such models: a discrete and a continuous one. Further, we show that this latent factor model can be extended hierarchically to account for correlations between the appearances of different parts. This permits modeling of data consisting of multiple categories, and learning these categories simultaneously with the parts when they are unobserved. Experiments demonstrate the ability to learn parts-based representations, and categories, of facial images and user-preference data. Keywords: parts, unsupervised learning, latent factor models, collaborative filtering, hierarchical learning
4 0.24621321 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection
Author: Hichem Sahbi, Donald Geman
Abstract: We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection. Our objective is to design and build a network which balances overall error and computation. Initially, SVMs are constructed for each cell with no constraints. This “free network” is then perturbed, cell by cell, into another network, which is “graded” in two ways: first, the number of support vectors of each SVM is reduced (by clustering) in order to adjust to a pre-determined, increasing function of cell depth; second, the decision boundaries are shifted to preserve all positive responses from the original set of training data. The limits on the numbers of clusters (virtual support vectors) result from minimizing the mean computational cost of collecting all detections subject to a bound on the expected number of false positives. When applied to detecting faces in cluttered scenes, the patterns correspond to poses and the free network is already faster and more accurate than applying a single pose-specific SVM many times. The graded network promotes very rapid processing of background regions while maintaining the discriminatory power of the free network. Keywords: statistical learning, hierarchy of classifiers, coarse-to-fine computation, support vector machines, face detection
Author: Radu Stefan Niculescu, Tom M. Mitchell, R. Bharat Rao
Abstract: The task of learning models for many real-world problems requires incorporating domain knowledge into learning algorithms, to enable accurate learning from a realistic volume of training data. This paper considers a variety of types of domain knowledge for constraining parameter estimates when learning Bayesian networks. In particular, we consider domain knowledge that constrains the values or relationships among subsets of parameters in a Bayesian network with known structure. We incorporate a wide variety of parameter constraints into learning procedures for Bayesian networks, by formulating this task as a constrained optimization problem. The assumptions made in module networks, dynamic Bayes nets and context specific independence models can be viewed as particular cases of such parameter constraints. We present closed form solutions or fast iterative algorithms for estimating parameters subject to several specific classes of parameter constraints, including equalities and inequalities among parameters, constraints on individual parameters, and constraints on sums and ratios of parameters, for discrete and continuous variables. Our methods cover learning from both frequentist and Bayesian points of view, from both complete and incomplete data. We present formal guarantees for our estimators, as well as methods for automatically learning useful parameter constraints from data. To validate our approach, we apply it to the domain of fMRI brain image analysis. Here we demonstrate the ability of our system to first learn useful relationships among parameters, and then to use them to constrain the training of the Bayesian network, resulting in improved cross-validated accuracy of the learned model. Experiments on synthetic data are also presented. Keywords: Bayesian networks, constrained optimization, domain knowledge c 2006 Radu Stefan Niculescu, Tom Mitchell and Bharat Rao. N ICULESCU , M ITCHELL AND R AO
6 0.1387834 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
7 0.13663532 30 jmlr-2006-Evolutionary Function Approximation for Reinforcement Learning
9 0.12742352 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
10 0.12582202 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
12 0.10630094 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms
13 0.10560837 71 jmlr-2006-Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis
15 0.10095631 8 jmlr-2006-A Very Fast Learning Method for Neural Networks Based on Sensitivity Analysis
16 0.1003414 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
17 0.096980572 12 jmlr-2006-Active Learning with Feedback on Features and Instances
18 0.096975006 88 jmlr-2006-Streamwise Feature Selection
19 0.096241601 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error
topicId topicWeight
[(8, 0.038), (33, 0.475), (36, 0.069), (50, 0.034), (61, 0.028), (63, 0.04), (68, 0.021), (76, 0.01), (78, 0.03), (79, 0.027), (81, 0.017), (84, 0.017), (90, 0.012), (91, 0.023), (96, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.73524064 47 jmlr-2006-Learning Image Components for Object Recognition
Author: Michael W. Spratling
Abstract: In order to perform object recognition it is necessary to learn representations of the underlying components of images. Such components correspond to objects, object-parts, or features. Nonnegative matrix factorisation is a generative model that has been specifically proposed for finding such meaningful representations of image data, through the use of non-negativity constraints on the factors. This article reports on an empirical investigation of the performance of non-negative matrix factorisation algorithms. It is found that such algorithms need to impose additional constraints on the sparseness of the factors in order to successfully deal with occlusion. However, these constraints can themselves result in these algorithms failing to identify image components under certain conditions. In contrast, a recognition model (a competitive learning neural network algorithm) reliably and accurately learns representations of elementary image features without such constraints. Keywords: non-negative matrix factorisation, competitive learning, dendritic inhibition, object recognition
2 0.22602755 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
Author: Francis R. Bach, Michael I. Jordan
Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled data sets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram. Keywords: spectral clustering, blind source separation, computational auditory scene analysis
3 0.22508807 53 jmlr-2006-Learning a Hidden Hypergraph
Author: Dana Angluin, Jiang Chen
Abstract: We consider the problem of learning a hypergraph using edge-detecting queries. In this model, the learner may query whether a set of vertices induces an edge of the hidden hypergraph or not. We show that an r-uniform hypergraph with m edges and n vertices is learnable with O(2 4r m · poly(r, log n)) queries with high probability. The queries can be made in O(min(2 r (log m + r)2 , (log m + r)3 )) rounds. We also give an algorithm that learns an almost uniform hypergraph of ∆ ∆ dimension r using O(2O((1+ 2 )r) · m1+ 2 · poly(log n)) queries with high probability, where ∆ is the difference between the maximum and the minimum edge sizes. This upper bound matches our ∆ lower bound of Ω(( m∆ )1+ 2 ) for this class of hypergraphs in terms of dependence on m. The 1+ 2 queries can also be made in O((1 + ∆) · min(2r (log m + r)2 , (log m + r)3 )) rounds. Keywords: query learning, hypergraph, multiple round algorithm, sampling, chemical reaction network
4 0.22320691 44 jmlr-2006-Large Scale Transductive SVMs
Author: Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou
Abstract: We show how the concave-convex procedure can be applied to transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction. html. Keywords: transduction, transductive SVMs, semi-supervised learning, CCCP
5 0.22186348 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms
Author: Mingrui Wu, Bernhard Schölkopf, Gökhan Bakır
Abstract: Many kernel learning algorithms, including support vector machines, result in a kernel machine, such as a kernel classifier, whose key component is a weight vector in a feature space implicitly introduced by a positive definite kernel function. This weight vector is usually obtained by solving a convex optimization problem. Based on this fact we present a direct method to build sparse kernel learning algorithms by adding one more constraint to the original convex optimization problem, such that the sparseness of the resulting kernel machine is explicitly controlled while at the same time performance is kept as high as possible. A gradient based approach is provided to solve this modified optimization problem. Applying this method to the support vectom machine results in a concrete algorithm for building sparse large margin classifiers. These classifiers essentially find a discriminating subspace that can be spanned by a small number of vectors, and in this subspace, the different classes of data are linearly well separated. Experimental results over several classification benchmarks demonstrate the effectiveness of our approach. Keywords: sparse learning, sparse large margin classifiers, kernel learning algorithms, support vector machine, kernel Fisher discriminant
7 0.21947965 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation
11 0.21700749 71 jmlr-2006-Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis
12 0.21680555 36 jmlr-2006-In Search of Non-Gaussian Components of a High-Dimensional Distribution
13 0.21591772 61 jmlr-2006-Maximum-Gain Working Set Selection for SVMs (Special Topic on Machine Learning and Optimization)
14 0.21585858 49 jmlr-2006-Learning Parts-Based Representations of Data
15 0.21584092 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
16 0.21518023 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
17 0.21423525 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting (Special Topic on Inductive Programming)
18 0.21347257 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection
19 0.21300104 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
20 0.21237333 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation