nips nips2004 nips2004-106 knowledge-graph by maker-knowledge-mining

106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification


Source: pdf

Author: Felix A. Wichmann, Arnulf B. Graf, Heinrich H. Bülthoff, Eero P. Simoncelli, Bernhard Schölkopf

Abstract: We study gender discrimination of human faces using a combination of psychophysical classification and discrimination experiments together with methods from machine learning. We reduce the dimensionality of a set of face images using principal component analysis, and then train a set of linear classifiers on this reduced representation (linear support vector machines (SVMs), relevance vector machines (RVMs), Fisher linear discriminant (FLD), and prototype (prot) classifiers) using human classification data. Because we combine a linear preprocessor with linear classifiers, the entire system acts as a linear classifier, allowing us to visualise the decision-image corresponding to the normal vector of the separating hyperplanes (SH) of each classifier. We predict that the female-to-maleness transition along the normal vector for classifiers closely mimicking human classification (SVM and RVM [1]) should be faster than the transition along any other direction. A psychophysical discrimination experiment using the decision images as stimuli is consistent with this prediction.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Bülthoff and Bernhard Schölkopf, Max Planck Institute for Biological Cybernetics, Tübingen, Germany Abstract We study gender discrimination of human faces using a combination of psychophysical classification and discrimination experiments together with methods from machine learning. [sent-9, score-0.574]

2 Because we combine a linear preprocessor with linear classifiers, the entire system acts as a linear classifier, allowing us to visualise the decision-image corresponding to the normal vector of the separating hyperplanes (SH) of each classifier. [sent-11, score-0.186]

3 We predict that the female-to-maleness transition along the normal vector for classifiers closely mimicking human classification (SVM and RVM [1]) should be faster than the transition along any other direction. [sent-12, score-0.314]

4 A psychophysical discrimination experiment using the decision images as stimuli is consistent with this prediction. [sent-13, score-0.257]

5 1 Introduction One of the central problems in vision science is to identify the features used by human subjects to classify visual stimuli. [sent-14, score-0.3]

6 We combine machine learning and psychophysical techniques to gain insight into the algorithms used by human subjects during visual classification of faces. [sent-15, score-0.409]

7 Comparing gender classification performance of humans to that of machines has attracted considerable attention in the past [2, 3, 4, 5]. [sent-16, score-0.184]

8 The main novel aspect of our study is to analyse the machine algorithms to make inferences about the features used by human subjects, thus providing an alternative to psychophysical feature extraction techniques such as the “bubbles” [6] or the noise classification image [7] techniques. [sent-17, score-0.269]

9 In this “machine-learning-psychophysics research” we first train machine learning classifiers on the responses (labels) of human subjects to re-create the human decision boundaries with learning machines. [sent-18, score-0.458]

10 Then we look for correlations between machine classifiers and several characteristics of subjects’ responses to the stimuli—proportion correct, reaction times (RT) and confidence ratings. [sent-19, score-0.083]
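
This train-then-correlate loop is easy to sketch in code. The following is a minimal illustration on synthetic stand-ins (random PCA coefficients, labels and reaction times, not the study's data), using a linear SVM as the classifier:

```python
# Sketch of the machine-learning-psychophysics loop on synthetic stand-in data;
# E, y_est and rt below are random placeholders for the 200 PCA-encoded faces,
# the subjects' gender responses, and the per-face mean reaction times.
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVC

rng = np.random.default_rng(0)
E = rng.standard_normal((200, 200))              # 200 faces x 200 PCA coefficients
y_est = rng.choice([-1, 1], size=200)            # subjects' labels: -1 female, +1 male
rt = rng.gamma(shape=2.0, scale=0.3, size=200)   # stand-in reaction times (seconds)

# Re-create the subjects' decision boundary with a linear classifier
clf = SVC(kernel="linear").fit(E, y_est)

# Signed distance of each face to the separating hyperplane (SH)
dist = clf.decision_function(E) / np.linalg.norm(clf.coef_)

# Stimuli far from the SH should be the "easy" ones: fast RTs, high confidence
r, p = pearsonr(np.abs(dist), rt)
print(f"correlation of |distance to SH| with RT: r = {r:.2f}, p = {p:.3f}")
```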

11 Ideally this allows us to find preprocessor-classifier pairings that are closely aligned with the algorithm employed by the human brain for the task at hand. [sent-20, score-0.147]

12 Thereafter we analyse properties of the machine closest to the human—in our case support vector machines (SVMs), and to a slightly lesser degree, relevance vector machines (RVMs)—and make predictions about human behaviour based on machine properties. [sent-21, score-0.37]

13 The decision-image has the same dimensionality as the (input) images—in our case 256 × 256—whereas the normal vector lives in the (reduced-dimensionality) space after preprocessing—in our case 200 × 1 after Principal Component Analysis (PCA). [sent-24, score-0.086]
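
The relationship between the two spaces can be made concrete with a short sketch; the faces below are random placeholders, and the back-projection of the normal vector through the PCA basis corresponds to the decision-image described above:

```python
# Minimal sketch of the two spaces: the normal vector w lives in R^200 after
# PCA, the decision-image in 256x256 pixel space; X is a random stand-in for
# the 200-face data matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 256 * 256))      # data matrix: one face per row

pca = PCA(n_components=200)                    # X ~ E B + mean; E is the encoding
E = pca.fit_transform(X)                       # 200 x 200 PCA coefficients

w = rng.standard_normal(200)                   # stand-in for a classifier's normal vector
W = (pca.components_.T @ w).reshape(256, 256)  # decision-image: back-projected w
print(E.shape, W.shape)                        # (200, 200) (256, 256)
```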

14 Second, we use the normal vectors w of the classifiers to generate novel stimuli by adding (or subtracting) various “amounts” (λw/||w||) to a genderless face in PCA space. [sent-25, score-0.142]
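
A hedged sketch of this stimulus generation follows; `pca` and `w` are assumed to come from a fit like the one above, and `genderless_coeffs` is a hypothetical name for the PCA coefficients of the genderless face:

```python
# Sketch: shift a genderless face by lam along the unit-norm decision vector in
# PCA space, then invert the PCA mapping to obtain a stimulus image I(lam).
import numpy as np

def morph_stimulus(pca, genderless_coeffs, w, lam):
    shifted = genderless_coeffs + lam * w / np.linalg.norm(w)
    return pca.inverse_transform(shifted[None, :])[0].reshape(256, 256)

# One 2AFC trial then shows a pair I(-lam) (more female) and I(+lam) (more male):
# left, right = morph_stimulus(pca, g, w, -0.4), morph_stimulus(pca, g, w, +0.4)
```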

15 We predict that the female-to-maleness transition along the vectors normal to the SHs, wSVM and wRVM , should be significantly faster than those along the normal vectors of machine classifiers that do not correlate as well with human subjects. [sent-27, score-0.314]

16 A psychophysical gender discrimination experiment confirms our predictions: the female-to-maleness axes of the SVM and, to a smaller extent, the RVM, are more closely aligned with the human female-to-maleness axis than those of the prototype (Prot) and Fisher linear discriminant (FLD) classifiers. [sent-28, score-0.588]

17 2 Preprocessing and Machine Learning Methods We preprocessed the faces using PCA. [sent-29, score-0.085]

18 PCA is a good preprocessor in the current context since we have previously shown that in PCA-space strong correlations exist between man and machine [1]. [sent-30, score-0.125]

19 The face stimuli were taken from the gender-balanced Max Planck Institute (MPI) face database1 composed of 200 greyscale 256 × 256-pixel frontal views of human faces, yielding a data matrix X ∈ R^(200 × 256²). [sent-32, score-0.345]

20 For the gender discrimination task we adhere to the following convention for the class labels: y = −1 for females and y = +1 for males. [sent-33, score-0.252]

21 The combination of the encoding matrix E with the true class labels y of the MPI database yields the true dataset, whereas its combination with the class labels y_est given by the subjects yields the subject dataset. [sent-36, score-0.379]

22 To model classification in human subjects we use methods from supervised machine learning. [sent-37, score-0.305]

23 In particular, we consider linear classifiers where classification is done using an SH defined by its normal vector w and offset b. [sent-38, score-0.065]

24 Furthermore the normal vector w of our classifiers can then be written as a linear combination of the input patterns x_i with suitable coefficients α_i as w = Σ_i α_i x_i. [sent-39, score-0.111]

25 Note that in our experiments the x_i are the PCA coefficients of the images, that is x_i ∈ R^200, whereas the images themselves are in R^(256²). [sent-41, score-0.1]

26 For the subject dataset we chose the mean values of w, b and w± over all subjects. [sent-42, score-0.075]

27 1 The MPI face database is located at http://faces. [sent-43, score-0.078]

28 SVMs have an intuitive geometrical interpretation: they classify by maximizing the margin separating both classes while minimizing the classification error. [sent-49, score-0.028]

29 The RVM optimises the expansion coefficients of an SV-style decision function using a hyperprior which favours sparse solutions. [sent-51, score-0.028]

30 Common classifiers in neuroscience, cognitive science and psychology are variants of the Prototype classifier (Prot, [12]). [sent-52, score-0.041]

31 Their popularity is due to their simplicity: they classify according to the nearest mean-of-class prototype; in the simplest form all dimensions are weighted equally, but variants exist that weight the dimensions inversely proportionally to the class variance along the dimensions. [sent-53, score-0.051]

32 As we cannot estimate class variance along all 200 dimensions from only 200 stimuli, we chose to implement the simplest Prot with equal weight along all dimensions. [sent-54, score-0.079]
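
A minimal sketch of this equal-weight Prot (assuming labels in {-1, +1} as above): nearest mean-of-class prototype reduces to a linear rule whose normal vector is the difference of the class means and whose SH passes midway between them.

```python
# Equal-weight prototype classifier: nearest class mean, written as a linear rule.
import numpy as np

def fit_prototype(X, y):
    mu_plus = X[y == +1].mean(axis=0)
    mu_minus = X[y == -1].mean(axis=0)
    w = mu_plus - mu_minus                      # normal vector of the SH
    b = -0.5 * (mu_plus + mu_minus) @ w         # SH passes midway between prototypes
    return w, b

def predict_prototype(X, w, b):
    return np.sign(X @ w + b)                   # +1 male, -1 female
```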

33 The Fisher linear discriminant classifier (FLD, [13]) finds a direction in the dataset which allows best linear separation of the two classes. [sent-55, score-0.053]

34 This direction is then used as the normal vector of the separating hyperplane. [sent-56, score-0.093]

35 In fact, FLD is arguably a more principled, whitened variant of the Prot classifier: its weight vector can be written as w = S_W^(−1)(µ+ − µ−), where S_W is the within-class covariance matrix of the two classes, and µ± are the class means. [sent-57, score-0.069]

36 Consequently, if we disregard the constant offset b, we can write the decision function as ⟨w|x⟩ = ⟨S_W^(−1)(µ+ − µ−)|x⟩ = ⟨S_W^(−1/2)(µ+ − µ−)|S_W^(−1/2)x⟩, which is a prototype classifier using the prototypes µ± after whitening the space with S_W^(−1/2). [sent-58, score-0.189]
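
The identity can be checked numerically; the sketch below uses random placeholder data and adds a small ridge term, since with as many dimensions as samples S_W is near-singular:

```python
# FLD as a whitened prototype classifier: w = S_W^{-1}(mu+ - mu-), and
# <w, x> equals the prototype rule evaluated after whitening with S_W^{-1/2}.
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 5))
y = rng.choice([-1, 1], size=200)

mu_p, mu_m = X[y == +1].mean(axis=0), X[y == -1].mean(axis=0)
Sw = np.cov(X[y == +1], rowvar=False) + np.cov(X[y == -1], rowvar=False)
Sw += 1e-6 * np.eye(Sw.shape[0])              # ridge term for numerical stability

w = np.linalg.solve(Sw, mu_p - mu_m)          # FLD weight vector

Swi = np.linalg.inv(sqrtm(Sw)).real           # S_W^{-1/2}
x = X[0]
print(np.allclose(w @ x, (Swi @ (mu_p - mu_m)) @ (Swi @ x)))   # True
```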

37 2.2 Decision-Images and Generalised Portraits We combine the linear preprocessor (PCA) X̄ = EB and the linear classifier (SVM, RVM, Prot, FLD) y(x) = ⟨w|x⟩ + b to yield a linear classification system: y = w^T E^T + b, where b = b·1. [sent-60, score-0.062]

38 The decision-images in the first row are those obtained if the classifiers are trained on the true dataset; those in the second row if trained on the subject dataset, marked on the right hand side of the figure by “true data” and “subj data”, respectively. [sent-66, score-0.109]

39 Decision-images are represented by a vector pointing to the positive class and can thus be expected to have male attributes (the negative of it looks female). [sent-67, score-0.076]

40 For the prototype learner, the eye and beard regions are most important. [sent-70, score-0.185]

41 The decision-images for the subject dataset are slightly more “face-like” and less holistic than those obtained using the true labels; the eye and mouth regions are more strongly emphasised. [sent-73, score-0.211]

42 This suggests that human subjects base their gender classification strongly on the eye and mouth regions of the face—clearly a sub-optimal strategy as revealed by the more holistic true-dataset SVM, RVM and FLD decision-images. [sent-75, score-0.597]

43 A decision-image thus represents a way to extract the visual cues and features used by human subjects during visual classification without using a priori assumptions or knowledge about the task at hand. [sent-76, score-0.321]

44 Figure 1: Decision-images W for each classifier (columns: SVM, RVM, Prot, FLD) trained on the true and on the subject dataset (rows marked "true data" and "subj data"); all images are rescaled to [0, 1] and their means set to 128 for illustration purposes (different scalers for different images). [sent-77, score-0.227]

45 The generalised portraits W± can be seen as “summary” faces in each class reflecting the decision rule of the classifier. [sent-79, score-0.476]

46 They can be viewed as an extension of the concept of a prototype: they are the prototype of the faces the classifier bases its decision on. [sent-80, score-0.249]

47 We note that w can be written as: w = Σ_i α_i x_i = Σ_{i : sign(α_i)=+1} α_i x_i − Σ_{i : sign(α_i)=−1} |α_i| x_i. [sent-81, score-0.046]

48 This allows us to define the generalised portraits W±, which are computed by inverting the PCA transformation P on the patterns w± = (Σ_{i : sign(α_i)=±1} α_i x_i) / (Σ_{i : sign(α_i)=±1} α_i). [sent-82, score-0.186]
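
As a sketch of this computation (the normalisation by the summed coefficients follows the reconstructed formula above and should be read with that caveat; `pca` is again assumed to be a fitted sklearn PCA as in the earlier sketch):

```python
# Generalised portraits: alpha-weighted means of the positively and negatively
# weighted patterns, mapped back to pixel space through the PCA basis.
import numpy as np

def generalised_portraits(pca, X_coeffs, alpha):
    portraits = {}
    for s, key in ((+1, "W+"), (-1, "W-")):
        idx = np.sign(alpha) == s
        a = np.abs(alpha[idx])
        w_s = a @ X_coeffs[idx] / a.sum()          # weighted mean in PCA space
        portraits[key] = pca.inverse_transform(w_s[None, :])[0]
    return portraits
```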

49 The generalised portraits for the SVM, RVM and FLD together with the Prot, where the prototype is the same as the generalised portrait, are shown in figure 2. [sent-84, score-0.653]

50 We also note that w can be written as w = Σ_i α_i x_i = Σ_{i : sign(α_i)=+1} α_i x_i − Σ_{i : sign(α_i)=−1} |α_i| x_i. [sent-85, score-0.046]

51 The generalised portraits can be associated with the correct class: W+ are males whereas W− are females. [sent-86, score-0.424]

52 The SVM and the FLD use patterns close to the SH for classification and hence their decision-images appear androgynous, whereas Prot and RVM tend to use patterns distant from the SH resulting in more female and male generalised portraits. [sent-87, score-0.29]

53 3 Human Gender Discrimination along the Decision-Image Axes [sent-89, score-0.028]

54 The decision-images introduced in section 2.2 are based purely on machine learning, albeit on labels provided by human subjects in the case of the subject dataset. [sent-90, score-0.378]

55 [Unfortunately the downsampling (low-pass filtering) of the faces necessary to fit them in the figure makes all the faces somewhat more androgynous than they appear when viewed at full resolution.] [sent-93, score-0.201]

56 Several characteristics of subjects’ responses—proportion correct, reaction times (RT) and confidence ratings—correlated very well with the distance of the stimuli to their separating hyperplane (SH) for support and relevance vector machines (SVMs, RVMs) but not for the simple prototype (Prot) classifier. [sent-94, score-0.328]

57 In other words: the female-to-maleness axes of the SVM and RVM should be closely aligned with those of our subjects, whereas this is not expected to be the case for FLD and Prot. [sent-96, score-0.201]
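
Between machine classifiers themselves, alignment of the decision axes reduces to the cosine of the angle between their normal vectors, as in this small sketch (an illustration, not part of the paper's reported analysis):

```python
# Cosine of the angle between two classifiers' normal vectors: 1 means the
# female-to-maleness axes coincide, 0 means they are orthogonal.
import numpy as np

def cos_angle(w1, w2):
    return w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2))

# e.g. compare cos_angle(w_svm, w_rvm) with cos_angle(w_svm, w_prot)
```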

58 3.1 Psychophysical Methods Four observers—one of the authors (FAW) with extensive psychophysical training and three naïve subjects paid for their participation—took part in a standard, spatial (left versus right) two-alternative forced-choice (2AFC) discrimination experiment. [sent-98, score-0.29]

59 Subjects were presented with two faces I(−λ) and I(λ) and had to indicate which face looked more male. [sent-99, score-0.163]

60 Neither male nor female faces changed the mean luminance. [sent-101, score-0.173]

61 Subjects viewed the screen binocularly with their head stabilised by a headrest. [sent-102, score-0.031]

62 The probability of the female face being presented on the left was 0.5 on each trial, and observers indicated whether they thought the left or right face was female by touching the corresponding location on an Elo TouchSystems touch-screen immediately in front of the display; no feedback was provided. [sent-104, score-0.136] [sent-105, score-0.087] [sent-145, score-0.136]

64 [Figure 3, axis labels and legends: length of normalised decision-image vector λW/||W|| (x-axis), proportion correct gender identification (y-axis), threshold criteria at 75% and 90% correct, observers FAW and HM, classifiers SVM, RVM, Prot and FLD.] [sent-118, score-0.196] [sent-122, score-0.118] [sent-134, score-0.244]

67 (a) Shows raw data and fitted psychometric functions for one observer (FAW). [sent-140, score-0.108]

68 (b–e) For each of four observers, the threshold elevation for the RVM, Prot and FLD decision-images relative to that of the SVM; results are shown for both 75% and 90% correct together with 68%-CIs. [sent-142, score-0.208]

70 Trials were run in blocks of 256 in which eight repetitions of eight stimulus levels (±λ₁, …, ±λ₄) were presented. [sent-146, score-0.062]

71 The naïve subjects required approximately 2000 trials before their performance stabilised; thereafter they did another five to six blocks of 256 trials. [sent-150, score-0.045]

72 All results presented below are based on the trials after training; all training trials were discarded. [sent-151, score-0.044]

73 3.2 Results and Discussion Figure 3a shows the raw data and fitted psychometric functions for one of the observers. [sent-153, score-0.108]

74 Proportion correct gender identification on the y-axis is plotted against λ on the x-axis on semi-logarithmic coordinates. [sent-154, score-0.208]
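
A simplified stand-in for such a fit is sketched below, assuming a cumulative-Gaussian psychometric function in log-λ with a lapse rate; the paper used psignifit, whose parameterisation may differ, and the stimulus levels and counts here are hypothetical:

```python
# Fit a 2AFC psychometric function p(lam) = 0.5 + (0.5 - lapse) * Phi(...) and
# read off the 75%-correct threshold; levels and counts are hypothetical.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def pf(lam, mu, sigma, lapse):
    return 0.5 + (0.5 - lapse) * norm.cdf(np.log(lam), np.log(mu), sigma)

lam = np.array([0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4])  # stimulus levels
k = np.array([4, 5, 5, 6, 7, 8, 8, 8])                     # correct of n = 8 trials
n = 8

popt, _ = curve_fit(pf, lam, k / n, p0=[0.3, 1.0, 0.01],
                    bounds=([1e-3, 1e-3, 0.0], [10.0, 10.0, 0.06]))
mu, sigma, lapse = popt

# Invert the fitted function at 75% correct
lam75 = np.exp(norm.ppf((0.75 - 0.5) / (0.5 - lapse)) * sigma + np.log(mu))
print(f"lambda @ 75% correct ~ {lam75:.3f}")
```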

75 68%-confidence intervals (CIs), indicated by horizontal lines at 75% and 90% correct in figure 3a, were estimated by the BCa bootstrap method also implemented in psignifit [16]. [sent-156, score-0.059]
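
psignifit's BCa bootstrap adds bias and acceleration corrections; the plain percentile interval below is only a simplified stand-in, and `fit_threshold` is a hypothetical helper wrapping a fit like the one sketched above:

```python
# Percentile-bootstrap CI for a psychometric threshold (simplified; not BCa).
import numpy as np

def bootstrap_ci(lam, k, n, fit_threshold, n_boot=2000, level=0.68, seed=0):
    rng = np.random.default_rng(seed)
    thresholds = []
    for _ in range(n_boot):
        k_star = rng.binomial(n, k / n)        # parametric resample of the counts
        try:
            thresholds.append(fit_threshold(lam, k_star, n))
        except RuntimeError:                   # skip the occasional failed refit
            continue
    lo, hi = np.percentile(thresholds, [50 * (1 - level), 50 * (1 + level)])
    return lo, hi                              # e.g. the 16th and 84th percentiles
```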

76 The raw data appear noisy because each data point is based on only eight trials. [sent-157, score-0.052]

77 However, none of the fitted psychometric functions failed various Monte Carlo based goodness-of-fit tests [15]. [sent-158, score-0.087]

78 Figure 3b–e shows the thresholds for all four observers normalised by λSVM (the “threshold elevation” re. the SVM). [sent-160, score-0.114]

79 Threshold elevations larger than 1.0 for RVM, Prot and FLD indicate that more of the corresponding decision-images had to be added for the human observers to be able to discriminate females from males. [sent-163, score-0.239]

80 In figure 3f we pool the data across observers, as the main trend (poorer performance for Prot and FLD compared to SVM and RVM) is apparent for all four observers. [sent-164, score-0.087]

81 The difference between SVM and RVM is small; going along the direction of both Prot and FLD, however, results in a much “slower” transition from female-to-maleness. [sent-165, score-0.051]

82 The psychophysical data are very clear: all observers require a larger λ for Prot and FLD; the length ratio ranges from 1. [sent-166, score-0.17]

83 In the pooled data all the differences are statistically significant but even at the individual subject level all differences are significant at the 90% performance level, and five of eight are significant at the 75% performance level. [sent-170, score-0.106]

84 It thus appears that SVM and RVM capture more of the psychological face-space of our human observers than Prot and FLD. [sent-171, score-0.212]

85 From our results we cannot exclude the possibility that some other direction might have yielded even steeper psychometric functions, i.e. [sent-172, score-0.087]

86 faster female-to-maleness transitions, but we can conclude that the decision-images of SVM and RVM are closer to the decision-images used by human subjects than those of Prot and FLD. [sent-174, score-0.279]

87 This is exactly as predicted by the correlations between proportion correct, RTs and confidence ratings versus distance to the hyperplane reported in [1]—high correlations for SVM and RVM, low correlations for Prot. [sent-175, score-0.168]

88 4 Summary and Conclusions We studied classification and discrimination of human faces both psychophysically and with methods from machine learning. [sent-176, score-0.289]

89 The combination of linear preprocessor (PCA) and classifier (SVM, RVM, Prot and FLD) allowed us to visualise the decision-images of a classifier corresponding to the vector normal to the SH of the classifier. [sent-177, score-0.158]

90 Decision-images can be used to determine the regions of the stimuli most useful for classification simply by analysing the distribution of light and dark regions in the decision-image. [sent-178, score-0.11]
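
One simple way to operationalise this reading of a decision-image is to flag the pixels with the largest deviation from the mean grey level, as in this small sketch (an illustration, not the paper's procedure):

```python
# Flag the most influential pixels of a 256x256 decision-image W: those whose
# deviation from the image mean falls in the top (1 - q) quantile.
import numpy as np

def informative_mask(W, q=0.9):
    dev = np.abs(W - W.mean())
    return dev >= np.quantile(dev, q)
```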

91 In addition we defined the generalised portraits to be the prototypes of all faces used by the classifier to obtain its classification. [sent-179, score-0.45]

92 For the SVM this is the weighted average of all the support vectors (SVs), for the RVM the weighted average of all the relevance vectors (RVs), and for the Prot it is the prototype itself. [sent-180, score-0.178]

93 The generalised portraits are, like the decision-images, another useful visualisation of the categorisation algorithm of the machine classifier. [sent-181, score-0.366]

94 In the machine-learning-psychophysics research we replace a complex system that is very hard to analyse (the human brain) with a reasonably complex system (a learning machine) that is complex enough to capture the essentials of our human subjects’ behaviour but is nonetheless amenable to close analysis. [sent-183, score-0.285]

95 From the analysis of the machines we then derive predictions for human subjects which we subsequently test psychophysically. [sent-184, score-0.314]

96 Given the success in predicting the steepness of the female-to-male transition along the wSVM-axis, we believe that the decision-image WSVM captures some of the essential characteristics of the human decision algorithm. [sent-185, score-0.176]

97 In addition we thank Frank Jäkel for supplying us with the code to run the touch-screen experiment. [sent-187, score-0.031]

98 Insights from machine learning applied to human visual classification. [sent-194, score-0.172]

99 A comparison of two computer-based face recognition systems with human perceptions of faces. [sent-215, score-0.225]

100 Face recognition algorithms as models of human face processing. [sent-226, score-0.225]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('prot', 0.5), ('fld', 0.469), ('rvm', 0.422), ('generalised', 0.177), ('portraits', 0.163), ('subjects', 0.154), ('gender', 0.149), ('prototype', 0.136), ('human', 0.125), ('svm', 0.12), ('classi', 0.115), ('observers', 0.087), ('sh', 0.087), ('psychometric', 0.087), ('faces', 0.085), ('psychophysical', 0.083), ('face', 0.078), ('pca', 0.068), ('stimuli', 0.064), ('faw', 0.062), ('preprocessor', 0.062), ('sw', 0.062), ('elevation', 0.062), ('ers', 0.06), ('correct', 0.059), ('female', 0.058), ('subj', 0.054), ('discrimination', 0.053), ('portrait', 0.047), ('rvms', 0.047), ('wichmann', 0.047), ('wsvm', 0.047), ('er', 0.044), ('subject', 0.042), ('normal', 0.042), ('relevance', 0.042), ('mpi', 0.041), ('holistic', 0.037), ('correlations', 0.037), ('proportion', 0.036), ('machines', 0.035), ('analyse', 0.035), ('cation', 0.034), ('dataset', 0.033), ('pooled', 0.033), ('sign', 0.032), ('androgynous', 0.031), ('bruce', 0.031), ('eb', 0.031), ('kel', 0.031), ('psigni', 0.031), ('scalers', 0.031), ('stabilised', 0.031), ('visualise', 0.031), ('eight', 0.031), ('labels', 0.031), ('tted', 0.03), ('male', 0.03), ('images', 0.029), ('dence', 0.028), ('decision', 0.028), ('separating', 0.028), ('along', 0.028), ('planck', 0.028), ('bubbles', 0.027), ('normalised', 0.027), ('females', 0.027), ('machine', 0.026), ('eye', 0.026), ('svms', 0.026), ('true', 0.025), ('whereas', 0.025), ('graf', 0.025), ('prototypes', 0.025), ('mouth', 0.025), ('xi', 0.023), ('vector', 0.023), ('transition', 0.023), ('thereafter', 0.023), ('regions', 0.023), ('class', 0.023), ('closely', 0.022), ('trials', 0.022), ('wt', 0.022), ('fisher', 0.022), ('frank', 0.022), ('psychophysics', 0.022), ('recognition', 0.022), ('raw', 0.021), ('row', 0.021), ('perception', 0.021), ('rescaled', 0.021), ('ratings', 0.021), ('cognitive', 0.021), ('dimensionality', 0.021), ('visual', 0.021), ('discriminant', 0.02), ('gure', 0.02), ('reaction', 0.02), ('psychology', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification


2 0.11661987 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data

Author: Oliver Williams, Andrew Blake, Roberto Cipolla

Abstract: There has been substantial progress in the past decade in the development of object classifiers for images, for example of faces, humans and vehicles. Here we address the problem of contaminations (e.g. occlusion, shadows) in test images which have not explicitly been encountered in training data. The Variational Ising Classifier (VIC) algorithm models contamination as a mask (a field of binary variables) with a strong spatial coherence prior. Variational inference is used to marginalize over contamination and obtain robust classification. In this way the VIC approach can turn a kernel classifier for clean data into one that can tolerate contamination, without any specific training on contaminated positives. 1

3 0.082593665 34 nips-2004-Breaking SVM Complexity with Cross-Training

Author: Léon Bottou, Jason Weston, Gökhan H. Bakir

Abstract: We propose to selectively remove examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler, 1982). This heuristic procedure aims at creating a separable distribution of training examples with minimal impact on the position of the decision boundary. It breaks the linear dependency between the number of SVs and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages. 1

4 0.077313967 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces

Author: N. J. Hill, Thomas N. Lal, Karin Bierig, Niels Birbaumer, Bernhard Schölkopf

Abstract: Motivated by the particular problems involved in communicating with “locked-in” paralysed patients, we aim to develop a braincomputer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged eventrelated potentials, we show that an untrained user’s EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI. 1

5 0.071407974 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging

Author: Vladimir Koltchinskii, Manel Martínez-ramón, Stefan Posse

Abstract: We study a method of optimal data-driven aggregation of classifiers in a convex combination and establish tight upper bounds on its excess risk with respect to a convex loss function under the assumption that the solution of optimal aggregation problem is sparse. We use a boosting type algorithm of optimal aggregation to develop aggregate classifiers of activation patterns in fMRI based on locally trained SVM classifiers. The aggregation coefficients are then used to design a ”boosting map” of the brain needed to identify the regions with most significant impact on classification. 1

6 0.070727371 46 nips-2004-Constraining a Bayesian Model of Human Visual Speed Perception

7 0.062279157 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

8 0.058445971 68 nips-2004-Face Detection --- Efficient and Rank Deficient

9 0.058237638 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound

10 0.057667337 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization

11 0.056907095 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

12 0.055124491 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data

13 0.053227838 178 nips-2004-Support Vector Classification with Input Data Uncertainty

14 0.052840512 197 nips-2004-Two-Dimensional Linear Discriminant Analysis

15 0.050408859 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

16 0.048690289 21 nips-2004-An Information Maximization Model of Eye Movements

17 0.048602164 49 nips-2004-Density Level Detection is Classification

18 0.048497595 144 nips-2004-Parallel Support Vector Machines: The Cascade SVM

19 0.048092242 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition

20 0.048013087 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.141), (1, 0.042), (2, -0.043), (3, 0.013), (4, 0.039), (5, 0.073), (6, 0.132), (7, -0.067), (8, 0.078), (9, 0.022), (10, -0.053), (11, 0.015), (12, -0.08), (13, -0.025), (14, 0.024), (15, -0.088), (16, -0.095), (17, 0.045), (18, 0.021), (19, 0.077), (20, 0.007), (21, 0.002), (22, 0.039), (23, 0.006), (24, 0.106), (25, 0.04), (26, 0.016), (27, 0.033), (28, -0.082), (29, 0.038), (30, 0.147), (31, -0.044), (32, 0.039), (33, -0.013), (34, -0.091), (35, -0.093), (36, -0.023), (37, 0.047), (38, 0.214), (39, 0.049), (40, 0.025), (41, -0.048), (42, -0.074), (43, 0.125), (44, -0.039), (45, 0.007), (46, -0.1), (47, 0.109), (48, 0.109), (49, -0.0)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91715753 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification


2 0.66937357 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data


3 0.51678699 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

Author: Margarita Osadchy, Matthew L. Miller, Yann L. Cun

Abstract: We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets – one for frontal pose, one for rotated faces, and one for profiles – and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system’s accuracy on both face detection and pose estimation is improved by training for the two tasks together.

4 0.4938412 46 nips-2004-Constraining a Bayesian Model of Human Visual Speed Perception

Author: Alan Stocker, Eero P. Simoncelli

Abstract: It has been demonstrated that basic aspects of human visual motion perception are qualitatively consistent with a Bayesian estimation framework, where the prior probability distribution on velocity favors slow speeds. Here, we present a refined probabilistic model that can account for the typical trial-to-trial variabilities observed in psychophysical speed perception experiments. We also show that data from such experiments can be used to constrain both the likelihood and prior functions of the model. Specifically, we measured matching speeds and thresholds in a two-alternative forced choice speed discrimination task. Parametric fits to the data reveal that the likelihood function is well approximated by a LogNormal distribution with a characteristic contrast-dependent variance, and that the prior distribution on velocity exhibits significantly heavier tails than a Gaussian, and approximately follows a power-law function.

5 0.4732888 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging


6 0.45849422 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound

7 0.43765157 68 nips-2004-Face Detection --- Efficient and Rank Deficient

8 0.4362618 192 nips-2004-The power of feature clustering: An application to object detection

9 0.43100196 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization

10 0.43060356 127 nips-2004-Neighbourhood Components Analysis

11 0.39601484 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations

12 0.38494155 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces

13 0.37498942 101 nips-2004-Learning Syntactic Patterns for Automatic Hypernym Discovery

14 0.3719759 34 nips-2004-Breaking SVM Complexity with Cross-Training

15 0.36798996 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data

16 0.36566755 136 nips-2004-On Semi-Supervised Classification

17 0.36365885 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process

18 0.358915 156 nips-2004-Result Analysis of the NIPS 2003 Feature Selection Challenge

19 0.35706472 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)

20 0.35653073 53 nips-2004-Discriminant Saliency for Visual Recognition from Cluttered Scenes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.34), (13, 0.062), (15, 0.11), (26, 0.069), (31, 0.029), (33, 0.182), (35, 0.029), (39, 0.017), (50, 0.031), (56, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77575874 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification


2 0.71683371 177 nips-2004-Supervised Graph Inference

Author: Jean-philippe Vert, Yoshihiro Yamanishi

Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1

3 0.71465772 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation

Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky

Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self– occlusions created by the imaging process lead to complex interpendencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1

4 0.56295407 69 nips-2004-Fast Rates to Bayes for Kernel Machines

Author: Ingo Steinwart, Clint Scovel

Abstract: We establish learning rates to the Bayes risk for support vector machines (SVMs) with hinge loss. In particular, for SVMs with Gaussian RBF kernels we propose a geometric condition for distributions which can be used to determine approximation properties of these kernels. Finally, we compare our methods with a recent paper of G. Blanchard et al.

5 0.56144929 161 nips-2004-Self-Tuning Spectral Clustering

Author: Lihi Zelnik-manor, Pietro Perona

Abstract: We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering with irregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a ‘local’ scale should be used to compute the affinity between each pair of points. This local scaling leads to better clustering especially when the data includes multiple scales and when the clusters are placed within a cluttered background. We further suggest exploiting the structure of the eigenvectors to infer automatically the number of groups. This leads to a new algorithm in which the final randomly initialized k-means stage is eliminated. 1

6 0.56062049 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

7 0.56041896 45 nips-2004-Confidence Intervals for the Area Under the ROC Curve

8 0.55850017 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering

9 0.5581907 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

10 0.55749249 77 nips-2004-Hierarchical Clustering of a Mixture Model

11 0.55697364 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

12 0.55666757 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms

13 0.55663633 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound

14 0.55659777 103 nips-2004-Limits of Spectral Clustering

15 0.55655587 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

16 0.55631417 207 nips-2004-ℓ₀-norm Minimization for Basis Selection

17 0.55554563 44 nips-2004-Conditional Random Fields for Object Recognition

18 0.5545783 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification

19 0.55417007 179 nips-2004-Surface Reconstruction using Learned Shape Models

20 0.55386621 187 nips-2004-The Entire Regularization Path for the Support Vector Machine