nips2006-97: Inducing Metric Violations in Human Similarity Judgements (knowledge-graph by maker-knowledge-mining)
Source: pdf
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting and an exceedingly difficult challenge. Some of the difficulty arises from conflicting evidence on whether human categorization and similarity judgements should be modelled as operating on a mental representation that is essentially metric. Intuitively, a metric representation has strong appeal, as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces $\ell_2$ violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm for analysing such data and identifying the crucial point. Thus there may not be a strict dichotomy between a metric and a non-metric internal space, but rather degrees to which potentially large subsets of stimuli are represented metrically, with a small subset causing a global violation of metricity.
1 Introduction

The central aspect of quantitative approaches in psychology is to adequately model human behaviour. In perceptual research, for example, all successful models of visual perception tacitly assume that at least simple visual stimuli are processed, transformed and compared to some internal reference in a metric space. In cognitive psychology, too, many models of human categorisation assume that stimuli "similar" to each other are grouped together in categories. Within a category similarity is very high, whereas between categories similarity is low. This coincides with intuitive notions of categorization, which also tend to rely on similarity, despite serious problems in defining what similarity means or ought to mean [6]. Work on similarity and generalization in psychology has been hugely influenced by the work of Roger Shepard on similarity and categorization [12, 14, 11, 4, 13]. Shepard explicitly assumes that similarity is a distance measure in a metric space, and many perceptual categorization models follow Shepard's general framework [8, 3]. This notion of similarity is frequently linked to a geometric representation in which stimuli are points in a space and similarity corresponds to an intuitive metric on this space, e.g. a Euclidean metric.
Tversky and co-workers provided convincing evidence that (intuitive, and certainly Euclidean) geometric representations cannot account for human similarity judgements, at least for the highly cognitive and non-perceptual stimuli they employed in their studies. Within their experimental context, pairwise dissimilarity measurements violated metricity, in particular symmetry and the triangle inequality.
Technically, violations of Euclideanity translate into similarity matrices that are not positive semi-definite ("pseudo-Gram" matrices) [15], a fact which imposes severe constraints on data-analysis procedures. Typical approaches to overcome these problems involve leaving out the negative eigenvalues altogether, or shifting the spectrum, before a subsequent (Kernel-)PCA analysis [10, 7].
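As a rough sketch of these two corrections (our own minimal formulation, not the implementations of [10, 7]): clipping discards the negative eigenvalues of the pseudo-Gram matrix, while shifting raises the whole spectrum until the matrix is positive semi-definite.

```python
import numpy as np

def clip_spectrum(C):
    """Make a symmetric pseudo-Gram matrix PSD by zeroing its negative eigenvalues."""
    evals, evecs = np.linalg.eigh(C)
    return (evecs * np.clip(evals, 0.0, None)) @ evecs.T

def shift_spectrum(C):
    """Make C PSD by shifting the whole spectrum up by |lambda_min|."""
    lam_min = np.linalg.eigvalsh(C)[0]          # smallest eigenvalue (ascending order)
    return C - lam_min * np.eye(C.shape[0]) if lam_min < 0 else C
```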
The shortcoming of such methods is that they assume that the data really are Euclidean and that all violations are due only to noise. Shepard's solution to non-metricity was to find non-linear transformations of the subjects' similarity data to make them Euclidean, and/or to use non-Euclidean metrics such as the city-block metric (or other Minkowski p-norms with p ≠ 2) [11, 4].
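For concreteness, the Minkowski family of metrics can be written as below (a minimal sketch; p = 2 gives the Euclidean metric, p = 1 the city-block metric):

```python
import numpy as np

def minkowski(x, y, p=2.0):
    """Minkowski p-norm distance between vectors x and y (p >= 1)."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))
```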
Yet another way in which metric violations may arise in experimental data, whilst retaining the notion that the internal, mental representation is really metric, is to invoke attentional re-weighting of dimensions during similarity judgements and categorisation tasks [1]. We show how conflictual judgments can introduce metric violations in a situation where the human similarity judgments are based upon smooth geometric features and are otherwise essentially Euclidean. First we present a simple model which explains the occurrence of metric violations in similarity data, with a special focus on human similarity judgments.
2 Modeling metric violations for single conflictual situations

A dissimilarity function $d$ is called metric if: $d(x_i, x_j) \geq 0$ for all $x_i, x_j \in X$; $d(x_i, x_j) = 0$ iff $x_i = x_j$; $d(x_i, x_j) = d(x_j, x_i)$ for all $x_i, x_j \in X$; and $d(x_i, x_k) + d(x_k, x_j) \geq d(x_i, x_j)$ for all $x_i, x_j, x_k \in X$.
A dissimilarity matrix $D = (D_{ij})$ will be called metric if there exists a metric $d$ such that $D_{ij} = d(x_i, x_j)$. $D = (D_{ij})$ will be called squared Euclidean if the metric derives from $\ell_2$.
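These axioms translate directly into a numerical test for a given dissimilarity matrix; the sketch below (function name and tolerance handling are ours) checks non-negativity, zero diagonal, symmetry and the triangle inequality.

```python
import numpy as np

def is_metric(D, tol=1e-9):
    """Check necessary conditions for a dissimilarity matrix D (n x n) to be metric."""
    n = D.shape[0]
    nonneg    = bool(np.all(D >= -tol))                   # d(x_i, x_j) >= 0
    zero_diag = bool(np.all(np.abs(np.diag(D)) <= tol))   # d(x_i, x_i) = 0
    symmetric = np.allclose(D, D.T, atol=tol)             # d(x_i, x_j) = d(x_j, x_i)
    triangle  = all(D[i, j] <= D[i, k] + D[k, j] + tol    # triangle inequality
                    for i in range(n) for j in range(n) for k in range(n))
    return nonneg and zero_diag and symmetric and triangle
```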
A given data point $x_i$ can be decomposed in an orthogonal basis $\{f_k\}_{k=1}^{n}$ as $x_i = \sum_{k=1}^{n} \alpha_k^{(i)} f_k$. The squared $\ell_2$ distance between $x_i$ and $x_j$ therefore reads: $d_{ij} = \|x_i - x_j\|^2 = \sum_{k=1}^{n} \big(\alpha_k^{(i)} - \alpha_k^{(j)}\big)^2 \|f_k\|^2$.
In the realm of human perception this is not always the case, as illustrated by the following well-known ambiguous figure (Fig. 1).

Figure 1: A well-known ambiguous figure (young lady / old woman). If you were to compare this picture to a large set of images of young ladies or old women, the perceptual state-switch could induce large individual weights on the similarity.

If the state-switching could be experimentally induced within a single experiment and subject, this might cause metric, or at least Euclidean, violations via such a conflictual judgment. A possible way to model such conflictual situations in human similarity judgments is to introduce states $\{\omega^{(1)}, \omega^{(2)}, \dots\}$. The similarity judgment between objects then depends on the perceptual state (weight) the subject is in.
Assuming that the person is in state $\omega^{(l)}$, the distance becomes: $d_{ij} = \|x_i - x_j\|_\omega^2 = \sum_{k=1}^{n} \omega_k^{(l)} \big(\alpha_k^{(i)} - \alpha_k^{(j)}\big)^2 \|f_k\|^2$.
$\omega$ may vary between different subjects, reflecting their different foci of attention; thus we will not average the similarity judgments over different subjects, but only over different trials of one single subject, assuming that for a given person $\omega$ is constant.
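In code, the state-dependent distance of this model reads as follows (a sketch under the assumption of an orthonormal basis, so the $\|f_k\|^2$ factors drop out; names are ours):

```python
import numpy as np

def state_distance(alpha_i, alpha_j, omega_l):
    """Squared distance between two stimuli under perceptual state omega_l.

    alpha_i, alpha_j: coefficient vectors of the stimuli in the basis {f_k};
    omega_l: per-dimension weight vector of state l.
    """
    return float(np.sum(omega_l * (alpha_i - alpha_j) ** 2))
```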
In order to interpret the metric violations, we propose the following simple algorithm, which allows one to specifically visualize the information coded by the negative eigenvalues. Retaining the last two eigenvectors $P = \{v_n, v_{n-1}\}$ of the pseudo-Gram matrix $C$ yields a projection onto the last two eigendirections: this corresponds to a projection onto directions related to the negative part of $C$, containing the information coded by the $\ell_2$ violations.
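A compact sketch of the proposed visualization, assuming the standard double-centering construction of the pseudo-Gram matrix $C$ from the squared dissimilarity matrix $D$ (function and variable names are ours):

```python
import numpy as np

def eigenspace_projections(D, n_pos=2, n_neg=2):
    """Project objects onto the leading positive and the last (most negative)
    eigendirections of the pseudo-Gram matrix derived from D."""
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    C = -0.5 * Q @ D @ Q                       # pseudo-Gram matrix; PSD iff D is squared Euclidean
    evals, evecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    coords = evecs * np.sqrt(np.abs(evals))    # scale each axis by sqrt(|eigenvalue|)
    positive = coords[:, ::-1][:, :n_pos]      # leading positive eigendirections
    negative = coords[:, :n_neg]               # last (negative) eigendirections
    return evals, positive, negative
```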
2.1 Proof-of-concept illustration: single conflicts introduce metric violations

We now illustrate the model for a single conflictual situation. Then we have $d_{ij} = \omega_{l_{ij}} \sum_{k=1}^{n} \big(\alpha_k^{(i)} - \alpha_k^{(j)}\big)^2 \|e_k\|_2^2 = \omega_{l_{ij}} \|x_i - x_j\|_2^2$, where $\|\cdot\|_2$ is the usual unweighted Euclidean norm. Suppose an experimental subject is to compare these objects pairwise, giving each pair a dissimilarity score, and that a conflictual situation arises for the pairs (2, 3), (7, 2) and (6, 5), translating into a strong weighting of these dissimilarities.
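A self-contained sketch of this proof of concept (the number of points, their placement and the weighting factor are our assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 2))                              # 16 objects in the plane
diff = X[:, None, :] - X[None, :, :]
D = np.sum(diff ** 2, axis=-1)                            # squared Euclidean dissimilarities

W = D.copy()
for i, j in [(1, 2), (6, 1), (5, 4)]:                     # pairs (2,3), (7,2), (6,5), 0-based
    W[i, j] = W[j, i] = 5.0 * W[i, j]                     # conflictual pairs strongly weighted

n = len(X)
Q = np.eye(n) - np.ones((n, n)) / n
for name, M in [("unweighted", D), ("weighted", W)]:
    spectrum = np.linalg.eigvalsh(-0.5 * Q @ M @ Q)       # ascending
    print(f"{name}: smallest eigenvalue = {spectrum[0]:.4f}")
# The unweighted spectrum is non-negative (up to numerical noise); the weighted
# one acquires clearly negative eigenvalues, i.e. l2 violations.
```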
Fig. 2 shows the projection onto the leading positive and leading negative eigendirections of both the unweighted distance matrix (top row) and the weighted distance matrix (bottom row). In the negative eigenspace we obtain a singular distribution for the unweighted case. This is not the case for the weighted dissimilarity: the distribution in the negative eigenspace separates the points whose mutual distance has been (strongly) weighted. The information contained in the negative part, reflecting the information coded by metric or $\ell_2$ violations, codes in this case for the individual weighting of the (dis)similarities.
Figure 2: Proof of concept: unperturbed dissimilarity matrix (no conflict) and weighted dissimilarity matrix (conflict). The panels show the eigenvalue spectra and the projections onto the first/second and the last/second-last components.
Individual weightings of dissimilarities introduce metric violations, and hence $\ell_2$ violations, which are reflected in negative spectra. The conflictual points lie on the periphery of the projection onto the negative eigenspace, away from the bulk of points, whose dissimilarities are essentially Euclidean.
3 Experiments

Twenty gray-scale 256 × 256-pixel images of faces were generated from the MPI face database¹. All faces were normalized to have the same mean and standard deviation of pixel intensities and the same area, and were aligned such that the cross-correlation of each face to a mean face of the database was maximal. In the absence of a formal theory of facial similarity, we hand-selected a set of faces we thought might show the hypothesised effect: sixteen of the twenty faces were selected because prior studies had shown them to be consistently and correctly categorised as male or female [18]. Three of the remaining four faces were females that previous subjects had found very difficult to categorise, labelling them as female or male almost exactly half of the time. The last face was the mean (androgynous) face across the database. Prior to the pairwise comparisons, all subjects viewed all twenty faces simultaneously, arranged in a 4 × 5 grid on the experimental monitor.
The subjects were asked to inspect the entire set of faces to obtain a general notion of the relative similarity of the faces, and they were instructed to use the entire scale in the following rating task. Only thereafter did they proceed to the actual similarity rating stage. Pairwise comparison of twenty faces requires $\binom{20}{2} = 190$ trials; each of our four subjects completed four repetitions, resulting in a total of 760 trials per subject. During the rating stage, faces were shown in pairs in random order for a total duration of 4 seconds (200 msec fade-in, 3600 msec full-contrast view, 200 msec fade-out). Responses had to be given at the very latest 1 second after the faces had disappeared. The final similarity rating per subject was the mean of the four repetitions within that subject.
¹ The MPI face database is located at http://faces.kyb.tuebingen.mpg.de

Figure 3: Our data set: faces 1 to 8 are unambiguous males, faces 9 to 16 are unambiguous females.
Faces 17 to 19 are ambiguous and have been attributed to either sex in roughly half the cases. Face 20 is the mean (androgynous) face.
The mean luminance of the display was 213 cd/m², and presentation of the stimuli did not change the mean luminance of the display. Three subjects with normal or corrected-to-normal vision, naive to the purpose of the experiment, acted as observers; they were paid for their participation.
3.1 Subject 1

In order to exhibit how a single conflictual judgment can break metricity, we follow a two-fold procedure: we first chose a data set of unambiguous faces whose dissimilarities are Euclidean or essentially Euclidean. Second, we compare this subset of faces to a set containing those very same unambiguous males and females, extended by one additional conflict-generating face (see Figure 4 for a schematic illustration).

Figure 4: The unambiguous females and unambiguous males lead to a pairwise dissimilarity matrix which is essentially Euclidean. The addition of one single conflicting face introduces the $\ell_2$ violations.
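A sketch of how such a comparison could be run, given the rated dissimilarity matrix (the helper below repeats the centering construction used earlier; variable names are ours):

```python
import numpy as np

def pseudo_gram_spectrum(D):
    """Eigenvalue spectrum (ascending) of the pseudo-Gram matrix of D."""
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-0.5 * Q @ D @ Q)

# D_sub: dissimilarities among the unambiguous faces only;
# D_ext: the same faces plus the conflict face X as an extra row/column.
# spectrum_sub = pseudo_gram_spectrum(D_sub)   # smallest eigenvalues near zero
# spectrum_ext = pseudo_gram_spectrum(D_ext)   # spectrum "flips down" (negative tail)
```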
We chose a subset of faces which has the property that their mutual dissimilarities are essentially Euclidean (Fig. 5). The conflict-generating face is face 19 and will be denoted X. Fig. 5 shows that the set of unambiguous faces is essentially Euclidean: the smallest eigenvalues of the spectrum are almost zero. This is reflected in an almost singular projection in the eigenspace spanned by the eigenvectors associated with the negative eigenvalues. The projection onto the eigenspace spanned by the eigenvectors associated with the positive eigenvalues separates males from females, which corresponds to the unique salient feature in the data set.
Figure 5: Left: spectrum with only minor $\ell_2$ violations. Middle: males vs. females in the positive eigenspace. Right: when a metric is (essentially) Euclidean, the points are concentrated on a singularity in the negative eigenspace.
We then add the conflict-generating face X. This face has been attributed, in previous experiments, to either sex in 50% of the cases. This addition causes the spectrum to flip down, hinting at an unambiguous $\ell_2$ violation (see Fig. 6). Furthermore, it can be verified that the triangle inequality is violated in several instances by the addition of this conflicting judgment, reflecting that the violation is indeed metric in this case.
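The triangle-inequality check can be made explicit as below; counting how often each face appears in violating triples is one simple heuristic (ours, not necessarily the authors' procedure) for flagging a conflict-generating stimulus such as X.

```python
import numpy as np

def triangle_violations(D, tol=1e-9):
    """List all triples (i, k, j) with D[i, j] > D[i, k] + D[k, j]."""
    n = D.shape[0]
    return [(i, k, j)
            for i in range(n) for j in range(n) for k in range(n)
            if len({i, j, k}) == 3 and D[i, j] > D[i, k] + D[k, j] + tol]
```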
Figure 6: Left: spectrum with $\ell_2$ violations. Middle: males vs. females with X in between. Right: the conflicting face X is separated from the bulk of faces corresponding to the Euclidean dissimilarities.
The positive projection remains almost unchanged and again separates male from female faces, with X in between, reflecting its intermediate position between the males and the females. In the negative projection, X can be seen to separate from the bulk of points, which are mutually Euclidean. Thus we see that the introduction of a conflicting face within a coherent set of unambiguous faces is the cause of the metric violation.
3.2 Subjects 2 and 3

The same procedure was applied to the similarity judgments given by Subjects 2 and 3. Since the individual perceptual states are incommensurable between different subjects (the reason why we do not average over subjects but only within a subject), the extracted Euclidean subsets were different for each of them. For both Subject 2 and Subject 3, the X lying between the unambiguous faces falls outside the bulk of Euclidean points concentrated around the singularity in the negative projections.
Figure 7: Subject 2: the upper row shows the subset of faces whose dissimilarities are Euclidean. The lower row shows the effect of introducing a conflicting face X and the subsequent weighting.
Figure 8: Subject 3: the upper row shows the subset of faces whose dissimilarities are essentially Euclidean. The lower row shows the effect of introducing a conflicting face X and the subsequent weighting.
Again we find that the introduction of a single conflicting face, within a set of unambiguous faces for which the human similarity judgment is essentially Euclidean, introduces the $\ell_2$ violations. This strongly corroborates our conflict model and the statement that metric violations in human similarity judgments have a specific meaning: in this case, a conflictual judgment.
4 Conclusion

We presented a simple experiment in which we could show how a single, purposely selected stimulus introduces $\ell_2$ violations in what appeared to have been an internal Euclidean similarity space of facial attributes. Importantly, then, there may be no clear dichotomy whereby internal representations of similarity are either metric or not; rather, they may be metric for "easy" stimuli, while "ambiguous" stimuli can cause metric violations (at least $\ell_2$ violations in our setting). We have clearly shown that these violations are caused by conflictual points in a data set: the addition of one such point caused the spectra of the Gram matrices to "flip down", reflecting the $\ell_2$ violation. Further research will involve the acquisition of more pairwise similarity judgements in conflicting situations, as well as the refinement of our existing experiments.
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 97 nips-2006-97 Inducing Metric Violations in Human Similarity Judgements
2 0.11161844 9 nips-2006-9 A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments
3 0.10993086 55 nips-2006-55 Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
4 0.10947844 4 nips-2006-4 A Humanlike Predictor of Facial Attractiveness
5 0.085930079 105 nips-2006-105 Large Margin Component Analysis
6 0.081969358 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
7 0.081066445 53 nips-2006-Combining causal and similarity-based reasoning
8 0.0750321 110 nips-2006-Learning Dense 3D Correspondence
9 0.074992113 65 nips-2006-Denoising and Dimension Reduction in Feature Space
10 0.069203071 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
11 0.066524148 192 nips-2006-Theory and Dynamics of Perceptual Bistability
12 0.066433281 50 nips-2006-Chained Boosting
13 0.062424328 185 nips-2006-Subordinate class recognition using relational object models
14 0.058140256 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
15 0.057577688 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
16 0.055805568 80 nips-2006-Fundamental Limitations of Spectral Clustering
17 0.055790927 1 nips-2006-A Bayesian Approach to Diffusion Models of Decision-Making and Response Time
18 0.053327583 86 nips-2006-Graph-Based Visual Saliency
19 0.052960839 58 nips-2006-Context Effects in Category Learning: An Investigation of Four Probabilistic Models
20 0.052176204 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
topicId topicWeight
[(0, -0.168), (1, 0.02), (2, 0.077), (3, 0.009), (4, -0.013), (5, -0.058), (6, -0.035), (7, -0.006), (8, 0.041), (9, -0.069), (10, 0.04), (11, 0.061), (12, 0.056), (13, -0.129), (14, -0.048), (15, -0.16), (16, 0.021), (17, 0.01), (18, -0.165), (19, -0.056), (20, 0.135), (21, 0.122), (22, 0.084), (23, -0.014), (24, 0.082), (25, -0.034), (26, 0.016), (27, -0.027), (28, -0.08), (29, -0.227), (30, 0.024), (31, 0.083), (32, 0.0), (33, 0.035), (34, 0.102), (35, -0.016), (36, -0.217), (37, 0.069), (38, -0.032), (39, 0.124), (40, -0.019), (41, -0.13), (42, 0.158), (43, 0.056), (44, 0.019), (45, -0.052), (46, 0.038), (47, -0.072), (48, -0.073), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.97131521 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
2 0.78701001 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
Author: Amit Kagian, Gideon Dror, Tommer Leyvand, Daniel Cohen-or, Eytan Ruppin
Abstract: This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments which are remarkably similar to those of humans. These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general. 1 I n trod u cti on Philosophers, artists and scientists have been trying to capture the nature of beauty since the early days of philosophy. Although in modern days a common layman's notion is that judgments of beauty are a matter of subjective opinion, recent findings suggest that people might share a common taste for facial attractiveness and that their preferences may be an innate part of the primary constitution of our nature. Several experiments have shown that 2 to 8 months old infants prefer looking at faces which adults rate as being more attractive [1]. In addition, attractiveness ratings show very high agreement between groups of raters belonging to the same culture and even across cultures [2]. Such findings give rise to the quest for common factors which determine human facial attractiveness. Accordingly, various hypotheses, from cognitive, evolutional and social perspectives, have been put forward to describe the common preferences for facial beauty. Inspired by Sir Francis Galton’s photographic method of composing faces [3], Rubenstein, Langlois and Roggman created averaged faces by morphing multiple images together and proposed that averageness is the answer for facial attractiveness [4, 5]. Human judges found these averaged faces to be attractive and rated them with attractiveness ratings higher than the mean rating of the component faces composing them. Grammer and Thornhill have investigated symmetry and averageness of faces and concluded that symmetry was more important than averageness in facial attractiveness [6]. Little and colleagues have agreed that average faces are attractive but claim that faces with certain extreme features, such as extreme sexually dimorphic traits, may be more attractive than average faces [7]. Other researchers have suggested various conditions which may contribute to facial attractiveness such as neonate features, pleasant expressions and familiarity. Cunningham and his associates suggest a multiple fitness model in which there is no single constructing line that determines attractiveness. Instead, different categories of features signal different desirable qualities of the perceived target [8]. Even so, the multiple fitness model agrees that some facial qualities are universally physically attractive to people. Apart from eliciting the facial characteristics which account for attractiveness, modern researchers try to describe underlying mechanisms for these preferences. 
Many contributors refer to the evolutionary origins of attractiveness preferences [9]-[11]. According to this view, facial traits signal mate quality and imply chances for reproductive success and parasite resistance. Some evolutionary theorists suggest that preferred features might not signal mate quality, but that the "good taste" is itself an evolutionary adaptation (individuals with a preference for attractiveness will have attractive offspring that will be favored as mates) [9]. Another mechanism explains attractiveness preferences through a cognitive theory: a preference for attractive faces might be induced as a by-product of general perception or recognition mechanisms [5, 12]. Attractive faces might be pleasant to look at since they are closer to the cognitive representation of the face category in the mind. These cognitive representations are described as part of a cognitive mechanism that abstracts prototypes from distinct classes of objects; under the averageness hypothesis, these prototypes relate to average faces. A third view suggests that facial attractiveness originates in a social mechanism, where preferences may depend on the learning history of the individual and even on his social goals [12].

Different studies have tried to use computational methods to analyze facial attractiveness. Averaging faces with morph tools was done in several cases (e.g. [5, 13]). In [14], laser scans of faces were put into complete correspondence with the average face in order to examine the relationship between facial attractiveness, age, and averageness. Another approach was used in [15], where a genetic algorithm, guided by interactive user selections, was programmed to evolve a "most beautiful" female face. [16] used machine learning methods to investigate whether a machine can predict attractiveness ratings by learning a mapping from facial images to their attractiveness scores. Their predictor achieved a significant correlation of 0.6 with average human ratings, demonstrating that facial beauty can be learned by a machine, at least to some degree. However, as human raters still significantly outperform the predictor of [16], the challenge of constructing a facial attractiveness machine with human-level evaluation accuracy has remained open. A primary goal of this study is to surpass these results by developing a machine which obtains human-level performance in predicting facial attractiveness. Having accomplished this, our second main goal is to conduct a series of simulated psychophysical experiments and study the resemblance between human and machine judgments. This latter task carries two potential rewards: A. to determine whether the machine can aid in understanding the psychophysics of human facial attractiveness, capitalizing on the ready accessibility of the analysis of its inner workings, and B. to study whether learning an explicit operational ratings prediction task also entails learning implicit humanlike biases, at least for the case of facial attractiveness.

2 The facial training database: Acquisition, preprocessing and representation

2.1 Rating facial attractiveness

The chosen database was composed of 91 facial images of American females, taken by the Japanese photographer Akira Gomi. All 91 samples were frontal color photographs of young Caucasian females with a neutral expression. All samples were of similar age, skin color and gender.
The subjects' portraits had no accessories or other distracting items such as jewelry. All 91 facial images in the dataset were rated for attractiveness by 28 human raters (15 males, 13 females) on a 7-point Likert scale (1 = very unattractive, 7 = very attractive). Ratings were collected with a specifically designed HTML interface. Each rater was asked to view the entire set before rating, in order to acquire a notion of the attractiveness scale. There was no time limit for judging the attractiveness of each sample, and raters could go back and adjust the ratings of already rated samples. The images were presented to each rater in a random order and each image was presented on a separate page. The final attractiveness rating of each sample was its mean rating across all raters. To validate that the number of ratings collected adequately represented the "collective attractiveness rating", we randomly divided the raters into two disjoint groups of equal size. For each facial image, we calculated the mean rating within each group, and then calculated the Pearson correlation between the mean ratings of the two groups. This process was repeated 1,000 times. The mean correlation between two groups was 0.92 (σ = 0.01). This corresponds well to the known level of consistency among groups of raters reported in the literature (e.g. [2]). Hence, the mean ratings collected are stable indicators of attractiveness that can be used for the learning task. The facial set contained faces in all ranges of attractiveness: final attractiveness ratings range from 1.42 to 5.75, and the mean rating was 3.33 (σ = 0.94).

2.2 Data preprocessing and representation

Preliminary experimentation with various ways of representing a facial image has systematically shown that features based on measured proportions, distances and angles of faces are most effective in capturing the notion of facial attractiveness (e.g. [16]). To extract facial features we developed an automatic engine that is capable of identifying eyes, nose, lips, eyebrows, and head contour. In total, we measured 84 coordinates describing the locations of those facial features (Figure 1). The engine also identifies several regions for extracting mean hair color, mean skin color and skin texture. The feature extraction process was basically automatic, but some coordinates needed to be manually adjusted in some of the images. The facial coordinates are used to create a distances-vector of the 3,486 distances between all pairs of coordinates in the complete graph over the 84 coordinates. For each image, all distances are normalized by face length. In a similar manner, a slopes-vector of the 3,486 slopes of the lines connecting the facial coordinates is computed. Central fluctuating asymmetry (CFA), which is described in [6], is calculated from the coordinates as well. The application also provides, for each face, Hue, Saturation and Value (HSV) values of hair color and skin color, and a measurement of skin smoothness.

Figure 1: Facial coordinates with hair and skin sample regions as represented by the facial feature extractor. Coordinates are used for calculating geometric features and asymmetry. Sample regions are used for calculating color values and smoothness. The sample image, used for illustration only, is of T.G. and is presented with her full consent.

Combining the distances-vector and the slopes-vector yields a vector representation of 6,972 geometric features for each image.
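As a concrete illustration, the geometric part of this representation can be sketched in a few lines of Python. This is not the authors' implementation: the function name is hypothetical, and encoding each slope as an angle via arctan2 (to keep values bounded) is an assumption; the paper only says "slopes of the lines".

    import numpy as np
    from itertools import combinations

    def geometric_features(coords, face_length):
        # Sketch of the distances- and slopes-vectors of section 2.2.
        # coords: (84, 2) array of facial landmark positions.
        # face_length: scalar used to normalize all pairwise distances.
        # Returns 2 * C(84, 2) = 6,972 geometric features.
        dists, slopes = [], []
        for i, j in combinations(range(len(coords)), 2):
            dx, dy = coords[j] - coords[i]
            dists.append(np.hypot(dx, dy) / face_length)  # normalized distance
            slopes.append(np.arctan2(dy, dx))             # line orientation (assumed encoding)
        return np.concatenate([np.array(dists), np.array(slopes)])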
Since strong correlations are expected among the features in such a representation, principal component analysis (PCA) was applied to these geometric features, producing 90 principal components which span the sub-space defined by the 91 image vector representations. The geometric features are projected on those 90 principal components and supply 90 orthogonal eigenfeatures representing the geometric features. Eight measured features were not included in the PCA: CFA, smoothness, the hair color coordinates (HSV) and the skin color coordinates. These features are assumed to be directly connected to human perception of facial attractiveness and are hence kept at their original values. These 8 features were added to the 90 geometric eigenfeatures, resulting in a total of 98 image-features representing each facial image in the dataset.

3 Experiments and results

3.1 Predictor construction and validation

We experimented with several induction algorithms, including simple Linear Regression, Least Squares Support Vector Machine (LS-SVM) (both linear and non-linear) and Gaussian Processes (GP). However, as the LS-SVM and GP showed no substantial advantage over Linear Regression, the latter was used and is presented in the sequel. A key ingredient in our method is a proper image-features selection strategy. To this end we used subset feature selection, implemented by ranking the image-features by their Pearson correlation with the target. Other ranking functions produced no substantial gain. To measure the performance of our method we removed one sample from the whole dataset; this sample served as a test set. We found, for each left-out sample, the optimal number of image-features by performing leave-one-out cross-validation (LOOCV) on the remaining samples and selecting the number of features that minimizes the absolute difference between the algorithm's output and the targets of the training set. In other words, the score for a test example was predicted using a single model based on the training set only. This process was repeated n = 91 times, once for each image sample. The vector of attractiveness predictions of all images is then compared with the true targets. These scores are found to be in a high Pearson correlation of 0.82 with the mean ratings of humans (p-value < 10⁻²³), which corresponds to a normalized mean squared error of 0.39. This accuracy is a marked improvement over the recently published performance results of a Pearson correlation of 0.6 on a similar dataset [16]. For comparison, the average correlation of an individual human rater with the mean ratings of all other raters in our dataset is 0.67, and the average correlation between the mean ratings of groups of raters is 0.92 (section 2.1). It should be noted that we tried to use this feature selection and training procedure with the original geometric features instead of the eigenfeatures, ranking them by their correlation to the targets and selecting up to the 300 best-ranked features. This, however, failed to produce good predictors, due to the strong correlations between the original geometric features (the maximal Pearson correlation obtained was 0.26).
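A minimal sketch of this evaluation protocol — an outer leave-one-out loop with an inner LOOCV to pick the number of top-correlated features — might look as follows. The candidate grid ks and the helper names are illustrative assumptions, and scikit-learn merely stands in for whatever tooling was actually used.

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    def rank_features(X, y):
        # Rank image-features by |Pearson correlation| with the target (descending).
        corrs = np.array([abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])])
        return np.argsort(corrs)[::-1]

    def loo_predictions(X, y, ks=range(5, 99, 5)):
        n = len(y)
        preds = np.empty(n)
        for i in range(n):                                  # outer leave-one-out
            tr = np.delete(np.arange(n), i)
            order = rank_features(X[tr], y[tr])
            best_k, best_err = None, np.inf
            for k in ks:                                    # pick k by inner LOOCV
                inner = cross_val_predict(LinearRegression(),
                                          X[tr][:, order[:k]], y[tr],
                                          cv=LeaveOneOut())
                err = np.mean(np.abs(inner - y[tr]))
                if err < best_err:
                    best_k, best_err = k, err
            model = LinearRegression().fit(X[tr][:, order[:best_k]], y[tr])
            preds[i] = model.predict(X[i:i + 1][:, order[:best_k]])[0]
        return preds  # compare with y, e.g. via pearsonr(preds, y)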
3.2 Similarity of machine and human judgments

Each rater (human and machine) has a 91-dimensional rating vector describing its attractiveness ratings of all 91 images. These vectors can be embedded in a 91-dimensional ratings space. The Euclidean distance between all raters (human and machine) in this space was computed. Compared with each of the human raters, the ratings of the machine were the closest, on average, to the ratings of all other human raters (Figure 2).

Figure 2: Distribution of mean Euclidean distance from each human rater to all other raters in the ratings space. The machine's average distance from all other raters (left bar) is smaller than the average distance of each of the human raters to all others.

To verify that the machine ratings are not outliers that fall outside the clusters of human raters (even though their mean distance from the other ratings is small), we surrounded each of the rating vectors in the ratings space with multidimensional spheres of several radii. The machine had more human neighbors than the mean number of neighbors of the human raters, testifying that it does not fall between clusters. Finally, for a graphic display of machine ratings among human ratings, we applied PCA to machine and human ratings in the rating space and projected all ratings onto the resulting first 2 and 3 principal components. Indeed, the machine is well placed in a mid-zone of human raters (Figure 3).

Figure 3: Location of machine ratings among the 28 human ratings: ratings were projected into 2 dimensions (a) and 3 dimensions (b) by performing PCA on all ratings and projecting them onto the first principal components. The projected data explain 29.8% of the variance in (a) and 36.6% in (b).
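The ratings-space analysis reduces to a few lines of distance arithmetic. A sketch follows; the array shapes and the radius argument are assumptions for illustration.

    import numpy as np
    from scipy.spatial.distance import cdist

    def rater_space_stats(human_ratings, machine_ratings, radius):
        # Mean inter-rater distance and neighbor counts in the ratings space.
        # human_ratings: (28, 91) array of human rating vectors.
        # machine_ratings: (91,) machine rating vector.
        R = np.vstack([machine_ratings, human_ratings])  # row 0 is the machine
        D = cdist(R, R)                                  # pairwise Euclidean distances
        np.fill_diagonal(D, np.nan)                      # ignore self-distances
        mean_dist = np.nanmean(D, axis=1)                # the Figure 2 quantity
        n_neighbors = np.sum(D < radius, axis=1)         # sphere test (NaN compares False)
        return mean_dist, n_neighbors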
3.3 Psychophysical experiments in silico

A number of simulated psychophysical experiments reveal humanlike biases in the machine's performance. Rubenstein et al. discuss a morphing technique to create mathematically averaged faces from multiple face images [5]. They reported that averaged faces made of 16 and 32 original component images were rated higher in attractiveness than the mean attractiveness ratings of their component faces, and higher than composites consisting of fewer faces. In their experiment, 32-component composites were found to be the most attractive. We used a similar technique to create averaged, virtually morphed faces with various numbers of components, nc, and let the machine predict their attractiveness. To this end, coordinate values of the original component faces were averaged to create a new set of coordinates for the composite. These coordinates were used to calculate the geometric features and CFA of the averaged face. Smoothness and HSV values for the composite faces were calculated by averaging the corresponding values of the component faces (HSV values are converted to RGB before averaging). To study the effect of nc on the attractiveness score, we produced 1,000 virtual morph images for each value of nc between 2 and 50, and used our attractiveness predictor (section 3.1) to compute the attractiveness scores of the resulting composites. In accordance with the experimental results of [5], the machine manifests a humanlike bias for higher scores of averaged composites over their components' mean score. Figure 4a, presenting these results, shows the percent of components which were rated as less attractive than their corresponding composite, for each number of components nc. As evident, the attractiveness rating of a composite surpasses a larger percent of its components' ratings as nc increases. Figure 4a also shows the mean scores of 1,000 composites and the mean scores of their components, for each nc (scores are normalized to the range [0, 1]); the actual attractiveness scores are reported in Table 1. As expected, the mean scores of the component images are independent of nc, while the composites' scores increase with nc. Mean values of smoothness and asymmetry of the composites are presented in Figure 4b.

Figure 4: Mean results over 1,000 composites made of varying numbers of image components: (a) percent of components rated as less attractive than their corresponding composite, together with the mean scores of composites and of their components (scores normalized to the range [0, 1]; actual attractiveness scores are reported in Table 1); (b) mean values of smoothness and asymmetry of 1,000 composites for each number of components, nc.

Table 1: Mean results over 1,000 composites made of varying numbers of component images

  Components in composite   Composite score   Components' mean score   Components rated lower than composite
  2                         3.46              3.34                     55%
  4                         3.66              3.33                     64%
  12                        3.74              3.32                     70%
  25                        3.82              3.32                     75%
  50                        3.94              3.33                     81%

Recent studies have provided evidence that skin texture influences judgments of facial attractiveness [17]. Since blurring and smoothing of faces occur when faces are averaged together [5], the smooth complexion of composites may underlie the attractiveness of averaged composites. In our experiment, a preference for averageness is found even though our method of virtual morphing does not produce the smoothing effect: the mean smoothness value of composites corresponds to the mean smoothness value in the original dataset, for all nc (see Figure 4b). Researchers have also suggested that averaged faces are attractive because they are exceptionally symmetric [18]. Figure 4b shows that the mean level of asymmetry is indeed highly correlated with the mean scores of the morphs (Pearson correlation of -0.91, p-value < 10⁻¹⁹). However, examining the correlation between the rest of the features and the composites' scores reveals that this high correlation is not at all unique to asymmetry. In fact, 45 of the 98 features are strongly correlated with the attractiveness scores (|Pearson correlation| > 0.9). The high correlation between these numerous features and the attractiveness scores of averaged faces indicates that symmetry level is not an exceptional factor in the machine's preference for averaged faces. Instead, it suggests that averaging causes many features, including both geometric features and symmetry, to change in a direction which increases attractiveness. It has been argued that although averaged faces are found to be attractive, very attractive faces are not average [18]. A virtual composite made of the 12 most attractive faces in the set (as rated by humans) was rated by the machine with a high score of 5.6, while 1,000 composites made of 50 faces got a maximum score of only 5.3.
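Under the same assumptions as the earlier sketches — a faces list holding per-image coordinates, HSV color and smoothness, plus a predict() function in the spirit of section 3.1 — the composite experiment can be simulated roughly as below. All field and function names here are hypothetical.

    import numpy as np
    import colorsys

    def morph(components):
        # Virtually morph component faces: average geometry in coordinate
        # space and colors in RGB space (per the paper's footnote).
        coords = np.mean([c['coords'] for c in components], axis=0)
        rgb = np.mean([colorsys.hsv_to_rgb(*c['hsv']) for c in components], axis=0)
        return {'coords': coords,
                'hsv': colorsys.rgb_to_hsv(*rgb),
                'smoothness': np.mean([c['smoothness'] for c in components])}

    # Sweep n_c as in section 3.3: 1,000 random composites per size.
    rng = np.random.default_rng(0)
    for n_c in range(2, 51):
        scores = [predict(morph([faces[i] for i in
                                 rng.choice(len(faces), size=n_c, replace=False)]))
                  for _ in range(1000)]
        print(n_c, np.mean(scores))  # composite scores rise with n_c (cf. Table 1)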
This preference for a composite of highly attractive faces resembles the findings of an experiment by Perrett et al., in which a highly attractive composite, morphed from only attractive faces, was preferred by humans over a composite made of 60 images of all levels of attractiveness [13].

Another study, by Zaidel et al., examined the asymmetry of attractiveness perception and proposed a relationship between facial attractiveness and hemispheric specialization [19]. In this research, right-right and left-left chimeric composites were created by attaching each half of the face to its mirror image. Subjects were asked to look at left-left and right-right composites of the same image and judge which one is more attractive. For women's faces, right-right composites got twice as many 'more attractive' responses as left-left composites. Interestingly, similar results were found when simulating the same experiment with the machine: right-right and left-left chimeric composites were created from the extracted coordinates of each image, and the machine was used to predict their attractiveness ratings (taking care to exclude the original image used for the chimeric composition from the training set, as it contains many features which are identical to those of the composite). The machine gave 63 out of 91 right-right composites a higher rating than their matching left-left composite, while only 28 left-left composites were judged as more attractive. A paired t-test shows these results to be statistically significant with p-value < 10⁻⁷ (scores of chimeric composites are normally distributed). It is interesting to see that the machine manifests the same kind of asymmetry bias reported by Zaidel et al., though it has never been explicitly trained for that.
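The chimeric comparison is a straightforward paired test. A sketch follows; the score arrays are hypothetical inputs, assumed to come from running the predictor on the machine-built composites.

    import numpy as np
    from scipy.stats import ttest_rel

    def chimeric_comparison(rr_scores, ll_scores):
        # Compare right-right vs. left-left chimeric composite ratings.
        # rr_scores, ll_scores: length-91 arrays of machine ratings for
        # the two chimeric composites of each source image.
        wins_rr = int(np.sum(rr_scores > ll_scores))  # 63 in the paper
        wins_ll = int(np.sum(ll_scores > rr_scores))  # 28 in the paper
        t, p = ttest_rel(rr_scores, ll_scores)        # paired t-test, as in the text
        return wins_rr, wins_ll, p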
4 Discussion

In this work we produced a high-quality training set for learning facial attractiveness of human faces. Using supervised learning methodologies we were able to construct the first predictor that achieves accurate, humanlike performance for this task. Our results add the task of facial attractiveness prediction to the collection of abstract tasks that have been successfully accomplished with current machine learning techniques. Examining the machine and human raters' representations in the ratings space places the ratings of the machine at the center of the human raters, and closest, on average, to the other human raters. The similarity between human and machine preferences has prompted us to further study the machine's operation, in order to capitalize on the accessibility of its inner workings and learn more about human perception of facial attractiveness. To this end, we have found that the machine favors averaged faces made of several component faces. While this preference is known to be common to humans as well, researchers have previously offered different reasons for favoring averageness. Our analysis has revealed that symmetry is strongly related to the attractiveness of averaged faces, but is definitely not the only factor in the equation, since about half of the image-features relate to the ratings of averaged composites in a similar manner as the symmetry measure. This suggests that a general movement of features toward attractiveness, rather than a simple increase in symmetry, is responsible for the attractiveness of averaged faces. Strictly speaking, this can be held true only for the machine; but, given the machine's remarkably humanlike behavior, it also lends important support to the idea that this finding may well extend to human perception of facial attractiveness.

Overall, it is quite surprising and pleasing to see that a machine trained explicitly to capture an operational performance criterion such as rating implicitly captures basic human psychophysical biases related to facial attractiveness. It is likely that while the machine learns the ratings in an explicit supervised manner, it also concomitantly and implicitly learns other basic characteristics of human facial ratings, as revealed by studying its
3 0.56751519 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
4 0.56708002 53 nips-2006-Combining causal and similarity-based reasoning
Author: Charles Kemp, Patrick Shafto, Allison Berke, Joshua B. Tenenbaum
Abstract: Everyday inductive reasoning draws on many kinds of knowledge, including knowledge about relationships between properties and knowledge about relationships between objects. Previous accounts of inductive reasoning generally focus on just one kind of knowledge: models of causal reasoning often focus on relationships between properties, and models of similarity-based reasoning often focus on similarity relationships between objects. We present a Bayesian model of inductive reasoning that incorporates both kinds of knowledge, and show that it accounts well for human inferences about the properties of biological species. 1
5 0.46630222 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Author: Konrad Rieck, Pavel Laskov, Sören Sonnenburg
Abstract: We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
6 0.45410657 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
7 0.41333175 110 nips-2006-Learning Dense 3D Correspondence
8 0.40726465 9 nips-2006-A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments
9 0.37719405 58 nips-2006-Context Effects in Category Learning: An Investigation of Four Probabilistic Models
10 0.32379016 1 nips-2006-A Bayesian Approach to Diffusion Models of Decision-Making and Response Time
11 0.31691211 192 nips-2006-Theory and Dynamics of Perceptual Bistability
12 0.31508628 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
13 0.30399656 82 nips-2006-Gaussian and Wishart Hyperkernels
14 0.29182053 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
15 0.29067394 189 nips-2006-Temporal dynamics of information content carried by neurons in the primary visual cortex
16 0.28834242 174 nips-2006-Similarity by Composition
17 0.28438854 109 nips-2006-Learnability and the doubling dimension
18 0.2820254 120 nips-2006-Learning to Traverse Image Manifolds
19 0.26935148 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
20 0.26420802 49 nips-2006-Causal inference in sensorimotor integration
simIndex simValue paperId paperTitle
same-paper 1 0.77258688 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
2 0.51888186 161 nips-2006-Particle Filtering for Nonparametric Bayesian Matrix Factorization
Author: Frank Wood, Thomas L. Griffiths
Abstract: Many unsupervised learning problems can be expressed as a form of matrix factorization, reconstructing an observed data matrix as the product of two matrices of latent variables. A standard challenge in solving these problems is determining the dimensionality of the latent matrices. Nonparametric Bayesian matrix factorization is one way of dealing with this challenge, yielding a posterior distribution over possible factorizations of unbounded dimensionality. A drawback to this approach is that posterior estimation is typically done using Gibbs sampling, which can be slow for large problems and when conjugate priors cannot be used. As an alternative, we present a particle filter for posterior estimation in nonparametric Bayesian matrix factorization models. We illustrate this approach with two matrix factorization models and show favorable performance relative to Gibbs sampling.
3 0.51017869 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature of Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
4 0.50874162 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf
Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1
5 0.50422239 41 nips-2006-Bayesian Ensemble Learning
Author: Hugh A. Chipman, Edward I. George, Robert E. Mcculloch
Abstract: We develop a Bayesian “sum-of-trees” model, named BART, where each tree is constrained by a prior to be a weak learner. Fitting and inference are accomplished via an iterative backfitting MCMC algorithm. This model is motivated by ensemble methods in general, and boosting algorithms in particular. Like boosting, each weak learner (i.e., each weak tree) contributes a small amount to the overall model. However, our procedure is defined by a statistical model: a prior and a likelihood, while boosting is defined by an algorithm. This model-based approach enables a full and accurate assessment of uncertainty in model predictions, while remaining highly competitive in terms of predictive accuracy. 1
6 0.50175053 158 nips-2006-PG-means: learning the number of clusters in data
7 0.50174719 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
8 0.50155544 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
10 0.49945772 65 nips-2006-Denoising and Dimension Reduction in Feature Space
11 0.49943453 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
12 0.49937674 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
13 0.49814811 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
14 0.49812484 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
15 0.49614787 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
16 0.49614114 79 nips-2006-Fast Iterative Kernel PCA
17 0.49601677 175 nips-2006-Simplifying Mixture Models through Function Approximation
18 0.49514681 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments
19 0.49474764 43 nips-2006-Bayesian Model Scoring in Markov Random Fields
20 0.49464661 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions