nips nips2006 nips2006-4 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amit Kagian, Gideon Dror, Tommer Leyvand, Daniel Cohen-or, Eytan Ruppin
Abstract: This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments which are remarkably similar to those of humans. These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general. 1 I n trod u cti on Philosophers, artists and scientists have been trying to capture the nature of beauty since the early days of philosophy. Although in modern days a common layman's notion is that judgments of beauty are a matter of subjective opinion, recent findings suggest that people might share a common taste for facial attractiveness and that their preferences may be an innate part of the primary constitution of our nature. Several experiments have shown that 2 to 8 months old infants prefer looking at faces which adults rate as being more attractive [1]. In addition, attractiveness ratings show very high agreement between groups of raters belonging to the same culture and even across cultures [2]. Such findings give rise to the quest for common factors which determine human facial attractiveness. Accordingly, various hypotheses, from cognitive, evolutional and social perspectives, have been put forward to describe the common preferences for facial beauty. Inspired by Sir Francis Galton’s photographic method of composing faces [3], Rubenstein, Langlois and Roggman created averaged faces by morphing multiple images together and proposed that averageness is the answer for facial attractiveness [4, 5]. Human judges found these averaged faces to be attractive and rated them with attractiveness ratings higher than the mean rating of the component faces composing them. Grammer and Thornhill have investigated symmetry and averageness of faces and concluded that symmetry was more important than averageness in facial attractiveness [6]. Little and colleagues have agreed that average faces are attractive but claim that faces with certain extreme features, such as extreme sexually dimorphic traits, may be more attractive than average faces [7]. Other researchers have suggested various conditions which may contribute to facial attractiveness such as neonate features, pleasant expressions and familiarity. Cunningham and his associates suggest a multiple fitness model in which there is no single constructing line that determines attractiveness. Instead, different categories of features signal different desirable qualities of the perceived target [8]. Even so, the multiple fitness model agrees that some facial qualities are universally physically attractive to people. Apart from eliciting the facial characteristics which account for attractiveness, modern researchers try to describe underlying mechanisms for these preferences. Many contributors refer to the evolutionary origins of attractiveness preferences [9]-[11]. According to this view, facial traits signal mate quality and imply chances for reproductive success and parasite resistance. Some evolutionary theorists suggest that preferred features might not signal mate quality but that the “good taste” by itself is an evolutionary adaptation (individuals with a preference for attractiveness will have attractive offspring that will be favored as mates) [9]. Another mechanism explains attractiveness' preferences through a cognitive theory - a preference for attractive faces might be induced as a by-product of general perception or recognition mechanisms [5, 12]: Attractive faces might be pleasant to look at since they are closer to the cognitive representation of the face category in the mind. These cognitive representations are described as a part of a cognitive mechanism that abstracts prototypes from distinct classes of objects. These prototypes relate to average faces when considering the averageness hypothesis. A third view has suggested that facial attractiveness originates in a social mechanism, where preferences may be dependent on the learning history of the individual and even on his social goals [12]. Different studies have tried to use computational methods in order to analyze facial attractiveness. Averaging faces with morph tools was done in several cases (e.g. [5, 13]). In [14], laser scans of faces were put into complete correspondence with the average face in order to examine the relationship between facial attractiveness, age, and averageness. Another approach was used in [15] where a genetic algorithm, guided by interactive user selections, was programmed to evolve a “most beautiful” female face. [16] used machine learning methods to investigate whether a machine can predict attractiveness ratings by learning a mapping from facial images to their attractiveness scores. Their predictor achieved a significant correlation of 0.6 with average human ratings, demonstrating that facial beauty can be learned by a machine, at least to some degree. However, as human raters still do significantly outperform the predictor of [16], the challenge of constructing a facial attractiveness machine with human level evaluation accuracy has remained open. A primary goal of this study is to surpass these results by developing a machine which obtains human level performance in predicting facial attractiveness. Having accomplished this, our second main goal is to conduct a series of simulated psychophysical experiments and study the resemblance between human and machine judgments. This latter task carries two potential rewards: A. To determine whether the machine can aid in understanding the psychophysics of human facial attractiveness, capitalizing on the ready accessibility of the analysis of its inner workings, and B. To study whether learning an explicit operational ratings prediction task also entails learning implicit humanlike biases, at least for the case of facial attractiveness. 2 2.1 T h e f aci al trai n in g d atab as e: Acq u i s i ti on , p rep roces s i n g an d rep res en tati on Rating facial attractiveness The chosen database was composed of 91 facial images of American females, taken by the Japanese photographer Akira Gomi. All 91 samples were frontal color photographs of young Caucasian females with a neutral expression. All samples were of similar age, skin color and gender. The subjects’ portraits had no accessories or other distracting items such as jewelry. All 91 facial images in the dataset were rated for attractiveness by 28 human raters (15 males, 13 females) on a 7-point Likert scale (1 = very unattractive, 7 = very attractive). Ratings were collected with a specifically designed html interface. Each rater was asked to view the entire set before rating in order to acquire a notion of attractiveness scale. There was no time limit for judging the attractiveness of each sample and raters could go back and adjust the ratings of already rated samples. The images were presented to each rater in a random order and each image was presented on a separate page. The final attractiveness rating of each sample was its mean rating across all raters. To validate that the number of ratings collected adequately represented the ``collective attractiveness rating'' we randomly divided the raters into two disjoint groups of equal size. For each facial image, we calculated the mean rating on each group, and calculated the Pearson correlation between the mean ratings of the two groups. This process was repeated 1,000 times. The mean correlation between two groups was 0.92 ( = 0.01). This corresponds well to the known level of consistency among groups of raters reported in the literature (e.g. [2]). Hence, the mean ratings collected are stable indicators of attractiveness that can be used for the learning task. The facial set contained faces in all ranges of attractiveness. Final attractiveness ratings range from 1.42 to 5.75 and the mean rating was 3.33 ( = 0.94). 2.2 Data preprocessing and representation Preliminary experimentation with various ways of representing a facial image have systematically shown that features based on measured proportions, distances and angles of faces are most effective in capturing the notion of facial attractiveness (e.g. [16]). To extract facial features we developed an automatic engine that is capable of identifying eyes, nose, lips, eyebrows, and head contour. In total, we measured 84 coordinates describing the locations of those facial features (Figure 1). Several regions are suggested for extracting mean hair color, mean skin color and skin texture. The feature extraction process was basically automatic but some coordinates needed to be manually adjusted in some of the images. The facial coordinates are used to create a distances-vector of all 3,486 distances between all pairs of coordinates in the complete graph created by all coordinates. For each image, all distances are normalized by face length. In a similar manner, a slopes-vector of all the 3,486 slopes of the lines connecting the facial coordinates is computed. Central fluctuating asymmetry (CFA), which is described in [6], is calculated from the coordinates as well. The application also provides, for each face, Hue, Saturation and Value (HSV) values of hair color and skin color, and a measurement of skin smoothness. Figure 1: Facial coordinates with hair and skin sample regions as represented by the facial feature extractor. Coordinates are used for calculating geometric features and asymmetry. Sample regions are used for calculating color values and smoothness. The sample image, used for illustration only, is of T.G. and is presented with her full consent. Combining the distances-vector and the slopes-vector yields a vector representation of 6,972 geometric features for each image. Since strong correlations are expected among the features in such representation, principal component analysis (PCA) was applied to these geometric features, producing 90 principal components which span the sub-space defined by the 91 image vector representations. The geometric features are projected on those 90 principal components and supply 90 orthogonal eigenfeatures representing the geometric features. Eight measured features were not included in the PCA analysis, including CFA, smoothness, hair color coordinates (HSV) and skin color coordinates. These features are assumed to be directly connected to human perception of facial attractiveness and are hence kept at their original values. These 8 features were added to the 90 geometric eigenfeatures, resulting in a total of 98 image-features representing each facial image in the dataset. 3 3.1 E xp eri men ts an d resu l ts Predictor construction and validation We experimented with several induction algorithms including simple Linear Regression, Least Squares Support Vector Machine (LS-SVM) (both linear as well as non-linear) and Gaussian Processes (GP). However, as the LS-SVM and GP showed no substantial advantage over Linear Regression, the latter was used and is presented in the sequel. A key ingredient in our methods is to use a proper image-features selection strategy. To this end we used subset feature selection, implemented by ranking the image-features by their Pearson correlation with the target. Other ranking functions produced no substantial gain. To measure the performance of our method we removed one sample from the whole dataset. This sample served as a test set. We found, for each left out sample, the optimal number of image-features by performing leave-one-out-cross-validation (LOOCV) on the remaining samples and selecting the number of features that minimizes the absolute difference between the algorithm's output and the targets of the training set. In other words, the score for a test example was predicted using a single model based on the training set only. This process was repeated n=91 times, once for each image sample. The vector of attractiveness predictions of all images is then compared with the true targets. These scores are found to be in a high Pearson correlation of 0.82 with the mean ratings of humans (P-value < 10 -23), which corresponds to a normalized Mean Squared Error of 0.39. This accuracy is a marked improvement over the recently published performance results of a Pearson correlation of 0.6 on a similar dataset [16]. The average correlation of an individual human rater to the mean correlations of all other raters in our dataset is 0.67 and the average correlation between the mean ratings of groups of raters is 0.92 (section 2.1). It should be noted that we tried to use this feature selection and training procedure with the original geometric features instead of the eigenfeatures, ranking them by their correlation to the targets and selecting up to 300 best ranked features. This, however, has failed to produce good predictors due to strong correlations between the original geometric features (maximal Pearson correlation obtained was 0.26). 3.2 S i m i l a r i t y o f ma c h i n e a n d h u m a n j u d g m e n t s Each rater (human and machine) has a 91 dimensional rating vector describing its Figure 2: Distribution of mean Euclidean distance from each human rater to all other raters in the ratings space. The machine’s average distance form all other raters (left bar) is smaller than the average distance of each of the human raters to all others. attractiveness ratings of all 91 images. These vectors can be embedded in a 91 dimensional ratings space. The Euclidian distance between all raters (human and machine) in this space was computed. Compared with each of the human raters, the ratings of the machine were the closest, on average, to the ratings of all other human raters (Figure 2). To verify that the machine ratings are not outliers that fall out of clusters of human raters (even though their mean distance from the other ratings is small) we surrounded each of the rating vectors in the ratings space with multidimensional spheres of several radius sizes. The machine had more human neighbors than the mean number of neighbors of human raters, testifying that it does not fall between clusters. Finally, for a graphic display of machine ratings among human ratings we applied PCA to machine and human ratings in the rating space and projected all ratings onto the resulting first 2 and 3 principal components. Indeed, the machine is well placed in a mid-zone of human raters (Figure 3). 5 Machine 0 Machine 0 -4 -8 -5 7 5 0 -10 -10 -5 0 5 10 (a) 0 -7 -5 (b) Figure 3: Location of machine ratings among the 28 human ratings: Ratings were projected into 2 dimensions (a) and 3 dimensions (b) by performing PCA on all ratings and projecting them on the first principal components. The projected data explain 29.8% of the variance in (a) and 36.6% in (b). 3.3 Psychophysical experiments in silico A number of simulated psychophysical experiments reveal humanlike biases of the machine's performance. Rubenstein et al. discuss a morphing technique to create mathematically averaged faces from multiple face images [5]. They reported that averaged faces made of 16 and 32 original component images were rated higher in attractiveness than the mean attractiveness ratings of their component faces and higher than composites consisting of fewer faces. In their experiment, 32-component composites were found to be the most attractive. We used a similar technique to create averaged virtually-morphed faces with various numbers of components, nc, and have let the machine predict their attractiveness. To this end, coordinate values of the original component faces were averaged to create a new set of coordinates for the composite. These coordinates were used to calculate the geometrical features and CFA of the averaged face. Smoothness and HSV values for the composite faces were calculated by averaging the corresponding values of the component faces 1. To study the effect of nc on the attractiveness score we produced 1,000 virtual morph images for each value of n c between 2 and 50, and used our attractiveness predictor (section 3.1) to compute the attractiveness scores of the resulting composites. In accordance with the experimental results of [5], the machine manifests a humanlike bias for higher scores of averaged composites over their components’ mean score. Figure 4a, presenting these results, shows the percent of components which were rated as less attractive than their corresponding composite, for each number of components n c. As evident, the attractiveness rating of a composite surpasses a larger percent of its components’ ratings as nc increases. Figure 4a also shows the mean scores of 1,000 1 HSV values are converted to RGB before averaging composites and the mean scores of their components, for each n c (scores are normalized to the range [0, 1]). Their actual attractiveness scores are reported in Table 1. As expected, the mean scores of the components images are independent of n c, while composites’ scores increase with nc. Mean values of smoothness and asymmetry of the composites are presented in Figure 4b. 0.4 Smoothness Asymmetry 0.8 0.2 0.75 0 -0.2 0.7 -0.4 0.65 -0.6 0.6 Fraction of less attractive components Composite's score (normalized) Components' mean score (normalized) 0.55 2 10 20 30 40 -0.8 50 -1 2 Number of components in composite 10 20 30 40 50 Number of components in composite (a) (b) Figure 4: Mean results over 1,000 composites made of varying numbers of image components: (a) Percent of components which were rated as less attractive than their corresponding composite accompanied with mean scores of composites and the mean scores of their components (scores are normalized to the range [0, 1]. actual attractiveness scores are reported in Table 1). (b) Mean values of smoothness and asymmetry of 1,000 composites for each number of components, nc. Table 1: Mean results over 1,000 composites made of varying numbers of component images NUMBER OF COMPONENTS IN COMPOSITE COMPOSITE SCORE COMPONENTS MEAN SCORE 2 4 12 25 50 3.46 3.66 3.74 3.82 3.94 3.34 3.33 3.32 3.32 3.33 COMPONENTS RATED LOWER THAN COMPOSITE (PERCENT) 55 64 70 75 81 % % % % % Recent studies have provided evidence that skin texture influences judgments of facial attractiveness [17]. Since blurring and smoothing of faces occur when faces are averaged together [5], the smooth complexion of composites may underlie the attractiveness of averaged composites. In our experiment, a preference for averageness is found even though our method of virtual-morphing does not produce the smoothening effect and the mean smoothness value of composites corresponds to the mean smoothness value in the original dataset, for all nc (see Figure 4b). Researchers have also suggested that averaged faces are attractive since they are exceptionally symmetric [18]. Figure 4b shows that the mean level of asymmetry is indeed highly correlated with the mean scores of the morphs (Pearson correlation of -0.91, P-value < 10 -19). However, examining the correlation between the rest of the features and the composites' scores reveals that this high correlation is not at all unique to asymmetry. In fact, 45 of the 98 features are strongly correlated with attractiveness scores (|Pearson correlation| > 0.9). The high correlation between these numerous features and attractiveness scores of averaged faces indicates that symmetry level is not an exceptional factor in the machine’s preference for averaged faces. Instead, it suggests that averaging causes many features, including both geometric features and symmetry, to change in a direction which causes an increase in attractiveness. It has been argued that although averaged faces are found to be attractive, very attractive faces are not average [18]. A virtual composite made of the 12 most attractive faces in the set (as rated by humans) was rated by the machine with a high score of 5.6 while 1,000 composites made of 50 faces got a maximum score of only 5.3. This type of preference resembles the findings of an experiment by Perrett et al. in which a highly attractive composite, morphed from only attractive faces, was preferred by humans over a composite made of 60 images of all levels of attractiveness [13]. Another study by Zaidel et al. examined the asymmetry of attractiveness perception and offered a relationship between facial attractiveness and hemispheric specialization [19]. In this research right-right and left-left chimeric composites were created by attaching each half of the face to its mirror image. Subjects were asked to look at left-left and right-right composites of the same image and judge which one is more attractive. For women’s faces, right-right composites got twice as many ‘more attractive’ responses than left-left composites. Interestingly, similar results were found when simulating the same experiment with the machine: Right-right and left-left chimeric composites were created from the extracted coordinates of each image and the machine was used to predict their attractiveness ratings (taking care to exclude the original image used for the chimeric composition from the training set, as it contains many features which are identical to those of the composite). The machine gave 63 out of 91 right-right composites a higher rating than their matching left-left composite, while only 28 left-left composites were judged as more attractive. A paired t-test shows these results to be statistically significant with P-value < 10 -7 (scores of chimeric composites are normally distributed). It is interesting to see that the machine manifests the same kind of asymmetry bias reported by Zaidel et al, though it has never been explicitly trained for that. 4 Di s cu s s i on In this work we produced a high quality training set for learning facial attractiveness of human faces. Using supervised learning methodologies we were able to construct the first predictor that achieves accurate, humanlike performance for this task. Our results add the task of facial attractiveness prediction to a collection of abstract tasks that has been successfully accomplished with current machine learning techniques. Examining the machine and human raters' representations in the ratings space identifies the ratings of the machine in the center of human raters, and closest, in average, to other human raters. The similarity between human and machine preferences has prompted us to further study the machine’s operation in order to capitalize on the accessibility of its inner workings and learn more about human perception of facial attractiveness. To this end, we have found that that the machine favors averaged faces made of several component faces. While this preference is known to be common to humans as well, researchers have previously offered different reasons for favoring averageness. Our analysis has revealed that symmetry is strongly related to the attractiveness of averaged faces, but is definitely not the only factor in the equation since about half of the image-features relate to the ratings of averaged composites in a similar manner as the symmetry measure. This suggests that a general movement of features toward attractiveness, rather than a simple increase in symmetry, is responsible for the attractiveness of averaged faces. Obviously, strictly speaking this can be held true only for the machine, but, in due of the remarkable ``humnalike'' behavior of the machine, it also brings important support to the idea that this finding may well extend also to human perception of facial attractiveness. Overall, it is quite surprising and pleasing to see that a machine trained explicitly to capture an operational performance criteria such as rating, implicitly captures basic human psychophysical biases related to facial attractiveness. It is likely that while the machine learns the ratings in an explicit supervised manner, it also concomitantly and implicitly learns other basic characteristics of human facial ratings, as revealed by studying its
Reference: text
sentIndex sentText sentNum sentScore
1 il Abstract This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. [sent-8, score-0.479]
2 Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. [sent-9, score-2.055]
3 Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. [sent-10, score-1.115]
4 These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. [sent-12, score-1.171]
5 It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general. [sent-13, score-2.018]
6 Although in modern days a common layman's notion is that judgments of beauty are a matter of subjective opinion, recent findings suggest that people might share a common taste for facial attractiveness and that their preferences may be an innate part of the primary constitution of our nature. [sent-15, score-1.279]
7 Several experiments have shown that 2 to 8 months old infants prefer looking at faces which adults rate as being more attractive [1]. [sent-16, score-0.309]
8 In addition, attractiveness ratings show very high agreement between groups of raters belonging to the same culture and even across cultures [2]. [sent-17, score-1.158]
9 Such findings give rise to the quest for common factors which determine human facial attractiveness. [sent-18, score-0.504]
10 Accordingly, various hypotheses, from cognitive, evolutional and social perspectives, have been put forward to describe the common preferences for facial beauty. [sent-19, score-0.469]
11 Inspired by Sir Francis Galton’s photographic method of composing faces [3], Rubenstein, Langlois and Roggman created averaged faces by morphing multiple images together and proposed that averageness is the answer for facial attractiveness [4, 5]. [sent-20, score-1.647]
12 Human judges found these averaged faces to be attractive and rated them with attractiveness ratings higher than the mean rating of the component faces composing them. [sent-21, score-1.67]
13 Grammer and Thornhill have investigated symmetry and averageness of faces and concluded that symmetry was more important than averageness in facial attractiveness [6]. [sent-22, score-1.529]
14 Little and colleagues have agreed that average faces are attractive but claim that faces with certain extreme features, such as extreme sexually dimorphic traits, may be more attractive than average faces [7]. [sent-23, score-0.808]
15 Other researchers have suggested various conditions which may contribute to facial attractiveness such as neonate features, pleasant expressions and familiarity. [sent-24, score-1.123]
16 Even so, the multiple fitness model agrees that some facial qualities are universally physically attractive to people. [sent-27, score-0.543]
17 Apart from eliciting the facial characteristics which account for attractiveness, modern researchers try to describe underlying mechanisms for these preferences. [sent-28, score-0.405]
18 Many contributors refer to the evolutionary origins of attractiveness preferences [9]-[11]. [sent-29, score-0.785]
19 According to this view, facial traits signal mate quality and imply chances for reproductive success and parasite resistance. [sent-30, score-0.422]
20 Some evolutionary theorists suggest that preferred features might not signal mate quality but that the “good taste” by itself is an evolutionary adaptation (individuals with a preference for attractiveness will have attractive offspring that will be favored as mates) [9]. [sent-31, score-0.954]
21 A third view has suggested that facial attractiveness originates in a social mechanism, where preferences may be dependent on the learning history of the individual and even on his social goals [12]. [sent-35, score-1.187]
22 Different studies have tried to use computational methods in order to analyze facial attractiveness. [sent-36, score-0.386]
23 In [14], laser scans of faces were put into complete correspondence with the average face in order to examine the relationship between facial attractiveness, age, and averageness. [sent-40, score-0.603]
24 [16] used machine learning methods to investigate whether a machine can predict attractiveness ratings by learning a mapping from facial images to their attractiveness scores. [sent-42, score-2.098]
25 6 with average human ratings, demonstrating that facial beauty can be learned by a machine, at least to some degree. [sent-44, score-0.542]
26 However, as human raters still do significantly outperform the predictor of [16], the challenge of constructing a facial attractiveness machine with human level evaluation accuracy has remained open. [sent-45, score-1.526]
27 A primary goal of this study is to surpass these results by developing a machine which obtains human level performance in predicting facial attractiveness. [sent-46, score-0.501]
28 To determine whether the machine can aid in understanding the psychophysics of human facial attractiveness, capitalizing on the ready accessibility of the analysis of its inner workings, and B. [sent-49, score-0.522]
29 To study whether learning an explicit operational ratings prediction task also entails learning implicit humanlike biases, at least for the case of facial attractiveness. [sent-50, score-0.708]
30 1 T h e f aci al trai n in g d atab as e: Acq u i s i ti on , p rep roces s i n g an d rep res en tati on Rating facial attractiveness The chosen database was composed of 91 facial images of American females, taken by the Japanese photographer Akira Gomi. [sent-52, score-1.532]
31 All 91 facial images in the dataset were rated for attractiveness by 28 human raters (15 males, 13 females) on a 7-point Likert scale (1 = very unattractive, 7 = very attractive). [sent-56, score-1.466]
32 Each rater was asked to view the entire set before rating in order to acquire a notion of attractiveness scale. [sent-58, score-0.832]
33 There was no time limit for judging the attractiveness of each sample and raters could go back and adjust the ratings of already rated samples. [sent-59, score-1.207]
34 The final attractiveness rating of each sample was its mean rating across all raters. [sent-61, score-0.912]
35 To validate that the number of ratings collected adequately represented the ``collective attractiveness rating'' we randomly divided the raters into two disjoint groups of equal size. [sent-62, score-1.158]
36 For each facial image, we calculated the mean rating on each group, and calculated the Pearson correlation between the mean ratings of the two groups. [sent-63, score-0.822]
37 Hence, the mean ratings collected are stable indicators of attractiveness that can be used for the learning task. [sent-71, score-0.974]
38 The facial set contained faces in all ranges of attractiveness. [sent-72, score-0.576]
39 2 Data preprocessing and representation Preliminary experimentation with various ways of representing a facial image have systematically shown that features based on measured proportions, distances and angles of faces are most effective in capturing the notion of facial attractiveness (e. [sent-79, score-1.717]
40 To extract facial features we developed an automatic engine that is capable of identifying eyes, nose, lips, eyebrows, and head contour. [sent-82, score-0.423]
41 In total, we measured 84 coordinates describing the locations of those facial features (Figure 1). [sent-83, score-0.469]
42 Several regions are suggested for extracting mean hair color, mean skin color and skin texture. [sent-84, score-0.266]
43 The facial coordinates are used to create a distances-vector of all 3,486 distances between all pairs of coordinates in the complete graph created by all coordinates. [sent-86, score-0.495]
44 In a similar manner, a slopes-vector of all the 3,486 slopes of the lines connecting the facial coordinates is computed. [sent-88, score-0.432]
45 Figure 1: Facial coordinates with hair and skin sample regions as represented by the facial feature extractor. [sent-91, score-0.533]
46 These features are assumed to be directly connected to human perception of facial attractiveness and are hence kept at their original values. [sent-101, score-1.251]
47 These 8 features were added to the 90 geometric eigenfeatures, resulting in a total of 98 image-features representing each facial image in the dataset. [sent-102, score-0.471]
48 The vector of attractiveness predictions of all images is then compared with the true targets. [sent-114, score-0.724]
49 82 with the mean ratings of humans (P-value < 10 -23), which corresponds to a normalized Mean Squared Error of 0. [sent-116, score-0.296]
50 The average correlation of an individual human rater to the mean correlations of all other raters in our dataset is 0. [sent-120, score-0.419]
51 67 and the average correlation between the mean ratings of groups of raters is 0. [sent-121, score-0.537]
52 2 S i m i l a r i t y o f ma c h i n e a n d h u m a n j u d g m e n t s Each rater (human and machine) has a 91 dimensional rating vector describing its Figure 2: Distribution of mean Euclidean distance from each human rater to all other raters in the ratings space. [sent-128, score-0.755]
53 The machine’s average distance form all other raters (left bar) is smaller than the average distance of each of the human raters to all others. [sent-129, score-0.489]
54 Compared with each of the human raters, the ratings of the machine were the closest, on average, to the ratings of all other human raters (Figure 2). [sent-133, score-0.9]
55 To verify that the machine ratings are not outliers that fall out of clusters of human raters (even though their mean distance from the other ratings is small) we surrounded each of the rating vectors in the ratings space with multidimensional spheres of several radius sizes. [sent-134, score-1.167]
56 Finally, for a graphic display of machine ratings among human ratings we applied PCA to machine and human ratings in the rating space and projected all ratings onto the resulting first 2 and 3 principal components. [sent-136, score-1.318]
57 Indeed, the machine is well placed in a mid-zone of human raters (Figure 3). [sent-137, score-0.313]
58 discuss a morphing technique to create mathematically averaged faces from multiple face images [5]. [sent-145, score-0.311]
59 They reported that averaged faces made of 16 and 32 original component images were rated higher in attractiveness than the mean attractiveness ratings of their component faces and higher than composites consisting of fewer faces. [sent-146, score-2.421]
60 To this end, coordinate values of the original component faces were averaged to create a new set of coordinates for the composite. [sent-149, score-0.285]
61 Smoothness and HSV values for the composite faces were calculated by averaging the corresponding values of the component faces 1. [sent-151, score-0.49]
62 To study the effect of nc on the attractiveness score we produced 1,000 virtual morph images for each value of n c between 2 and 50, and used our attractiveness predictor (section 3. [sent-152, score-1.529]
63 1) to compute the attractiveness scores of the resulting composites. [sent-153, score-0.769]
64 In accordance with the experimental results of [5], the machine manifests a humanlike bias for higher scores of averaged composites over their components’ mean score. [sent-154, score-0.475]
65 Figure 4a, presenting these results, shows the percent of components which were rated as less attractive than their corresponding composite, for each number of components n c. [sent-155, score-0.276]
66 As evident, the attractiveness rating of a composite surpasses a larger percent of its components’ ratings as nc increases. [sent-156, score-1.188]
67 Figure 4a also shows the mean scores of 1,000 1 HSV values are converted to RGB before averaging composites and the mean scores of their components, for each n c (scores are normalized to the range [0, 1]). [sent-157, score-0.433]
68 Their actual attractiveness scores are reported in Table 1. [sent-158, score-0.769]
69 Mean values of smoothness and asymmetry of the composites are presented in Figure 4b. [sent-160, score-0.334]
70 (b) Mean values of smoothness and asymmetry of 1,000 composites for each number of components, nc. [sent-175, score-0.334]
71 33 COMPONENTS RATED LOWER THAN COMPOSITE (PERCENT) 55 64 70 75 81 % % % % % Recent studies have provided evidence that skin texture influences judgments of facial attractiveness [17]. [sent-186, score-1.177]
72 Since blurring and smoothing of faces occur when faces are averaged together [5], the smooth complexion of composites may underlie the attractiveness of averaged composites. [sent-187, score-1.404]
73 In our experiment, a preference for averageness is found even though our method of virtual-morphing does not produce the smoothening effect and the mean smoothness value of composites corresponds to the mean smoothness value in the original dataset, for all nc (see Figure 4b). [sent-188, score-0.482]
74 Researchers have also suggested that averaged faces are attractive since they are exceptionally symmetric [18]. [sent-189, score-0.358]
75 In fact, 45 of the 98 features are strongly correlated with attractiveness scores (|Pearson correlation| > 0. [sent-193, score-0.806]
76 The high correlation between these numerous features and attractiveness scores of averaged faces indicates that symmetry level is not an exceptional factor in the machine’s preference for averaged faces. [sent-195, score-1.226]
77 It has been argued that although averaged faces are found to be attractive, very attractive faces are not average [18]. [sent-197, score-0.548]
78 A virtual composite made of the 12 most attractive faces in the set (as rated by humans) was rated by the machine with a high score of 5. [sent-198, score-0.601]
79 6 while 1,000 composites made of 50 faces got a maximum score of only 5. [sent-199, score-0.449]
80 in which a highly attractive composite, morphed from only attractive faces, was preferred by humans over a composite made of 60 images of all levels of attractiveness [13]. [sent-202, score-1.091]
81 examined the asymmetry of attractiveness perception and offered a relationship between facial attractiveness and hemispheric specialization [19]. [sent-204, score-1.929]
82 In this research right-right and left-left chimeric composites were created by attaching each half of the face to its mirror image. [sent-205, score-0.315]
83 The machine gave 63 out of 91 right-right composites a higher rating than their matching left-left composite, while only 28 left-left composites were judged as more attractive. [sent-209, score-0.563]
84 A paired t-test shows these results to be statistically significant with P-value < 10 -7 (scores of chimeric composites are normally distributed). [sent-210, score-0.271]
85 4 Di s cu s s i on In this work we produced a high quality training set for learning facial attractiveness of human faces. [sent-212, score-1.176]
86 Our results add the task of facial attractiveness prediction to a collection of abstract tasks that has been successfully accomplished with current machine learning techniques. [sent-214, score-1.105]
87 Examining the machine and human raters' representations in the ratings space identifies the ratings of the machine in the center of human raters, and closest, in average, to other human raters. [sent-215, score-0.817]
88 The similarity between human and machine preferences has prompted us to further study the machine’s operation in order to capitalize on the accessibility of its inner workings and learn more about human perception of facial attractiveness. [sent-216, score-0.736]
89 Our analysis has revealed that symmetry is strongly related to the attractiveness of averaged faces, but is definitely not the only factor in the equation since about half of the image-features relate to the ratings of averaged composites in a similar manner as the symmetry measure. [sent-219, score-1.381]
90 This suggests that a general movement of features toward attractiveness, rather than a simple increase in symmetry, is responsible for the attractiveness of averaged faces. [sent-220, score-0.783]
91 Obviously, strictly speaking this can be held true only for the machine, but, in due of the remarkable ``humnalike'' behavior of the machine, it also brings important support to the idea that this finding may well extend also to human perception of facial attractiveness. [sent-221, score-0.517]
92 Overall, it is quite surprising and pleasing to see that a machine trained explicitly to capture an operational performance criteria such as rating, implicitly captures basic human psychophysical biases related to facial attractiveness. [sent-222, score-0.586]
93 It is likely that while the machine learns the ratings in an explicit supervised manner, it also concomitantly and implicitly learns other basic characteristics of human facial ratings, as revealed by studying its "psychophysics". [sent-223, score-0.748]
94 (2002) What makes a face attractive and why: The role of averageness in defining facial beauty. [sent-268, score-0.605]
95 (1994) Human (Homo sapiens) facial attrativness and sexual selection: The role of symmetry and averageness. [sent-279, score-0.462]
96 (2002) Evolution and individual differences in the perception of attractiveness: How cyclic hormonal changes and self-perceived attractiveness influence female preferences for male faces. [sent-289, score-0.825]
97 (2002) Dimensions of facial physical attractiveness: The intersection of biology and culture. [sent-304, score-0.386]
98 (1999) 3D shape and 2D surface textures of human faces: the role of "averages" in attractiveness and age. [sent-353, score-0.79]
99 (1991) Averaged faces are attractive but very attractive faces are not average. [sent-374, score-0.618]
100 (1995) She is not a beauty even when she smiles: possible evolutionary basis for a relationship between facial attractiveness and hemispheric specialization. [sent-381, score-1.193]
wordName wordTfidf (topN-words)
[('attractiveness', 0.697), ('facial', 0.386), ('ratings', 0.247), ('composites', 0.229), ('raters', 0.198), ('faces', 0.19), ('attractive', 0.119), ('composite', 0.11), ('human', 0.093), ('rating', 0.083), ('averageness', 0.073), ('scores', 0.072), ('asymmetry', 0.072), ('rated', 0.065), ('skin', 0.065), ('beauty', 0.063), ('preferences', 0.062), ('symmetry', 0.055), ('humanlike', 0.052), ('rater', 0.052), ('rhodes', 0.052), ('zebrowitz', 0.052), ('averaged', 0.049), ('correlation', 0.046), ('pearson', 0.046), ('coordinates', 0.046), ('chimeric', 0.042), ('cunningham', 0.042), ('langlois', 0.042), ('roggman', 0.042), ('thornhill', 0.042), ('westport', 0.042), ('psychophysical', 0.04), ('color', 0.04), ('perception', 0.038), ('features', 0.037), ('predictor', 0.037), ('hair', 0.036), ('hsv', 0.036), ('smoothness', 0.033), ('components', 0.032), ('preference', 0.031), ('cfa', 0.031), ('eigenfeatures', 0.031), ('grammer', 0.031), ('perrett', 0.031), ('rubenstein', 0.031), ('zaidel', 0.031), ('mean', 0.03), ('score', 0.03), ('judgments', 0.029), ('female', 0.028), ('percent', 0.028), ('geometric', 0.027), ('images', 0.027), ('face', 0.027), ('evolutionary', 0.026), ('cognition', 0.025), ('females', 0.025), ('findings', 0.025), ('cognitive', 0.024), ('operational', 0.023), ('nc', 0.023), ('machine', 0.022), ('psychology', 0.022), ('biases', 0.022), ('image', 0.021), ('social', 0.021), ('accessibility', 0.021), ('barbee', 0.021), ('dror', 0.021), ('ethology', 0.021), ('fink', 0.021), ('fitness', 0.021), ('galton', 0.021), ('hemispheric', 0.021), ('homo', 0.021), ('manifests', 0.021), ('pleasant', 0.021), ('ruppin', 0.021), ('sapiens', 0.021), ('sexual', 0.021), ('tommer', 0.021), ('workings', 0.021), ('humans', 0.019), ('final', 0.019), ('ct', 0.019), ('researchers', 0.019), ('mate', 0.018), ('morph', 0.018), ('morphing', 0.018), ('offered', 0.018), ('rep', 0.018), ('traits', 0.018), ('projected', 0.017), ('created', 0.017), ('developmental', 0.017), ('qualities', 0.017), ('taste', 0.017), ('groups', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
Author: Amit Kagian, Gideon Dror, Tommer Leyvand, Daniel Cohen-or, Eytan Ruppin
Abstract: This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments which are remarkably similar to those of humans. These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general. 1 I n trod u cti on Philosophers, artists and scientists have been trying to capture the nature of beauty since the early days of philosophy. Although in modern days a common layman's notion is that judgments of beauty are a matter of subjective opinion, recent findings suggest that people might share a common taste for facial attractiveness and that their preferences may be an innate part of the primary constitution of our nature. Several experiments have shown that 2 to 8 months old infants prefer looking at faces which adults rate as being more attractive [1]. In addition, attractiveness ratings show very high agreement between groups of raters belonging to the same culture and even across cultures [2]. Such findings give rise to the quest for common factors which determine human facial attractiveness. Accordingly, various hypotheses, from cognitive, evolutional and social perspectives, have been put forward to describe the common preferences for facial beauty. Inspired by Sir Francis Galton’s photographic method of composing faces [3], Rubenstein, Langlois and Roggman created averaged faces by morphing multiple images together and proposed that averageness is the answer for facial attractiveness [4, 5]. Human judges found these averaged faces to be attractive and rated them with attractiveness ratings higher than the mean rating of the component faces composing them. Grammer and Thornhill have investigated symmetry and averageness of faces and concluded that symmetry was more important than averageness in facial attractiveness [6]. Little and colleagues have agreed that average faces are attractive but claim that faces with certain extreme features, such as extreme sexually dimorphic traits, may be more attractive than average faces [7]. Other researchers have suggested various conditions which may contribute to facial attractiveness such as neonate features, pleasant expressions and familiarity. Cunningham and his associates suggest a multiple fitness model in which there is no single constructing line that determines attractiveness. Instead, different categories of features signal different desirable qualities of the perceived target [8]. Even so, the multiple fitness model agrees that some facial qualities are universally physically attractive to people. Apart from eliciting the facial characteristics which account for attractiveness, modern researchers try to describe underlying mechanisms for these preferences. Many contributors refer to the evolutionary origins of attractiveness preferences [9]-[11]. According to this view, facial traits signal mate quality and imply chances for reproductive success and parasite resistance. Some evolutionary theorists suggest that preferred features might not signal mate quality but that the “good taste” by itself is an evolutionary adaptation (individuals with a preference for attractiveness will have attractive offspring that will be favored as mates) [9]. Another mechanism explains attractiveness' preferences through a cognitive theory - a preference for attractive faces might be induced as a by-product of general perception or recognition mechanisms [5, 12]: Attractive faces might be pleasant to look at since they are closer to the cognitive representation of the face category in the mind. These cognitive representations are described as a part of a cognitive mechanism that abstracts prototypes from distinct classes of objects. These prototypes relate to average faces when considering the averageness hypothesis. A third view has suggested that facial attractiveness originates in a social mechanism, where preferences may be dependent on the learning history of the individual and even on his social goals [12]. Different studies have tried to use computational methods in order to analyze facial attractiveness. Averaging faces with morph tools was done in several cases (e.g. [5, 13]). In [14], laser scans of faces were put into complete correspondence with the average face in order to examine the relationship between facial attractiveness, age, and averageness. Another approach was used in [15] where a genetic algorithm, guided by interactive user selections, was programmed to evolve a “most beautiful” female face. [16] used machine learning methods to investigate whether a machine can predict attractiveness ratings by learning a mapping from facial images to their attractiveness scores. Their predictor achieved a significant correlation of 0.6 with average human ratings, demonstrating that facial beauty can be learned by a machine, at least to some degree. However, as human raters still do significantly outperform the predictor of [16], the challenge of constructing a facial attractiveness machine with human level evaluation accuracy has remained open. A primary goal of this study is to surpass these results by developing a machine which obtains human level performance in predicting facial attractiveness. Having accomplished this, our second main goal is to conduct a series of simulated psychophysical experiments and study the resemblance between human and machine judgments. This latter task carries two potential rewards: A. To determine whether the machine can aid in understanding the psychophysics of human facial attractiveness, capitalizing on the ready accessibility of the analysis of its inner workings, and B. To study whether learning an explicit operational ratings prediction task also entails learning implicit humanlike biases, at least for the case of facial attractiveness. 2 2.1 T h e f aci al trai n in g d atab as e: Acq u i s i ti on , p rep roces s i n g an d rep res en tati on Rating facial attractiveness The chosen database was composed of 91 facial images of American females, taken by the Japanese photographer Akira Gomi. All 91 samples were frontal color photographs of young Caucasian females with a neutral expression. All samples were of similar age, skin color and gender. The subjects’ portraits had no accessories or other distracting items such as jewelry. All 91 facial images in the dataset were rated for attractiveness by 28 human raters (15 males, 13 females) on a 7-point Likert scale (1 = very unattractive, 7 = very attractive). Ratings were collected with a specifically designed html interface. Each rater was asked to view the entire set before rating in order to acquire a notion of attractiveness scale. There was no time limit for judging the attractiveness of each sample and raters could go back and adjust the ratings of already rated samples. The images were presented to each rater in a random order and each image was presented on a separate page. The final attractiveness rating of each sample was its mean rating across all raters. To validate that the number of ratings collected adequately represented the ``collective attractiveness rating'' we randomly divided the raters into two disjoint groups of equal size. For each facial image, we calculated the mean rating on each group, and calculated the Pearson correlation between the mean ratings of the two groups. This process was repeated 1,000 times. The mean correlation between two groups was 0.92 ( = 0.01). This corresponds well to the known level of consistency among groups of raters reported in the literature (e.g. [2]). Hence, the mean ratings collected are stable indicators of attractiveness that can be used for the learning task. The facial set contained faces in all ranges of attractiveness. Final attractiveness ratings range from 1.42 to 5.75 and the mean rating was 3.33 ( = 0.94). 2.2 Data preprocessing and representation Preliminary experimentation with various ways of representing a facial image have systematically shown that features based on measured proportions, distances and angles of faces are most effective in capturing the notion of facial attractiveness (e.g. [16]). To extract facial features we developed an automatic engine that is capable of identifying eyes, nose, lips, eyebrows, and head contour. In total, we measured 84 coordinates describing the locations of those facial features (Figure 1). Several regions are suggested for extracting mean hair color, mean skin color and skin texture. The feature extraction process was basically automatic but some coordinates needed to be manually adjusted in some of the images. The facial coordinates are used to create a distances-vector of all 3,486 distances between all pairs of coordinates in the complete graph created by all coordinates. For each image, all distances are normalized by face length. In a similar manner, a slopes-vector of all the 3,486 slopes of the lines connecting the facial coordinates is computed. Central fluctuating asymmetry (CFA), which is described in [6], is calculated from the coordinates as well. The application also provides, for each face, Hue, Saturation and Value (HSV) values of hair color and skin color, and a measurement of skin smoothness. Figure 1: Facial coordinates with hair and skin sample regions as represented by the facial feature extractor. Coordinates are used for calculating geometric features and asymmetry. Sample regions are used for calculating color values and smoothness. The sample image, used for illustration only, is of T.G. and is presented with her full consent. Combining the distances-vector and the slopes-vector yields a vector representation of 6,972 geometric features for each image. Since strong correlations are expected among the features in such representation, principal component analysis (PCA) was applied to these geometric features, producing 90 principal components which span the sub-space defined by the 91 image vector representations. The geometric features are projected on those 90 principal components and supply 90 orthogonal eigenfeatures representing the geometric features. Eight measured features were not included in the PCA analysis, including CFA, smoothness, hair color coordinates (HSV) and skin color coordinates. These features are assumed to be directly connected to human perception of facial attractiveness and are hence kept at their original values. These 8 features were added to the 90 geometric eigenfeatures, resulting in a total of 98 image-features representing each facial image in the dataset. 3 3.1 E xp eri men ts an d resu l ts Predictor construction and validation We experimented with several induction algorithms including simple Linear Regression, Least Squares Support Vector Machine (LS-SVM) (both linear as well as non-linear) and Gaussian Processes (GP). However, as the LS-SVM and GP showed no substantial advantage over Linear Regression, the latter was used and is presented in the sequel. A key ingredient in our methods is to use a proper image-features selection strategy. To this end we used subset feature selection, implemented by ranking the image-features by their Pearson correlation with the target. Other ranking functions produced no substantial gain. To measure the performance of our method we removed one sample from the whole dataset. This sample served as a test set. We found, for each left out sample, the optimal number of image-features by performing leave-one-out-cross-validation (LOOCV) on the remaining samples and selecting the number of features that minimizes the absolute difference between the algorithm's output and the targets of the training set. In other words, the score for a test example was predicted using a single model based on the training set only. This process was repeated n=91 times, once for each image sample. The vector of attractiveness predictions of all images is then compared with the true targets. These scores are found to be in a high Pearson correlation of 0.82 with the mean ratings of humans (P-value < 10 -23), which corresponds to a normalized Mean Squared Error of 0.39. This accuracy is a marked improvement over the recently published performance results of a Pearson correlation of 0.6 on a similar dataset [16]. The average correlation of an individual human rater to the mean correlations of all other raters in our dataset is 0.67 and the average correlation between the mean ratings of groups of raters is 0.92 (section 2.1). It should be noted that we tried to use this feature selection and training procedure with the original geometric features instead of the eigenfeatures, ranking them by their correlation to the targets and selecting up to 300 best ranked features. This, however, has failed to produce good predictors due to strong correlations between the original geometric features (maximal Pearson correlation obtained was 0.26). 3.2 S i m i l a r i t y o f ma c h i n e a n d h u m a n j u d g m e n t s Each rater (human and machine) has a 91 dimensional rating vector describing its Figure 2: Distribution of mean Euclidean distance from each human rater to all other raters in the ratings space. The machine’s average distance form all other raters (left bar) is smaller than the average distance of each of the human raters to all others. attractiveness ratings of all 91 images. These vectors can be embedded in a 91 dimensional ratings space. The Euclidian distance between all raters (human and machine) in this space was computed. Compared with each of the human raters, the ratings of the machine were the closest, on average, to the ratings of all other human raters (Figure 2). To verify that the machine ratings are not outliers that fall out of clusters of human raters (even though their mean distance from the other ratings is small) we surrounded each of the rating vectors in the ratings space with multidimensional spheres of several radius sizes. The machine had more human neighbors than the mean number of neighbors of human raters, testifying that it does not fall between clusters. Finally, for a graphic display of machine ratings among human ratings we applied PCA to machine and human ratings in the rating space and projected all ratings onto the resulting first 2 and 3 principal components. Indeed, the machine is well placed in a mid-zone of human raters (Figure 3). 5 Machine 0 Machine 0 -4 -8 -5 7 5 0 -10 -10 -5 0 5 10 (a) 0 -7 -5 (b) Figure 3: Location of machine ratings among the 28 human ratings: Ratings were projected into 2 dimensions (a) and 3 dimensions (b) by performing PCA on all ratings and projecting them on the first principal components. The projected data explain 29.8% of the variance in (a) and 36.6% in (b). 3.3 Psychophysical experiments in silico A number of simulated psychophysical experiments reveal humanlike biases of the machine's performance. Rubenstein et al. discuss a morphing technique to create mathematically averaged faces from multiple face images [5]. They reported that averaged faces made of 16 and 32 original component images were rated higher in attractiveness than the mean attractiveness ratings of their component faces and higher than composites consisting of fewer faces. In their experiment, 32-component composites were found to be the most attractive. We used a similar technique to create averaged virtually-morphed faces with various numbers of components, nc, and have let the machine predict their attractiveness. To this end, coordinate values of the original component faces were averaged to create a new set of coordinates for the composite. These coordinates were used to calculate the geometrical features and CFA of the averaged face. Smoothness and HSV values for the composite faces were calculated by averaging the corresponding values of the component faces 1. To study the effect of nc on the attractiveness score we produced 1,000 virtual morph images for each value of n c between 2 and 50, and used our attractiveness predictor (section 3.1) to compute the attractiveness scores of the resulting composites. In accordance with the experimental results of [5], the machine manifests a humanlike bias for higher scores of averaged composites over their components’ mean score. Figure 4a, presenting these results, shows the percent of components which were rated as less attractive than their corresponding composite, for each number of components n c. As evident, the attractiveness rating of a composite surpasses a larger percent of its components’ ratings as nc increases. Figure 4a also shows the mean scores of 1,000 1 HSV values are converted to RGB before averaging composites and the mean scores of their components, for each n c (scores are normalized to the range [0, 1]). Their actual attractiveness scores are reported in Table 1. As expected, the mean scores of the components images are independent of n c, while composites’ scores increase with nc. Mean values of smoothness and asymmetry of the composites are presented in Figure 4b. 0.4 Smoothness Asymmetry 0.8 0.2 0.75 0 -0.2 0.7 -0.4 0.65 -0.6 0.6 Fraction of less attractive components Composite's score (normalized) Components' mean score (normalized) 0.55 2 10 20 30 40 -0.8 50 -1 2 Number of components in composite 10 20 30 40 50 Number of components in composite (a) (b) Figure 4: Mean results over 1,000 composites made of varying numbers of image components: (a) Percent of components which were rated as less attractive than their corresponding composite accompanied with mean scores of composites and the mean scores of their components (scores are normalized to the range [0, 1]. actual attractiveness scores are reported in Table 1). (b) Mean values of smoothness and asymmetry of 1,000 composites for each number of components, nc. Table 1: Mean results over 1,000 composites made of varying numbers of component images NUMBER OF COMPONENTS IN COMPOSITE COMPOSITE SCORE COMPONENTS MEAN SCORE 2 4 12 25 50 3.46 3.66 3.74 3.82 3.94 3.34 3.33 3.32 3.32 3.33 COMPONENTS RATED LOWER THAN COMPOSITE (PERCENT) 55 64 70 75 81 % % % % % Recent studies have provided evidence that skin texture influences judgments of facial attractiveness [17]. Since blurring and smoothing of faces occur when faces are averaged together [5], the smooth complexion of composites may underlie the attractiveness of averaged composites. In our experiment, a preference for averageness is found even though our method of virtual-morphing does not produce the smoothening effect and the mean smoothness value of composites corresponds to the mean smoothness value in the original dataset, for all nc (see Figure 4b). Researchers have also suggested that averaged faces are attractive since they are exceptionally symmetric [18]. Figure 4b shows that the mean level of asymmetry is indeed highly correlated with the mean scores of the morphs (Pearson correlation of -0.91, P-value < 10 -19). However, examining the correlation between the rest of the features and the composites' scores reveals that this high correlation is not at all unique to asymmetry. In fact, 45 of the 98 features are strongly correlated with attractiveness scores (|Pearson correlation| > 0.9). The high correlation between these numerous features and attractiveness scores of averaged faces indicates that symmetry level is not an exceptional factor in the machine’s preference for averaged faces. Instead, it suggests that averaging causes many features, including both geometric features and symmetry, to change in a direction which causes an increase in attractiveness. It has been argued that although averaged faces are found to be attractive, very attractive faces are not average [18]. A virtual composite made of the 12 most attractive faces in the set (as rated by humans) was rated by the machine with a high score of 5.6 while 1,000 composites made of 50 faces got a maximum score of only 5.3. This type of preference resembles the findings of an experiment by Perrett et al. in which a highly attractive composite, morphed from only attractive faces, was preferred by humans over a composite made of 60 images of all levels of attractiveness [13]. Another study by Zaidel et al. examined the asymmetry of attractiveness perception and offered a relationship between facial attractiveness and hemispheric specialization [19]. In this research right-right and left-left chimeric composites were created by attaching each half of the face to its mirror image. Subjects were asked to look at left-left and right-right composites of the same image and judge which one is more attractive. For women’s faces, right-right composites got twice as many ‘more attractive’ responses than left-left composites. Interestingly, similar results were found when simulating the same experiment with the machine: Right-right and left-left chimeric composites were created from the extracted coordinates of each image and the machine was used to predict their attractiveness ratings (taking care to exclude the original image used for the chimeric composition from the training set, as it contains many features which are identical to those of the composite). The machine gave 63 out of 91 right-right composites a higher rating than their matching left-left composite, while only 28 left-left composites were judged as more attractive. A paired t-test shows these results to be statistically significant with P-value < 10 -7 (scores of chimeric composites are normally distributed). It is interesting to see that the machine manifests the same kind of asymmetry bias reported by Zaidel et al, though it has never been explicitly trained for that. 4 Di s cu s s i on In this work we produced a high quality training set for learning facial attractiveness of human faces. Using supervised learning methodologies we were able to construct the first predictor that achieves accurate, humanlike performance for this task. Our results add the task of facial attractiveness prediction to a collection of abstract tasks that has been successfully accomplished with current machine learning techniques. Examining the machine and human raters' representations in the ratings space identifies the ratings of the machine in the center of human raters, and closest, in average, to other human raters. The similarity between human and machine preferences has prompted us to further study the machine’s operation in order to capitalize on the accessibility of its inner workings and learn more about human perception of facial attractiveness. To this end, we have found that that the machine favors averaged faces made of several component faces. While this preference is known to be common to humans as well, researchers have previously offered different reasons for favoring averageness. Our analysis has revealed that symmetry is strongly related to the attractiveness of averaged faces, but is definitely not the only factor in the equation since about half of the image-features relate to the ratings of averaged composites in a similar manner as the symmetry measure. This suggests that a general movement of features toward attractiveness, rather than a simple increase in symmetry, is responsible for the attractiveness of averaged faces. Obviously, strictly speaking this can be held true only for the machine, but, in due of the remarkable ``humnalike'' behavior of the machine, it also brings important support to the idea that this finding may well extend also to human perception of facial attractiveness. Overall, it is quite surprising and pleasing to see that a machine trained explicitly to capture an operational performance criteria such as rating, implicitly captures basic human psychophysical biases related to facial attractiveness. It is likely that while the machine learns the ratings in an explicit supervised manner, it also concomitantly and implicitly learns other basic characteristics of human facial ratings, as revealed by studying its
2 0.14343111 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
Author: Alexis Battle, Gal Chechik, Daphne Koller
Abstract: We present a probabilistic model applied to the fMRI video rating prediction task of the Pittsburgh Brain Activity Interpretation Competition (PBAIC) [2]. Our goal is to predict a time series of subjective, semantic ratings of a movie given functional MRI data acquired during viewing by three subjects. Our method uses conditionally trained Gaussian Markov random fields, which model both the relationships between the subjects’ fMRI voxel measurements and the ratings, as well as the dependencies of the ratings across time steps and between subjects. We also employed non-traditional methods for feature selection and regularization that exploit the spatial structure of voxel activity in the brain. The model displayed good performance in predicting the scored ratings for the three subjects in test data sets, and a variant of this model was the third place entrant to the 2006 PBAIC. 1
3 0.10947844 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
4 0.061246913 110 nips-2006-Learning Dense 3D Correspondence
Author: Florian Steinke, Volker Blanz, Bernhard Schölkopf
Abstract: Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difficult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads. 1
5 0.045670107 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
Author: Yuanhao Chen, Long Zhu, Alan L. Yuille
Abstract: We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes. The individual objects can be recovered as different aspects of the grammar for the object class. In all cases, we validate our results by learning the probability grammars from training datasets and evaluating them on the test datasets. We compare our method to alternative approaches. The advantages of our approach is the speed of inference (under one second), the parsing of the object, and increased accuracy of performance. Moreover, our approach is very general and can be applied to a large range of objects and structures. 1
6 0.041084711 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
7 0.039509676 66 nips-2006-Detecting Humans via Their Pose
8 0.036622714 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
9 0.036439572 185 nips-2006-Subordinate class recognition using relational object models
10 0.035801571 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
11 0.031765919 122 nips-2006-Learning to parse images of articulated bodies
12 0.031159434 50 nips-2006-Chained Boosting
13 0.029615326 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
14 0.028501814 145 nips-2006-Neurophysiological Evidence of Cooperative Mechanisms for Stereo Computation
15 0.027472287 120 nips-2006-Learning to Traverse Image Manifolds
16 0.027015978 149 nips-2006-Nonnegative Sparse PCA
17 0.026990337 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
18 0.026184967 105 nips-2006-Large Margin Component Analysis
19 0.025785878 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
20 0.024784977 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
topicId topicWeight
[(0, -0.083), (1, -0.003), (2, 0.067), (3, -0.038), (4, 0.006), (5, -0.036), (6, -0.032), (7, -0.023), (8, 0.002), (9, -0.04), (10, 0.032), (11, 0.048), (12, 0.018), (13, -0.065), (14, -0.013), (15, -0.08), (16, 0.026), (17, -0.061), (18, -0.093), (19, -0.027), (20, 0.054), (21, 0.069), (22, 0.092), (23, 0.047), (24, 0.092), (25, -0.051), (26, 0.066), (27, -0.067), (28, 0.042), (29, -0.264), (30, 0.007), (31, 0.117), (32, -0.061), (33, 0.01), (34, 0.052), (35, 0.065), (36, -0.141), (37, 0.0), (38, 0.134), (39, 0.017), (40, -0.037), (41, -0.1), (42, 0.07), (43, 0.071), (44, -0.048), (45, -0.062), (46, 0.155), (47, -0.056), (48, -0.099), (49, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.95731062 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
Author: Amit Kagian, Gideon Dror, Tommer Leyvand, Daniel Cohen-or, Eytan Ruppin
Abstract: This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments which are remarkably similar to those of humans. These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general. 1 I n trod u cti on Philosophers, artists and scientists have been trying to capture the nature of beauty since the early days of philosophy. Although in modern days a common layman's notion is that judgments of beauty are a matter of subjective opinion, recent findings suggest that people might share a common taste for facial attractiveness and that their preferences may be an innate part of the primary constitution of our nature. Several experiments have shown that 2 to 8 months old infants prefer looking at faces which adults rate as being more attractive [1]. In addition, attractiveness ratings show very high agreement between groups of raters belonging to the same culture and even across cultures [2]. Such findings give rise to the quest for common factors which determine human facial attractiveness. Accordingly, various hypotheses, from cognitive, evolutional and social perspectives, have been put forward to describe the common preferences for facial beauty. Inspired by Sir Francis Galton’s photographic method of composing faces [3], Rubenstein, Langlois and Roggman created averaged faces by morphing multiple images together and proposed that averageness is the answer for facial attractiveness [4, 5]. Human judges found these averaged faces to be attractive and rated them with attractiveness ratings higher than the mean rating of the component faces composing them. Grammer and Thornhill have investigated symmetry and averageness of faces and concluded that symmetry was more important than averageness in facial attractiveness [6]. Little and colleagues have agreed that average faces are attractive but claim that faces with certain extreme features, such as extreme sexually dimorphic traits, may be more attractive than average faces [7]. Other researchers have suggested various conditions which may contribute to facial attractiveness such as neonate features, pleasant expressions and familiarity. Cunningham and his associates suggest a multiple fitness model in which there is no single constructing line that determines attractiveness. Instead, different categories of features signal different desirable qualities of the perceived target [8]. Even so, the multiple fitness model agrees that some facial qualities are universally physically attractive to people. Apart from eliciting the facial characteristics which account for attractiveness, modern researchers try to describe underlying mechanisms for these preferences. Many contributors refer to the evolutionary origins of attractiveness preferences [9]-[11]. According to this view, facial traits signal mate quality and imply chances for reproductive success and parasite resistance. Some evolutionary theorists suggest that preferred features might not signal mate quality but that the “good taste” by itself is an evolutionary adaptation (individuals with a preference for attractiveness will have attractive offspring that will be favored as mates) [9]. Another mechanism explains attractiveness' preferences through a cognitive theory - a preference for attractive faces might be induced as a by-product of general perception or recognition mechanisms [5, 12]: Attractive faces might be pleasant to look at since they are closer to the cognitive representation of the face category in the mind. These cognitive representations are described as a part of a cognitive mechanism that abstracts prototypes from distinct classes of objects. These prototypes relate to average faces when considering the averageness hypothesis. A third view has suggested that facial attractiveness originates in a social mechanism, where preferences may be dependent on the learning history of the individual and even on his social goals [12]. Different studies have tried to use computational methods in order to analyze facial attractiveness. Averaging faces with morph tools was done in several cases (e.g. [5, 13]). In [14], laser scans of faces were put into complete correspondence with the average face in order to examine the relationship between facial attractiveness, age, and averageness. Another approach was used in [15] where a genetic algorithm, guided by interactive user selections, was programmed to evolve a “most beautiful” female face. [16] used machine learning methods to investigate whether a machine can predict attractiveness ratings by learning a mapping from facial images to their attractiveness scores. Their predictor achieved a significant correlation of 0.6 with average human ratings, demonstrating that facial beauty can be learned by a machine, at least to some degree. However, as human raters still do significantly outperform the predictor of [16], the challenge of constructing a facial attractiveness machine with human level evaluation accuracy has remained open. A primary goal of this study is to surpass these results by developing a machine which obtains human level performance in predicting facial attractiveness. Having accomplished this, our second main goal is to conduct a series of simulated psychophysical experiments and study the resemblance between human and machine judgments. This latter task carries two potential rewards: A. To determine whether the machine can aid in understanding the psychophysics of human facial attractiveness, capitalizing on the ready accessibility of the analysis of its inner workings, and B. To study whether learning an explicit operational ratings prediction task also entails learning implicit humanlike biases, at least for the case of facial attractiveness. 2 2.1 T h e f aci al trai n in g d atab as e: Acq u i s i ti on , p rep roces s i n g an d rep res en tati on Rating facial attractiveness The chosen database was composed of 91 facial images of American females, taken by the Japanese photographer Akira Gomi. All 91 samples were frontal color photographs of young Caucasian females with a neutral expression. All samples were of similar age, skin color and gender. The subjects’ portraits had no accessories or other distracting items such as jewelry. All 91 facial images in the dataset were rated for attractiveness by 28 human raters (15 males, 13 females) on a 7-point Likert scale (1 = very unattractive, 7 = very attractive). Ratings were collected with a specifically designed html interface. Each rater was asked to view the entire set before rating in order to acquire a notion of attractiveness scale. There was no time limit for judging the attractiveness of each sample and raters could go back and adjust the ratings of already rated samples. The images were presented to each rater in a random order and each image was presented on a separate page. The final attractiveness rating of each sample was its mean rating across all raters. To validate that the number of ratings collected adequately represented the ``collective attractiveness rating'' we randomly divided the raters into two disjoint groups of equal size. For each facial image, we calculated the mean rating on each group, and calculated the Pearson correlation between the mean ratings of the two groups. This process was repeated 1,000 times. The mean correlation between two groups was 0.92 ( = 0.01). This corresponds well to the known level of consistency among groups of raters reported in the literature (e.g. [2]). Hence, the mean ratings collected are stable indicators of attractiveness that can be used for the learning task. The facial set contained faces in all ranges of attractiveness. Final attractiveness ratings range from 1.42 to 5.75 and the mean rating was 3.33 ( = 0.94). 2.2 Data preprocessing and representation Preliminary experimentation with various ways of representing a facial image have systematically shown that features based on measured proportions, distances and angles of faces are most effective in capturing the notion of facial attractiveness (e.g. [16]). To extract facial features we developed an automatic engine that is capable of identifying eyes, nose, lips, eyebrows, and head contour. In total, we measured 84 coordinates describing the locations of those facial features (Figure 1). Several regions are suggested for extracting mean hair color, mean skin color and skin texture. The feature extraction process was basically automatic but some coordinates needed to be manually adjusted in some of the images. The facial coordinates are used to create a distances-vector of all 3,486 distances between all pairs of coordinates in the complete graph created by all coordinates. For each image, all distances are normalized by face length. In a similar manner, a slopes-vector of all the 3,486 slopes of the lines connecting the facial coordinates is computed. Central fluctuating asymmetry (CFA), which is described in [6], is calculated from the coordinates as well. The application also provides, for each face, Hue, Saturation and Value (HSV) values of hair color and skin color, and a measurement of skin smoothness. Figure 1: Facial coordinates with hair and skin sample regions as represented by the facial feature extractor. Coordinates are used for calculating geometric features and asymmetry. Sample regions are used for calculating color values and smoothness. The sample image, used for illustration only, is of T.G. and is presented with her full consent. Combining the distances-vector and the slopes-vector yields a vector representation of 6,972 geometric features for each image. Since strong correlations are expected among the features in such representation, principal component analysis (PCA) was applied to these geometric features, producing 90 principal components which span the sub-space defined by the 91 image vector representations. The geometric features are projected on those 90 principal components and supply 90 orthogonal eigenfeatures representing the geometric features. Eight measured features were not included in the PCA analysis, including CFA, smoothness, hair color coordinates (HSV) and skin color coordinates. These features are assumed to be directly connected to human perception of facial attractiveness and are hence kept at their original values. These 8 features were added to the 90 geometric eigenfeatures, resulting in a total of 98 image-features representing each facial image in the dataset. 3 3.1 E xp eri men ts an d resu l ts Predictor construction and validation We experimented with several induction algorithms including simple Linear Regression, Least Squares Support Vector Machine (LS-SVM) (both linear as well as non-linear) and Gaussian Processes (GP). However, as the LS-SVM and GP showed no substantial advantage over Linear Regression, the latter was used and is presented in the sequel. A key ingredient in our methods is to use a proper image-features selection strategy. To this end we used subset feature selection, implemented by ranking the image-features by their Pearson correlation with the target. Other ranking functions produced no substantial gain. To measure the performance of our method we removed one sample from the whole dataset. This sample served as a test set. We found, for each left out sample, the optimal number of image-features by performing leave-one-out-cross-validation (LOOCV) on the remaining samples and selecting the number of features that minimizes the absolute difference between the algorithm's output and the targets of the training set. In other words, the score for a test example was predicted using a single model based on the training set only. This process was repeated n=91 times, once for each image sample. The vector of attractiveness predictions of all images is then compared with the true targets. These scores are found to be in a high Pearson correlation of 0.82 with the mean ratings of humans (P-value < 10 -23), which corresponds to a normalized Mean Squared Error of 0.39. This accuracy is a marked improvement over the recently published performance results of a Pearson correlation of 0.6 on a similar dataset [16]. The average correlation of an individual human rater to the mean correlations of all other raters in our dataset is 0.67 and the average correlation between the mean ratings of groups of raters is 0.92 (section 2.1). It should be noted that we tried to use this feature selection and training procedure with the original geometric features instead of the eigenfeatures, ranking them by their correlation to the targets and selecting up to 300 best ranked features. This, however, has failed to produce good predictors due to strong correlations between the original geometric features (maximal Pearson correlation obtained was 0.26). 3.2 S i m i l a r i t y o f ma c h i n e a n d h u m a n j u d g m e n t s Each rater (human and machine) has a 91 dimensional rating vector describing its Figure 2: Distribution of mean Euclidean distance from each human rater to all other raters in the ratings space. The machine’s average distance form all other raters (left bar) is smaller than the average distance of each of the human raters to all others. attractiveness ratings of all 91 images. These vectors can be embedded in a 91 dimensional ratings space. The Euclidian distance between all raters (human and machine) in this space was computed. Compared with each of the human raters, the ratings of the machine were the closest, on average, to the ratings of all other human raters (Figure 2). To verify that the machine ratings are not outliers that fall out of clusters of human raters (even though their mean distance from the other ratings is small) we surrounded each of the rating vectors in the ratings space with multidimensional spheres of several radius sizes. The machine had more human neighbors than the mean number of neighbors of human raters, testifying that it does not fall between clusters. Finally, for a graphic display of machine ratings among human ratings we applied PCA to machine and human ratings in the rating space and projected all ratings onto the resulting first 2 and 3 principal components. Indeed, the machine is well placed in a mid-zone of human raters (Figure 3). 5 Machine 0 Machine 0 -4 -8 -5 7 5 0 -10 -10 -5 0 5 10 (a) 0 -7 -5 (b) Figure 3: Location of machine ratings among the 28 human ratings: Ratings were projected into 2 dimensions (a) and 3 dimensions (b) by performing PCA on all ratings and projecting them on the first principal components. The projected data explain 29.8% of the variance in (a) and 36.6% in (b). 3.3 Psychophysical experiments in silico A number of simulated psychophysical experiments reveal humanlike biases of the machine's performance. Rubenstein et al. discuss a morphing technique to create mathematically averaged faces from multiple face images [5]. They reported that averaged faces made of 16 and 32 original component images were rated higher in attractiveness than the mean attractiveness ratings of their component faces and higher than composites consisting of fewer faces. In their experiment, 32-component composites were found to be the most attractive. We used a similar technique to create averaged virtually-morphed faces with various numbers of components, nc, and have let the machine predict their attractiveness. To this end, coordinate values of the original component faces were averaged to create a new set of coordinates for the composite. These coordinates were used to calculate the geometrical features and CFA of the averaged face. Smoothness and HSV values for the composite faces were calculated by averaging the corresponding values of the component faces 1. To study the effect of nc on the attractiveness score we produced 1,000 virtual morph images for each value of n c between 2 and 50, and used our attractiveness predictor (section 3.1) to compute the attractiveness scores of the resulting composites. In accordance with the experimental results of [5], the machine manifests a humanlike bias for higher scores of averaged composites over their components’ mean score. Figure 4a, presenting these results, shows the percent of components which were rated as less attractive than their corresponding composite, for each number of components n c. As evident, the attractiveness rating of a composite surpasses a larger percent of its components’ ratings as nc increases. Figure 4a also shows the mean scores of 1,000 1 HSV values are converted to RGB before averaging composites and the mean scores of their components, for each n c (scores are normalized to the range [0, 1]). Their actual attractiveness scores are reported in Table 1. As expected, the mean scores of the components images are independent of n c, while composites’ scores increase with nc. Mean values of smoothness and asymmetry of the composites are presented in Figure 4b. 0.4 Smoothness Asymmetry 0.8 0.2 0.75 0 -0.2 0.7 -0.4 0.65 -0.6 0.6 Fraction of less attractive components Composite's score (normalized) Components' mean score (normalized) 0.55 2 10 20 30 40 -0.8 50 -1 2 Number of components in composite 10 20 30 40 50 Number of components in composite (a) (b) Figure 4: Mean results over 1,000 composites made of varying numbers of image components: (a) Percent of components which were rated as less attractive than their corresponding composite accompanied with mean scores of composites and the mean scores of their components (scores are normalized to the range [0, 1]. actual attractiveness scores are reported in Table 1). (b) Mean values of smoothness and asymmetry of 1,000 composites for each number of components, nc. Table 1: Mean results over 1,000 composites made of varying numbers of component images NUMBER OF COMPONENTS IN COMPOSITE COMPOSITE SCORE COMPONENTS MEAN SCORE 2 4 12 25 50 3.46 3.66 3.74 3.82 3.94 3.34 3.33 3.32 3.32 3.33 COMPONENTS RATED LOWER THAN COMPOSITE (PERCENT) 55 64 70 75 81 % % % % % Recent studies have provided evidence that skin texture influences judgments of facial attractiveness [17]. Since blurring and smoothing of faces occur when faces are averaged together [5], the smooth complexion of composites may underlie the attractiveness of averaged composites. In our experiment, a preference for averageness is found even though our method of virtual-morphing does not produce the smoothening effect and the mean smoothness value of composites corresponds to the mean smoothness value in the original dataset, for all nc (see Figure 4b). Researchers have also suggested that averaged faces are attractive since they are exceptionally symmetric [18]. Figure 4b shows that the mean level of asymmetry is indeed highly correlated with the mean scores of the morphs (Pearson correlation of -0.91, P-value < 10 -19). However, examining the correlation between the rest of the features and the composites' scores reveals that this high correlation is not at all unique to asymmetry. In fact, 45 of the 98 features are strongly correlated with attractiveness scores (|Pearson correlation| > 0.9). The high correlation between these numerous features and attractiveness scores of averaged faces indicates that symmetry level is not an exceptional factor in the machine’s preference for averaged faces. Instead, it suggests that averaging causes many features, including both geometric features and symmetry, to change in a direction which causes an increase in attractiveness. It has been argued that although averaged faces are found to be attractive, very attractive faces are not average [18]. A virtual composite made of the 12 most attractive faces in the set (as rated by humans) was rated by the machine with a high score of 5.6 while 1,000 composites made of 50 faces got a maximum score of only 5.3. This type of preference resembles the findings of an experiment by Perrett et al. in which a highly attractive composite, morphed from only attractive faces, was preferred by humans over a composite made of 60 images of all levels of attractiveness [13]. Another study by Zaidel et al. examined the asymmetry of attractiveness perception and offered a relationship between facial attractiveness and hemispheric specialization [19]. In this research right-right and left-left chimeric composites were created by attaching each half of the face to its mirror image. Subjects were asked to look at left-left and right-right composites of the same image and judge which one is more attractive. For women’s faces, right-right composites got twice as many ‘more attractive’ responses than left-left composites. Interestingly, similar results were found when simulating the same experiment with the machine: Right-right and left-left chimeric composites were created from the extracted coordinates of each image and the machine was used to predict their attractiveness ratings (taking care to exclude the original image used for the chimeric composition from the training set, as it contains many features which are identical to those of the composite). The machine gave 63 out of 91 right-right composites a higher rating than their matching left-left composite, while only 28 left-left composites were judged as more attractive. A paired t-test shows these results to be statistically significant with P-value < 10 -7 (scores of chimeric composites are normally distributed). It is interesting to see that the machine manifests the same kind of asymmetry bias reported by Zaidel et al, though it has never been explicitly trained for that. 4 Di s cu s s i on In this work we produced a high quality training set for learning facial attractiveness of human faces. Using supervised learning methodologies we were able to construct the first predictor that achieves accurate, humanlike performance for this task. Our results add the task of facial attractiveness prediction to a collection of abstract tasks that has been successfully accomplished with current machine learning techniques. Examining the machine and human raters' representations in the ratings space identifies the ratings of the machine in the center of human raters, and closest, in average, to other human raters. The similarity between human and machine preferences has prompted us to further study the machine’s operation in order to capitalize on the accessibility of its inner workings and learn more about human perception of facial attractiveness. To this end, we have found that that the machine favors averaged faces made of several component faces. While this preference is known to be common to humans as well, researchers have previously offered different reasons for favoring averageness. Our analysis has revealed that symmetry is strongly related to the attractiveness of averaged faces, but is definitely not the only factor in the equation since about half of the image-features relate to the ratings of averaged composites in a similar manner as the symmetry measure. This suggests that a general movement of features toward attractiveness, rather than a simple increase in symmetry, is responsible for the attractiveness of averaged faces. Obviously, strictly speaking this can be held true only for the machine, but, in due of the remarkable ``humnalike'' behavior of the machine, it also brings important support to the idea that this finding may well extend also to human perception of facial attractiveness. Overall, it is quite surprising and pleasing to see that a machine trained explicitly to capture an operational performance criteria such as rating, implicitly captures basic human psychophysical biases related to facial attractiveness. It is likely that while the machine learns the ratings in an explicit supervised manner, it also concomitantly and implicitly learns other basic characteristics of human facial ratings, as revealed by studying its
2 0.70238781 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
3 0.70060784 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
Author: Alexis Battle, Gal Chechik, Daphne Koller
Abstract: We present a probabilistic model applied to the fMRI video rating prediction task of the Pittsburgh Brain Activity Interpretation Competition (PBAIC) [2]. Our goal is to predict a time series of subjective, semantic ratings of a movie given functional MRI data acquired during viewing by three subjects. Our method uses conditionally trained Gaussian Markov random fields, which model both the relationships between the subjects’ fMRI voxel measurements and the ratings, as well as the dependencies of the ratings across time steps and between subjects. We also employed non-traditional methods for feature selection and regularization that exploit the spatial structure of voxel activity in the brain. The model displayed good performance in predicting the scored ratings for the three subjects in test data sets, and a variant of this model was the third place entrant to the 2006 PBAIC. 1
4 0.38679671 53 nips-2006-Combining causal and similarity-based reasoning
Author: Charles Kemp, Patrick Shafto, Allison Berke, Joshua B. Tenenbaum
Abstract: Everyday inductive reasoning draws on many kinds of knowledge, including knowledge about relationships between properties and knowledge about relationships between objects. Previous accounts of inductive reasoning generally focus on just one kind of knowledge: models of causal reasoning often focus on relationships between properties, and models of similarity-based reasoning often focus on similarity relationships between objects. We present a Bayesian model of inductive reasoning that incorporates both kinds of knowledge, and show that it accounts well for human inferences about the properties of biological species. 1
5 0.34626701 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
6 0.30098853 110 nips-2006-Learning Dense 3D Correspondence
7 0.25757468 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
8 0.25653273 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
9 0.24550344 113 nips-2006-Learning Structural Equation Models for fMRI
10 0.24457532 120 nips-2006-Learning to Traverse Image Manifolds
11 0.24244782 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
12 0.23830938 192 nips-2006-Theory and Dynamics of Perceptual Bistability
13 0.21986379 174 nips-2006-Similarity by Composition
14 0.21527605 52 nips-2006-Clustering appearance and shape by learning jigsaws
15 0.2118219 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
16 0.20073776 122 nips-2006-Learning to parse images of articulated bodies
17 0.18916978 124 nips-2006-Linearly-solvable Markov decision problems
18 0.18505694 185 nips-2006-Subordinate class recognition using relational object models
19 0.18424591 25 nips-2006-An Application of Reinforcement Learning to Aerobatic Helicopter Flight
20 0.18308036 180 nips-2006-Speakers optimize information density through syntactic reduction
topicId topicWeight
[(1, 0.069), (3, 0.02), (7, 0.077), (9, 0.037), (20, 0.025), (22, 0.027), (44, 0.031), (57, 0.056), (60, 0.412), (65, 0.038), (69, 0.027), (71, 0.035), (90, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.80088425 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
Author: Amit Kagian, Gideon Dror, Tommer Leyvand, Daniel Cohen-or, Eytan Ruppin
Abstract: This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments which are remarkably similar to those of humans. These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general. 1 I n trod u cti on Philosophers, artists and scientists have been trying to capture the nature of beauty since the early days of philosophy. Although in modern days a common layman's notion is that judgments of beauty are a matter of subjective opinion, recent findings suggest that people might share a common taste for facial attractiveness and that their preferences may be an innate part of the primary constitution of our nature. Several experiments have shown that 2 to 8 months old infants prefer looking at faces which adults rate as being more attractive [1]. In addition, attractiveness ratings show very high agreement between groups of raters belonging to the same culture and even across cultures [2]. Such findings give rise to the quest for common factors which determine human facial attractiveness. Accordingly, various hypotheses, from cognitive, evolutional and social perspectives, have been put forward to describe the common preferences for facial beauty. Inspired by Sir Francis Galton’s photographic method of composing faces [3], Rubenstein, Langlois and Roggman created averaged faces by morphing multiple images together and proposed that averageness is the answer for facial attractiveness [4, 5]. Human judges found these averaged faces to be attractive and rated them with attractiveness ratings higher than the mean rating of the component faces composing them. Grammer and Thornhill have investigated symmetry and averageness of faces and concluded that symmetry was more important than averageness in facial attractiveness [6]. Little and colleagues have agreed that average faces are attractive but claim that faces with certain extreme features, such as extreme sexually dimorphic traits, may be more attractive than average faces [7]. Other researchers have suggested various conditions which may contribute to facial attractiveness such as neonate features, pleasant expressions and familiarity. Cunningham and his associates suggest a multiple fitness model in which there is no single constructing line that determines attractiveness. Instead, different categories of features signal different desirable qualities of the perceived target [8]. Even so, the multiple fitness model agrees that some facial qualities are universally physically attractive to people. Apart from eliciting the facial characteristics which account for attractiveness, modern researchers try to describe underlying mechanisms for these preferences. Many contributors refer to the evolutionary origins of attractiveness preferences [9]-[11]. According to this view, facial traits signal mate quality and imply chances for reproductive success and parasite resistance. Some evolutionary theorists suggest that preferred features might not signal mate quality but that the “good taste” by itself is an evolutionary adaptation (individuals with a preference for attractiveness will have attractive offspring that will be favored as mates) [9]. Another mechanism explains attractiveness' preferences through a cognitive theory - a preference for attractive faces might be induced as a by-product of general perception or recognition mechanisms [5, 12]: Attractive faces might be pleasant to look at since they are closer to the cognitive representation of the face category in the mind. These cognitive representations are described as a part of a cognitive mechanism that abstracts prototypes from distinct classes of objects. These prototypes relate to average faces when considering the averageness hypothesis. A third view has suggested that facial attractiveness originates in a social mechanism, where preferences may be dependent on the learning history of the individual and even on his social goals [12]. Different studies have tried to use computational methods in order to analyze facial attractiveness. Averaging faces with morph tools was done in several cases (e.g. [5, 13]). In [14], laser scans of faces were put into complete correspondence with the average face in order to examine the relationship between facial attractiveness, age, and averageness. Another approach was used in [15] where a genetic algorithm, guided by interactive user selections, was programmed to evolve a “most beautiful” female face. [16] used machine learning methods to investigate whether a machine can predict attractiveness ratings by learning a mapping from facial images to their attractiveness scores. Their predictor achieved a significant correlation of 0.6 with average human ratings, demonstrating that facial beauty can be learned by a machine, at least to some degree. However, as human raters still do significantly outperform the predictor of [16], the challenge of constructing a facial attractiveness machine with human level evaluation accuracy has remained open. A primary goal of this study is to surpass these results by developing a machine which obtains human level performance in predicting facial attractiveness. Having accomplished this, our second main goal is to conduct a series of simulated psychophysical experiments and study the resemblance between human and machine judgments. This latter task carries two potential rewards: A. To determine whether the machine can aid in understanding the psychophysics of human facial attractiveness, capitalizing on the ready accessibility of the analysis of its inner workings, and B. To study whether learning an explicit operational ratings prediction task also entails learning implicit humanlike biases, at least for the case of facial attractiveness. 2 2.1 T h e f aci al trai n in g d atab as e: Acq u i s i ti on , p rep roces s i n g an d rep res en tati on Rating facial attractiveness The chosen database was composed of 91 facial images of American females, taken by the Japanese photographer Akira Gomi. All 91 samples were frontal color photographs of young Caucasian females with a neutral expression. All samples were of similar age, skin color and gender. The subjects’ portraits had no accessories or other distracting items such as jewelry. All 91 facial images in the dataset were rated for attractiveness by 28 human raters (15 males, 13 females) on a 7-point Likert scale (1 = very unattractive, 7 = very attractive). Ratings were collected with a specifically designed html interface. Each rater was asked to view the entire set before rating in order to acquire a notion of attractiveness scale. There was no time limit for judging the attractiveness of each sample and raters could go back and adjust the ratings of already rated samples. The images were presented to each rater in a random order and each image was presented on a separate page. The final attractiveness rating of each sample was its mean rating across all raters. To validate that the number of ratings collected adequately represented the ``collective attractiveness rating'' we randomly divided the raters into two disjoint groups of equal size. For each facial image, we calculated the mean rating on each group, and calculated the Pearson correlation between the mean ratings of the two groups. This process was repeated 1,000 times. The mean correlation between two groups was 0.92 ( = 0.01). This corresponds well to the known level of consistency among groups of raters reported in the literature (e.g. [2]). Hence, the mean ratings collected are stable indicators of attractiveness that can be used for the learning task. The facial set contained faces in all ranges of attractiveness. Final attractiveness ratings range from 1.42 to 5.75 and the mean rating was 3.33 ( = 0.94). 2.2 Data preprocessing and representation Preliminary experimentation with various ways of representing a facial image have systematically shown that features based on measured proportions, distances and angles of faces are most effective in capturing the notion of facial attractiveness (e.g. [16]). To extract facial features we developed an automatic engine that is capable of identifying eyes, nose, lips, eyebrows, and head contour. In total, we measured 84 coordinates describing the locations of those facial features (Figure 1). Several regions are suggested for extracting mean hair color, mean skin color and skin texture. The feature extraction process was basically automatic but some coordinates needed to be manually adjusted in some of the images. The facial coordinates are used to create a distances-vector of all 3,486 distances between all pairs of coordinates in the complete graph created by all coordinates. For each image, all distances are normalized by face length. In a similar manner, a slopes-vector of all the 3,486 slopes of the lines connecting the facial coordinates is computed. Central fluctuating asymmetry (CFA), which is described in [6], is calculated from the coordinates as well. The application also provides, for each face, Hue, Saturation and Value (HSV) values of hair color and skin color, and a measurement of skin smoothness. Figure 1: Facial coordinates with hair and skin sample regions as represented by the facial feature extractor. Coordinates are used for calculating geometric features and asymmetry. Sample regions are used for calculating color values and smoothness. The sample image, used for illustration only, is of T.G. and is presented with her full consent. Combining the distances-vector and the slopes-vector yields a vector representation of 6,972 geometric features for each image. Since strong correlations are expected among the features in such representation, principal component analysis (PCA) was applied to these geometric features, producing 90 principal components which span the sub-space defined by the 91 image vector representations. The geometric features are projected on those 90 principal components and supply 90 orthogonal eigenfeatures representing the geometric features. Eight measured features were not included in the PCA analysis, including CFA, smoothness, hair color coordinates (HSV) and skin color coordinates. These features are assumed to be directly connected to human perception of facial attractiveness and are hence kept at their original values. These 8 features were added to the 90 geometric eigenfeatures, resulting in a total of 98 image-features representing each facial image in the dataset. 3 3.1 E xp eri men ts an d resu l ts Predictor construction and validation We experimented with several induction algorithms including simple Linear Regression, Least Squares Support Vector Machine (LS-SVM) (both linear as well as non-linear) and Gaussian Processes (GP). However, as the LS-SVM and GP showed no substantial advantage over Linear Regression, the latter was used and is presented in the sequel. A key ingredient in our methods is to use a proper image-features selection strategy. To this end we used subset feature selection, implemented by ranking the image-features by their Pearson correlation with the target. Other ranking functions produced no substantial gain. To measure the performance of our method we removed one sample from the whole dataset. This sample served as a test set. We found, for each left out sample, the optimal number of image-features by performing leave-one-out-cross-validation (LOOCV) on the remaining samples and selecting the number of features that minimizes the absolute difference between the algorithm's output and the targets of the training set. In other words, the score for a test example was predicted using a single model based on the training set only. This process was repeated n=91 times, once for each image sample. The vector of attractiveness predictions of all images is then compared with the true targets. These scores are found to be in a high Pearson correlation of 0.82 with the mean ratings of humans (P-value < 10 -23), which corresponds to a normalized Mean Squared Error of 0.39. This accuracy is a marked improvement over the recently published performance results of a Pearson correlation of 0.6 on a similar dataset [16]. The average correlation of an individual human rater to the mean correlations of all other raters in our dataset is 0.67 and the average correlation between the mean ratings of groups of raters is 0.92 (section 2.1). It should be noted that we tried to use this feature selection and training procedure with the original geometric features instead of the eigenfeatures, ranking them by their correlation to the targets and selecting up to 300 best ranked features. This, however, has failed to produce good predictors due to strong correlations between the original geometric features (maximal Pearson correlation obtained was 0.26). 3.2 S i m i l a r i t y o f ma c h i n e a n d h u m a n j u d g m e n t s Each rater (human and machine) has a 91 dimensional rating vector describing its Figure 2: Distribution of mean Euclidean distance from each human rater to all other raters in the ratings space. The machine’s average distance form all other raters (left bar) is smaller than the average distance of each of the human raters to all others. attractiveness ratings of all 91 images. These vectors can be embedded in a 91 dimensional ratings space. The Euclidian distance between all raters (human and machine) in this space was computed. Compared with each of the human raters, the ratings of the machine were the closest, on average, to the ratings of all other human raters (Figure 2). To verify that the machine ratings are not outliers that fall out of clusters of human raters (even though their mean distance from the other ratings is small) we surrounded each of the rating vectors in the ratings space with multidimensional spheres of several radius sizes. The machine had more human neighbors than the mean number of neighbors of human raters, testifying that it does not fall between clusters. Finally, for a graphic display of machine ratings among human ratings we applied PCA to machine and human ratings in the rating space and projected all ratings onto the resulting first 2 and 3 principal components. Indeed, the machine is well placed in a mid-zone of human raters (Figure 3). 5 Machine 0 Machine 0 -4 -8 -5 7 5 0 -10 -10 -5 0 5 10 (a) 0 -7 -5 (b) Figure 3: Location of machine ratings among the 28 human ratings: Ratings were projected into 2 dimensions (a) and 3 dimensions (b) by performing PCA on all ratings and projecting them on the first principal components. The projected data explain 29.8% of the variance in (a) and 36.6% in (b). 3.3 Psychophysical experiments in silico A number of simulated psychophysical experiments reveal humanlike biases of the machine's performance. Rubenstein et al. discuss a morphing technique to create mathematically averaged faces from multiple face images [5]. They reported that averaged faces made of 16 and 32 original component images were rated higher in attractiveness than the mean attractiveness ratings of their component faces and higher than composites consisting of fewer faces. In their experiment, 32-component composites were found to be the most attractive. We used a similar technique to create averaged virtually-morphed faces with various numbers of components, nc, and have let the machine predict their attractiveness. To this end, coordinate values of the original component faces were averaged to create a new set of coordinates for the composite. These coordinates were used to calculate the geometrical features and CFA of the averaged face. Smoothness and HSV values for the composite faces were calculated by averaging the corresponding values of the component faces 1. To study the effect of nc on the attractiveness score we produced 1,000 virtual morph images for each value of n c between 2 and 50, and used our attractiveness predictor (section 3.1) to compute the attractiveness scores of the resulting composites. In accordance with the experimental results of [5], the machine manifests a humanlike bias for higher scores of averaged composites over their components’ mean score. Figure 4a, presenting these results, shows the percent of components which were rated as less attractive than their corresponding composite, for each number of components n c. As evident, the attractiveness rating of a composite surpasses a larger percent of its components’ ratings as nc increases. Figure 4a also shows the mean scores of 1,000 1 HSV values are converted to RGB before averaging composites and the mean scores of their components, for each n c (scores are normalized to the range [0, 1]). Their actual attractiveness scores are reported in Table 1. As expected, the mean scores of the components images are independent of n c, while composites’ scores increase with nc. Mean values of smoothness and asymmetry of the composites are presented in Figure 4b. 0.4 Smoothness Asymmetry 0.8 0.2 0.75 0 -0.2 0.7 -0.4 0.65 -0.6 0.6 Fraction of less attractive components Composite's score (normalized) Components' mean score (normalized) 0.55 2 10 20 30 40 -0.8 50 -1 2 Number of components in composite 10 20 30 40 50 Number of components in composite (a) (b) Figure 4: Mean results over 1,000 composites made of varying numbers of image components: (a) Percent of components which were rated as less attractive than their corresponding composite accompanied with mean scores of composites and the mean scores of their components (scores are normalized to the range [0, 1]. actual attractiveness scores are reported in Table 1). (b) Mean values of smoothness and asymmetry of 1,000 composites for each number of components, nc. Table 1: Mean results over 1,000 composites made of varying numbers of component images NUMBER OF COMPONENTS IN COMPOSITE COMPOSITE SCORE COMPONENTS MEAN SCORE 2 4 12 25 50 3.46 3.66 3.74 3.82 3.94 3.34 3.33 3.32 3.32 3.33 COMPONENTS RATED LOWER THAN COMPOSITE (PERCENT) 55 64 70 75 81 % % % % % Recent studies have provided evidence that skin texture influences judgments of facial attractiveness [17]. Since blurring and smoothing of faces occur when faces are averaged together [5], the smooth complexion of composites may underlie the attractiveness of averaged composites. In our experiment, a preference for averageness is found even though our method of virtual-morphing does not produce the smoothening effect and the mean smoothness value of composites corresponds to the mean smoothness value in the original dataset, for all nc (see Figure 4b). Researchers have also suggested that averaged faces are attractive since they are exceptionally symmetric [18]. Figure 4b shows that the mean level of asymmetry is indeed highly correlated with the mean scores of the morphs (Pearson correlation of -0.91, P-value < 10 -19). However, examining the correlation between the rest of the features and the composites' scores reveals that this high correlation is not at all unique to asymmetry. In fact, 45 of the 98 features are strongly correlated with attractiveness scores (|Pearson correlation| > 0.9). The high correlation between these numerous features and attractiveness scores of averaged faces indicates that symmetry level is not an exceptional factor in the machine’s preference for averaged faces. Instead, it suggests that averaging causes many features, including both geometric features and symmetry, to change in a direction which causes an increase in attractiveness. It has been argued that although averaged faces are found to be attractive, very attractive faces are not average [18]. A virtual composite made of the 12 most attractive faces in the set (as rated by humans) was rated by the machine with a high score of 5.6 while 1,000 composites made of 50 faces got a maximum score of only 5.3. This type of preference resembles the findings of an experiment by Perrett et al. in which a highly attractive composite, morphed from only attractive faces, was preferred by humans over a composite made of 60 images of all levels of attractiveness [13]. Another study by Zaidel et al. examined the asymmetry of attractiveness perception and offered a relationship between facial attractiveness and hemispheric specialization [19]. In this research right-right and left-left chimeric composites were created by attaching each half of the face to its mirror image. Subjects were asked to look at left-left and right-right composites of the same image and judge which one is more attractive. For women’s faces, right-right composites got twice as many ‘more attractive’ responses than left-left composites. Interestingly, similar results were found when simulating the same experiment with the machine: Right-right and left-left chimeric composites were created from the extracted coordinates of each image and the machine was used to predict their attractiveness ratings (taking care to exclude the original image used for the chimeric composition from the training set, as it contains many features which are identical to those of the composite). The machine gave 63 out of 91 right-right composites a higher rating than their matching left-left composite, while only 28 left-left composites were judged as more attractive. A paired t-test shows these results to be statistically significant with P-value < 10 -7 (scores of chimeric composites are normally distributed). It is interesting to see that the machine manifests the same kind of asymmetry bias reported by Zaidel et al, though it has never been explicitly trained for that. 4 Di s cu s s i on In this work we produced a high quality training set for learning facial attractiveness of human faces. Using supervised learning methodologies we were able to construct the first predictor that achieves accurate, humanlike performance for this task. Our results add the task of facial attractiveness prediction to a collection of abstract tasks that has been successfully accomplished with current machine learning techniques. Examining the machine and human raters' representations in the ratings space identifies the ratings of the machine in the center of human raters, and closest, in average, to other human raters. The similarity between human and machine preferences has prompted us to further study the machine’s operation in order to capitalize on the accessibility of its inner workings and learn more about human perception of facial attractiveness. To this end, we have found that that the machine favors averaged faces made of several component faces. While this preference is known to be common to humans as well, researchers have previously offered different reasons for favoring averageness. Our analysis has revealed that symmetry is strongly related to the attractiveness of averaged faces, but is definitely not the only factor in the equation since about half of the image-features relate to the ratings of averaged composites in a similar manner as the symmetry measure. This suggests that a general movement of features toward attractiveness, rather than a simple increase in symmetry, is responsible for the attractiveness of averaged faces. Obviously, strictly speaking this can be held true only for the machine, but, in due of the remarkable ``humnalike'' behavior of the machine, it also brings important support to the idea that this finding may well extend also to human perception of facial attractiveness. Overall, it is quite surprising and pleasing to see that a machine trained explicitly to capture an operational performance criteria such as rating, implicitly captures basic human psychophysical biases related to facial attractiveness. It is likely that while the machine learns the ratings in an explicit supervised manner, it also concomitantly and implicitly learns other basic characteristics of human facial ratings, as revealed by studying its
2 0.52197421 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
Author: Zhenyue Zhang, Jing Wang
Abstract: The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE. 1
3 0.31529692 80 nips-2006-Fundamental Limitations of Spectral Clustering
Author: Boaz Nadler, Meirav Galun
Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1
4 0.31071731 187 nips-2006-Temporal Coding using the Response Properties of Spiking Neurons
Author: Thomas Voegtlin
Abstract: In biological neurons, the timing of a spike depends on the timing of synaptic currents, in a way that is classically described by the Phase Response Curve. This has implications for temporal coding: an action potential that arrives on a synapse has an implicit meaning, that depends on the position of the postsynaptic neuron on the firing cycle. Here we show that this implicit code can be used to perform computations. Using theta neurons, we derive a spike-timing dependent learning rule from an error criterion. We demonstrate how to train an auto-encoder neural network using this rule. 1
5 0.31004709 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf
Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
6 0.30920294 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
7 0.30789542 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
8 0.30782178 167 nips-2006-Recursive ICA
9 0.30737063 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
10 0.30715427 43 nips-2006-Bayesian Model Scoring in Markov Random Fields
11 0.3059918 154 nips-2006-Optimal Change-Detection and Spiking Neurons
12 0.30585405 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
13 0.30568483 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
14 0.30451438 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
15 0.30439165 158 nips-2006-PG-means: learning the number of clusters in data
16 0.30435658 65 nips-2006-Denoising and Dimension Reduction in Feature Space
17 0.30402341 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
18 0.30384973 110 nips-2006-Learning Dense 3D Correspondence
19 0.30311409 117 nips-2006-Learning on Graph with Laplacian Regularization
20 0.30307212 99 nips-2006-Information Bottleneck Optimization and Independent Component Extraction with Spiking Neurons