
155 nips-2007-Predicting human gaze using low-level saliency combined with face detection


Source: pdf

Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch

Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Predicting human gaze using low-level saliency combined with face detection. Jonathan Harel, Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, harel@klab. [sent-1, score-0.973]

2 Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. [sent-10, score-0.227]

3 We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. [sent-15, score-0.916]

4 Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. [sent-16, score-0.74]

5 Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. [sent-17, score-0.826]

6 One accessible correlate of human attention is the fixation pattern in scanpaths [1], which has long been of interest to the vision community [2]. [sent-19, score-0.226]

7 Such a bottom-up saliency model works well when higher-order semantics are reflected in low-level features (as is often the case for isolated objects, and even for reasonably cluttered scenes), but tends to fail if other factors dominate, e.g. [sent-25, score-0.456]

8 in search tasks [7, 8], strong contextual effects [9], or in free-viewing of images without clearly isolated objects, such as forest scenes or foliage [10]. [sent-27, score-0.231]

9 Here, we test how images containing faces - ecologically highly relevant objects - influence variability of scanpaths across subjects. [sent-28, score-0.549]

10 In a second step, we improve the standard saliency model by adding a “face channel” based on an established face detector algorithm. [sent-29, score-0.857]

11 Although there is an ongoing debate regarding the exact mechanisms which underlie face detection, there is no argument that a normal subject (in contrast to autistic patients) will not interpret a face purely as a reddish blob with four lines, but as a much more significant entity ([11, 12]). [sent-30, score-0.703]

12 In fact, there is mounting evidence of infants’ preference for face-like patterns before they can even consciously perceive the category of faces [13], which is crucial for emotion and social processing ([13, 14, 15, 16]). [sent-31, score-0.315]

13 There are numerous computer-vision models for face detection with good results ([17, 18, 19, 20]). [sent-33, score-0.441]

14 One widely used model for face detection is the Viola & Jones [21] feature-based template-matching algorithm (VJ). [sent-34, score-0.348]

15 There have been previous attempts to incorporate face detection into a saliency model. [sent-35, score-0.874]

16 However, they have either relied on biasing a color channel toward skin hue [22] - which is ineffective in many cases and not face-selective per se - or they have suffered from a lack of generality [23]. [sent-36, score-0.136]

17 We here propose a system which combines the bottom-up saliency map model of Itti et al. [5] with the VJ face detector. [sent-37, score-0.533]

18 The contributions of this study are: (1) Experimental data showing that subjects exhibit significantly less variable scanpaths when viewing natural images containing faces, marked by a strong tendency to fixate on faces early. [sent-39, score-0.711]

19 (2) A novel saliency model which combines a face detector with intensity, color, and orientation information. [sent-40, score-0.899]

20 (3) Quantitative results on two versions of this saliency model, including one extended from a recent graph-based approach, which show that, compared to previous approaches, it better predicts subjects’ fixations on images with faces, and predicts comparably well otherwise. [sent-41, score-0.616]

21 1 Methods - Experimental procedures. Seven subjects viewed a set of 250 images (1024 × 768 pixels) in a three-phase experiment. [sent-43, score-0.315]

22 200 of the images included frontal faces of various people; 50 images contained no faces but were otherwise identical, allowing a comparison of viewing a particular scene with and without a face. [sent-44, score-0.963]

23 In the first (“free-viewing”) phase of the experiment, 200 of these images (the same subset for each subject) were presented to subjects for 2 s, after which they were instructed to answer “How interesting was the image?” [sent-45, score-0.344]

24 In the second (“search”) phase, subjects viewed another 200 image subset in the same setup, only this time they were initially presented with a probe image (either a face, or an object in the scene: banana, cell phone, toy car, etc. [sent-48, score-0.344]

25 ) for 600 ms after which one of the 200 images appeared for 2 s. [sent-49, score-0.179]

26 We used the second task to test if there are any differences in the fixation orders and viewing patterns between free-viewing and task-dependent viewing of images with faces. [sent-54, score-0.303]

27 In the third phase, subjects performed a 100-image recognition memory task where they had to answer yes/no whether they had seen the image before. [sent-55, score-0.381]

28 50 of the images were taken from the experimental set and 50 were new. [sent-56, score-0.16]

29 The images were introduced as “regular images that one can expect to find in an everyday personal photo album”. [sent-59, score-0.32]

30 Scenes were indoor and outdoor still images (see examples in Fig. 1). [sent-60, score-0.16]

31 Images included faces in various skin colors, age groups, and positions (no image had the face at the center as this was the starting fixation location in all trials). [sent-62, score-0.691]

32 A few images had face-like objects (see the balloon in Fig. 1, panel 3), [sent-63, score-0.186]

33 animal faces, and objects that had irregular faces in them (masks, the Egyptian sphinx face, etc.). [sent-64, score-0.291]

34 Face sizes varied - as a fraction of the entire image - between 1◦ and 5◦ of the visual field; we also varied the number of faces in the image between 1 and 6, with a mean of 1. [sent-69, score-0.47]

35 Image order was randomized throughout, and subjects were naïve to the purpose of the experiment. [sent-72, score-0.127]

36 The images were presented on a CRT screen (120 Hz), using Matlab’s Psychophysics and Eyelink toolbox extensions ([25, 26]). [sent-75, score-0.241]

37 The distance between the screen and the subject was 80 cm, giving a total visual angle for each image of 28◦ × 21◦ . [sent-77, score-0.178]
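These viewing-geometry numbers follow from the standard visual-angle formula; below is a quick check of ours, where the ~40 × 30 cm screen dimensions are back-calculated assumptions, not values reported in the paper:

```python
import math

# A minimal sketch (not the paper's code): visual angle subtended by a screen
# of physical size `size_cm` viewed from `dist_cm`, using 2*atan(s / 2d).
# The ~40 x 30 cm screen size is our assumption, chosen so the result matches
# the reported 28 deg x 21 deg at 80 cm.
def visual_angle_deg(size_cm: float, dist_cm: float) -> float:
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * dist_cm)))

dist = 80.0                                # viewing distance (cm), from the text
w_deg = visual_angle_deg(39.9, dist)       # ~28.0 deg horizontally
h_deg = visual_angle_deg(29.6, dist)       # ~21.0 deg vertically
print(w_deg, h_deg)                        # ~27.99, ~20.97
print(w_deg / 1024, "deg per pixel")       # ~0.027 deg/px at 1024-px width
```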

38 The trend of visiting the faces first - typically within the 1st or 2nd fixation - is evident. [sent-86, score-0.29]

39 2 Combining face detection with various saliency algorithms. We tried to predict the attentional allocation via the fixation patterns of the subjects using various saliency maps. [sent-92, score-1.551]

40 Each saliency map was represented as a positive-valued heat map over the image plane. [sent-94, score-0.684]

41 This promotes feature maps with one conspicuous location to the detriment of maps presenting numerous conspicuous locations. [sent-97, score-0.177]
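A minimal sketch of a “Maxnorm”-style normalization operator consistent with this description; the local-maximum window size and target range M are our choices, not values from the paper:

```python
import numpy as np
from scipy.ndimage import maximum_filter

# Sketch of a "Maxnorm" normalization in the spirit of Itti et al. [5]:
# maps with a single strong peak are boosted, maps with many comparable
# peaks are suppressed. Window size and M are illustrative assumptions.
def maxnorm(feature_map: np.ndarray, M: float = 1.0, size: int = 7) -> np.ndarray:
    fm = feature_map - feature_map.min()
    if fm.max() > 0:
        fm = fm * (M / fm.max())                  # scale to a fixed range [0, M]
    # local maxima: pixels equal to the max of their neighborhood
    local_max = (fm == maximum_filter(fm, size=size)) & (fm > 0)
    peaks = fm[local_max]
    if peaks.size < 2:
        return fm                                 # a single peak: keep as-is
    global_max = peaks.max()
    others = peaks[peaks < global_max]
    mean_other = others.mean() if others.size else 0.0
    return fm * (global_max - mean_other) ** 2    # promote unique conspicuous peaks
```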

42 The graph-based saliency map model (GBSM) employs spectral techniques in lieu of center-surround subtraction and “Maxnorm” normalization, using only local computations. [sent-98, score-0.533]
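For intuition, here is a toy sketch of ours in the spirit of such a graph-based model: treat map locations as nodes of a Markov chain whose transition weights favor nearby, dissimilar locations, and read out the chain's equilibrium distribution as the activation map. Grid size, sigma, and iteration count are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

# Toy sketch in the spirit of a graph-based saliency model: mass flows
# preferentially between nearby, dissimilar nodes, so the Markov chain's
# equilibrium concentrates on conspicuous locations.
def graph_activation(feature_map: np.ndarray, sigma: float = 5.0) -> np.ndarray:
    h, w = feature_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)   # (N, 2)
    val = feature_map.ravel().astype(float)                          # (N,)

    # edge weight = feature dissimilarity * Gaussian falloff with distance
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    W = np.abs(val[:, None] - val[None, :]) * np.exp(-d2 / (2.0 * sigma ** 2))

    # column-normalize into a transition matrix, then power-iterate
    P = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)
    v = np.full(val.size, 1.0 / val.size)
    for _ in range(100):
        v = P @ v
        v /= v.sum()
    return v.reshape(h, w)   # equilibrium distribution = activation map

# Use on a small downsampled map (e.g. 32 x 24); the dense N x N graph makes
# this naive version impractical at full image resolution.
```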

43 For face detection, we used the Intel Open Source Computer Vision Library (“OpenCV”) [28] implementation of [21]. [sent-100, score-0.328]

44 This implementation rapidly processes images while achieving high detection rates. [sent-101, score-0.25]

45 We used it to form a “Faces conspicuity map”, or “Face channel”, by convolving delta functions at the (x,y) detected facial centers with 2D Gaussians having standard deviation equal to the estimated facial radius. [sent-109, score-0.142]
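A minimal sketch of how such a face channel can be built with OpenCV's stock frontal-face Haar cascade; the cascade file, detectMultiScale parameters, and the radius heuristic are our assumptions, as the paper states only that the OpenCV implementation of [21] was used:

```python
import cv2
import numpy as np

# Sketch of the "face channel": place a 2D Gaussian at each detected face
# center, with sigma ~ estimated facial radius. The cascade path is OpenCV's
# stock frontal-face model; the paper's exact parameters may differ.
def face_conspicuity_map(img_bgr: np.ndarray) -> np.ndarray:
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fmap = np.zeros((h, w), dtype=np.float64)
    for (x, y, fw, fh) in faces:                 # bounding box per detection
        cx, cy = x + fw / 2.0, y + fh / 2.0      # facial center
        sigma = (fw + fh) / 4.0                  # ~ estimated facial radius
        fmap += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    if fmap.max() > 0:
        fmap /= fmap.max()                       # normalize to [0, 1]
    return fmap
```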

46 Figure 2: Modified saliency model (panel labels: face detection, color, intensity, orientation; false positive; saliency map; saliency map with face detection). [sent-113, score-0.456]

47 An image is processed through standard [5] color, orientation and intensity multi-scale channels, as well as through a trained template-matching face detection mechanism. [sent-114, score-0.577]

48 Face coordinates and radius from the face detector are used to form a face conspicuity map (F), with peaks at facial centers. [sent-115, score-0.912]

49 All four maps are normalized to the same dynamic range, and added with equal weights to a final saliency map (SM+VJ, or GBSM+VJ). [sent-116, score-0.586]

50 This is compared to a saliency map which only uses the three bottom-up features maps (SM or GBSM). [sent-117, score-0.586]
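A sketch of the combination step just described, assuming the four channel maps (color C, intensity I, orientation O, face F) are already computed:

```python
import numpy as np

# Sketch of the combination step: normalize each channel to the same
# dynamic range [0, 1] and average with equal weights (SM+VJ / GBSM+VJ).
def combine_maps(channel_maps: list) -> np.ndarray:
    def to_unit_range(m):
        m = np.asarray(m, dtype=np.float64)
        m = m - m.min()
        return m / m.max() if m.max() > 0 else m
    return np.mean([to_unit_range(m) for m in channel_maps], axis=0)

# e.g. SM+VJ from color, intensity, orientation, and face maps:
# saliency = combine_maps([C, I, O, F])
```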

51 1 Results - Psychophysical results. To evaluate the results of the 7 subjects’ viewing of the images, we manually defined minimally sized rectangular regions-of-interest (ROIs) around each face in the entire image collection. [sent-119, score-0.463]

52 In 972 out of the 1050 (7 subjects x 150 images with faces) trials (92. [sent-121, score-0.334]

53 4%) trials, a face was fixated on within the first fixation, and of the remaining 405 trials, a face was fixated on in the second fixation in 71. [sent-124, score-0.656]

54 Given that the face ROIs were chosen very conservatively (i.e. [sent-130, score-0.328]

55 fixations just next to a face do not count as fixations on the face), this shows that faces, if present, are typically fixated on within the first two fixations (327 ms ± 95 ms on average). [sent-132, score-0.366]

56 Furthermore, in addition to finding early fixations on faces, we found that inter-subject scanpath consistency on images with faces was higher. [sent-133, score-0.499]

57 24 pixels on images without faces (difference significant at p < 10−6). [sent-136, score-0.457]
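The consistency figure above is reported in pixels, but the metric is not spelled out in these excerpts; the sketch below is one plausible reading of ours (mean nearest-neighbor distance between different subjects' fixation sets), not necessarily the paper's exact definition:

```python
import numpy as np

# Hedged sketch of an inter-subject scanpath consistency score: for every
# ordered pair of subjects, average the distance from each fixation of one
# subject to the nearest fixation of the other. Lower = more consistent.
def scanpath_consistency(fixations_per_subject: list) -> float:
    dists = []
    for i, fa in enumerate(fixations_per_subject):
        for j, fb in enumerate(fixations_per_subject):
            if i == j:
                continue
            fa_arr = np.asarray(fa, dtype=float)   # (Na, 2) pixel coordinates
            fb_arr = np.asarray(fb, dtype=float)   # (Nb, 2)
            # nearest-neighbor distance from each fixation in fa to fb
            d = np.linalg.norm(fa_arr[:, None, :] - fb_arr[None, :, :], axis=2)
            dists.append(d.min(axis=1).mean())
    return float(np.mean(dists))
```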

58 The null hypothesis that we would see the same fraction of first fixations on a face at random is rejected at p < 10−20 (t-test). [sent-140, score-0.353]

59 To test the hypothesis that face saliency is not due to a top-down preference for faces in the absence of other interesting things, we examined the results of the “search” task, in which subjects were presented with a non-face target probe in 50% of the trials. [sent-141, score-1.22]

60 Given the short time allotted for the search (2 s), subjects should have attempted to tune their internal saliency weights to adjust color, intensity, and orientation optimally for the searched target [30]. [sent-142, score-0.656]

61 Nevertheless, subjects still tended to fixate on the faces early. [sent-143, score-0.392]

62 A face was fixated on within the first fixation in 24% of trials, within the first two fixations in 52% of trials, and within the first three fixations in 77% of the trials. [sent-144, score-0.328]

63 Overall, we found that in both experimental conditions (“free-viewing” and “search”), faces were powerful attractors of attention, accounting for a strong majority of early fixations when present. [sent-147, score-0.29]

64 This trend allowed us to easily improve standard saliency models, as discussed below. [sent-148, score-0.481]

65 Figure 3: Extent of fixation on face regions-of-interest (ROIs) during the “free-viewing” phase. [sent-149, score-0.356]

66 Right: Bars depict the percentage of trials in which a face is reached for the first time in the first, second, third, ... fixation. [sent-152, score-0.356]

67 the fraction of trials in which faces were fixated on at least once up to and including the nth fixation. [sent-158, score-0.337]

68 2 Assessing the saliency map models. We ran VJ on each of the 200 images used in the free-viewing task, and found at least one face detection on 176 of these images, 148 of which actually contained faces (only two images with faces were missed). [sent-160, score-1.882]

69 For each of these 176 images, we computed four saliency maps (SM, GBSM, SM+VJ, GBSM+VJ) as discussed above, and quantified the compatibility of each with our scanpath recordings, in particular fixations, using the area under an ROC curve. [sent-161, score-0.558]

70 The ROC curves were generated by sweeping over saliency value thresholds, treating the fraction of non-fixated pixels on a map above threshold as false alarms, and the fraction of fixated pixels above threshold as hits [29, 31]. [sent-162, score-0.687]
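A minimal sketch of this procedure, given a saliency map and a boolean fixation mask; the fixed 256-threshold sweep is our simplification:

```python
import numpy as np

# Sketch of the ROC analysis described above: sweep thresholds over the
# saliency range; hit rate = fraction of fixated pixels above threshold,
# false-alarm rate = fraction of non-fixated pixels above threshold;
# the area under the curve (AUC) is computed with the trapezoid rule.
def saliency_auc(sal_map: np.ndarray, fixation_mask: np.ndarray) -> float:
    s = sal_map.ravel().astype(float)
    fix = fixation_mask.ravel().astype(bool)
    thresholds = np.linspace(s.max(), s.min(), 256)    # high to low
    hits = np.array([(s[fix] >= t).mean() for t in thresholds])
    fas = np.array([(s[~fix] >= t).mean() for t in thresholds])
    hits = np.concatenate(([0.0], hits, [1.0]))        # close the curve
    fas = np.concatenate(([0.0], fas, [1.0]))
    return float(np.sum(np.diff(fas) * (hits[1:] + hits[:-1]) / 2.0))
```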

71 As shown in Fig. 4, all models predict above chance (50%): SM performs worst and GBSM+VJ best; including the face detector substantially improves performance in both cases. [sent-164, score-0.42]

72 Subjects’ scanpaths are shown in the left panels of Figure 1. [sent-166, score-0.125]

73 Top panel: image with the 49 fixations of the 7 subjects (red). [sent-167, score-0.201]

74 From left to right, saliency map model of Itti et al. [sent-169, score-0.533]

75 (SM), saliency map with the VJ face detection map (SM+VJ), the graph-based saliency map (GBSM), and the graph-based saliency map with face detection channel (GBSM+VJ). [sent-170, score-2.582]

76 (Fig. 5): first, all models perform better than chance, even over the 28 images without faces. [sent-175, score-0.16]

77 For the 148/176 images with faces, SM+VJ was better than SM alone for 144/148 images (p < 10−29 ), whereas VJ alone (equal to the face conspicuity map) was better than SM alone for 83/148 images, a fraction that fails to reach significance. [sent-180, score-0.743]

78 Thus, although the face conspicuity map was surprisingly predictive on its own, fixation predictions were much better when it was combined with the full saliency model. [sent-181, score-0.953]

79 For the 28 images without faces, SM (better than SM+VJ for 18) and SM+VJ (better than SM for 10) did not show a significant difference, nor did GBSM vs. GBSM+VJ. [sent-182, score-0.16]
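Per-image win counts like these lend themselves to a two-sided binomial sign test; a minimal check of ours, assuming SciPy ≥ 1.7 (older releases expose scipy.stats.binom_test instead):

```python
from scipy.stats import binomtest

# Sign test for "model A beats model B on k of n images" (chance = 0.5).
print(binomtest(144, n=148, p=0.5).pvalue)  # very small, consistent with the
                                            # reported p < 10^-29 (SM+VJ vs. SM)
print(binomtest(83, n=148, p=0.5).pvalue)   # n.s.: VJ alone vs. SM alone
print(binomtest(18, n=28, p=0.5).pvalue)    # n.s.: SM vs. SM+VJ, no-face images
```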

80 However, in a recent follow-up study with more non-face images, we found preliminary results indicating that the mean ROC score of VJ-enhanced saliency maps is higher on such non-face images, although the median is slightly lower, i.e. [sent-184, score-0.509]

81 performance is much improved when improved at all, indicating that VJ false positives can sometimes enhance saliency maps. [sent-186, score-0.521]

82 In summary, we found that adding a face detector channel improves fixation prediction in images with faces dramatically, while it does not impair prediction in images without faces, even though the face detector has false alarms in those cases. [sent-187, score-1.521]

83 4 Discussion. First, we demonstrated that in natural scenes containing frontal shots of people, faces were fixated on within the first few fixations, whether subjects had to grade an image on interest value or search it for a specific, possibly non-face, target. [sent-188, score-0.537]

84 This powerful trend motivated the introduction of a new saliency model. [sent-189, score-0.481]

85 Scatterplots depict the area under ROC curves (AUC) for the 176 images in which VJ found a face. [sent-210, score-0.188]

86 Points above the diagonal indicate better prediction by the model including face detection compared to the model without the face channel. [sent-212, score-0.746]

87 Blue markers denote images with faces; red markers denote images without faces (i.e. [sent-213, score-0.649]

88 In attempting to predict the fixations of human subjects, we found that this additional face channel improved the performance of both a standard and a more recent graph-based saliency model (almost all blue points in Fig. 5). [sent-220, score-0.879]

89 In the few images without faces, we found that the false positives represented in the face-detection channel did not significantly alter the performance of the saliency maps – although in a preliminary follow-up on a larger image pool we found that they boost mean performance. [sent-222, score-0.878]

90 Together, these findings point towards a specialized “face channel” in our vision system, which is subject to current debate in the attention literature [11, 12, 32]. [sent-223, score-0.15]

91 This suggests that faces always attract attention and gaze, relatively independent of the task. [sent-225, score-0.316]

92 They should therefore be considered as part of the bottom-up saliency pathway. [sent-226, score-0.456]

93 A model of saliency-based visual attention for rapid scene analysis. [sent-261, score-0.14]

94 The role of top-down and bottom-up processes in guiding eye movements during visual search. [sent-284, score-0.149]

95 Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. [sent-297, score-0.168]

96 Does luminance-contrast contribute to a saliency map for overt visual attention? [sent-302, score-0.59]

97 Statistical method for 3D object detection applied to faces and cars. [sent-352, score-0.316]

98 Rapid object detection using a boosted cascade of simple features. [sent-370, score-0.138]

99 Interactions of visual attention and object recognition: computational modeling, algorithms, and psychophysics. [sent-374, score-0.133]

100 With a careful look: Still no low-level confound to face pop-out. Authors’ reply. [sent-440, score-0.328]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('saliency', 0.456), ('gbsm', 0.349), ('face', 0.328), ('fixations', 0.289), ('faces', 0.265), ('sm', 0.257), ('vj', 0.24), ('fixation', 0.178), ('images', 0.16), ('fixated', 0.154), ('subjects', 0.127), ('scanpaths', 0.098), ('detection', 0.09), ('map', 0.077), ('res', 0.075), ('image', 0.074), ('detector', 0.073), ('conspicuity', 0.07), ('channel', 0.07), ('eye', 0.062), ('viewing', 0.061), ('visual', 0.057), ('rois', 0.056), ('fixate', 0.056), ('maps', 0.053), ('vision', 0.052), ('gaze', 0.052), ('itti', 0.052), ('attention', 0.051), ('scanpath', 0.049), ('trials', 0.047), ('roc', 0.045), ('auc', 0.045), ('viola', 0.044), ('attentional', 0.044), ('probe', 0.044), ('intensity', 0.043), ('color', 0.042), ('orientation', 0.042), ('eyelink', 0.042), ('false', 0.04), ('scenes', 0.04), ('koch', 0.039), ('facial', 0.036), ('jones', 0.034), ('vis', 0.033), ('harel', 0.033), ('scene', 0.032), ('observers', 0.032), ('pixels', 0.032), ('pasadena', 0.031), ('search', 0.031), ('movements', 0.03), ('instructed', 0.029), ('social', 0.029), ('allocation', 0.029), ('cerf', 0.028), ('einh', 0.028), ('hershler', 0.028), ('psychophysics', 0.028), ('depict', 0.028), ('phase', 0.028), ('technology', 0.027), ('panels', 0.027), ('subject', 0.027), ('objects', 0.026), ('peters', 0.026), ('trend', 0.025), ('early', 0.025), ('fraction', 0.025), ('human', 0.025), ('positives', 0.025), ('object', 0.025), ('tunes', 0.024), ('skin', 0.024), ('alarms', 0.024), ('conspicuous', 0.024), ('maxnorm', 0.024), ('moran', 0.024), ('red', 0.024), ('people', 0.023), ('numerous', 0.023), ('cascade', 0.023), ('castelhano', 0.022), ('navalpakkam', 0.022), ('combined', 0.022), ('patterns', 0.021), ('recognition', 0.02), ('institute', 0.02), ('contained', 0.02), ('debate', 0.02), ('markers', 0.02), ('screen', 0.02), ('look', 0.02), ('california', 0.02), ('ms', 0.019), ('chance', 0.019), ('understanding', 0.019), ('toolbox', 0.019), ('rectangles', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999863 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch

Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1

2 0.47753453 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency

Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos

Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. 1

3 0.1578909 85 nips-2007-Experience-Guided Search: A Theory of Attentional Control

Author: David Baldwin, Michael C. Mozer

Abstract: People perform a remarkable range of tasks that require search of the visual environment for a target item among distractors. The Guided Search model (Wolfe, 1994, 2007), or GS, is perhaps the best developed psychological account of human visual search. To prioritize search, GS assigns saliency to locations in the visual field. Saliency is a linear combination of activations from retinotopic maps representing primitive visual features. GS includes heuristics for setting the gain coefficient associated with each map. Variants of GS have formalized the notion of optimization as a principle of attentional control (e.g., Baldwin & Mozer, 2006; Cave, 1999; Navalpakkam & Itti, 2006; Rao et al., 2002), but every GS-like model must be ’dumbed down’ to match human data, e.g., by corrupting the saliency map with noise and by imposing arbitrary restrictions on gain modulation. We propose a principled probabilistic formulation of GS, called Experience-Guided Search (EGS), based on a generative model of the environment that makes three claims: (1) Feature detectors produce Poisson spike trains whose rates are conditioned on feature type and whether the feature belongs to a target or distractor; (2) the environment and/or task is nonstationary and can change over a sequence of trials; and (3) a prior specifies that features are more likely to be present for target than for distractors. Through experience, EGS infers latent environment variables that determine the gains for guiding search. Control is thus cast as probabilistic inference, not optimization. We show that EGS can replicate a range of human data from visual search, including data that GS does not address. 1

4 0.14351042 109 nips-2007-Kernels on Attributed Pointsets with Applications

Author: Mehul Parsana, Sourangshu Bhattacharya, Chiru Bhattacharya, K. Ramakrishnan

Abstract: This paper introduces kernels on attributed pointsets, which are sets of vectors embedded in an euclidean space. The embedding gives the notion of neighborhood, which is used to define positive semidefinite kernels on pointsets. Two novel kernels on neighborhoods are proposed, one evaluating the attribute similarity and the other evaluating shape similarity. Shape similarity function is motivated from spectral graph matching techniques. The kernels are tested on three real life applications: face recognition, photo album tagging, and shot annotation in video sequences, with encouraging results. 1

5 0.13455682 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

Author: Cha Zhang, Paul A. Viola

Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1

6 0.10966396 143 nips-2007-Object Recognition by Scene Alignment

7 0.10039869 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

8 0.09410838 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

9 0.083443619 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

10 0.080249064 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI

11 0.070634812 115 nips-2007-Learning the 2-D Topology of Images

12 0.067244165 56 nips-2007-Configuration Estimates Improve Pedestrian Finding

13 0.066908158 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI

14 0.065839559 183 nips-2007-Spatial Latent Dirichlet Allocation

15 0.060127176 125 nips-2007-Markov Chain Monte Carlo with People

16 0.059633508 3 nips-2007-A Bayesian Model of Conditioned Perception

17 0.056441788 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

18 0.056190122 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data

19 0.053804737 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression

20 0.053127751 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.14), (1, 0.097), (2, 0.009), (3, -0.087), (4, 0.047), (5, 0.309), (6, 0.068), (7, 0.31), (8, -0.004), (9, -0.109), (10, -0.132), (11, -0.078), (12, -0.134), (13, 0.16), (14, -0.365), (15, -0.034), (16, 0.083), (17, 0.176), (18, -0.09), (19, -0.013), (20, -0.02), (21, -0.238), (22, -0.108), (23, -0.055), (24, 0.185), (25, 0.04), (26, 0.016), (27, 0.057), (28, -0.083), (29, -0.023), (30, 0.006), (31, -0.0), (32, 0.008), (33, -0.031), (34, -0.062), (35, -0.037), (36, 0.05), (37, 0.015), (38, -0.039), (39, 0.009), (40, -0.007), (41, 0.008), (42, -0.002), (43, -0.015), (44, -0.001), (45, -0.028), (46, 0.029), (47, -0.001), (48, -0.016), (49, -0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96348143 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch

Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1

2 0.95668876 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency

Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos

Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. 1

3 0.77651322 85 nips-2007-Experience-Guided Search: A Theory of Attentional Control

Author: David Baldwin, Michael C. Mozer

Abstract: People perform a remarkable range of tasks that require search of the visual environment for a target item among distractors. The Guided Search model (Wolfe, 1994, 2007), or GS, is perhaps the best developed psychological account of human visual search. To prioritize search, GS assigns saliency to locations in the visual field. Saliency is a linear combination of activations from retinotopic maps representing primitive visual features. GS includes heuristics for setting the gain coefficient associated with each map. Variants of GS have formalized the notion of optimization as a principle of attentional control (e.g., Baldwin & Mozer, 2006; Cave, 1999; Navalpakkam & Itti, 2006; Rao et al., 2002), but every GS-like model must be ’dumbed down’ to match human data, e.g., by corrupting the saliency map with noise and by imposing arbitrary restrictions on gain modulation. We propose a principled probabilistic formulation of GS, called Experience-Guided Search (EGS), based on a generative model of the environment that makes three claims: (1) Feature detectors produce Poisson spike trains whose rates are conditioned on feature type and whether the feature belongs to a target or distractor; (2) the environment and/or task is nonstationary and can change over a sequence of trials; and (3) a prior specifies that features are more likely to be present for target than for distractors. Through experience, EGS infers latent environment variables that determine the gains for guiding search. Control is thus cast as probabilistic inference, not optimization. We show that EGS can replicate a range of human data from visual search, including data that GS does not address. 1

4 0.41143382 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

Author: Robert Peters, Laurent Itti

Abstract: Current computational models of bottom-up and top-down components of attention are predictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, to date there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, and looking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibit reliable temporal signatures during critical event windows in the task sequence—for example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically, we find that an event detector based on fused behavioral and stimulus information (in the form of the model’s predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone (model prediction maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings. 1

5 0.39226133 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

Author: Michael Ross, Andrew Cohen

Abstract: This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects’ performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it represents classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than previous methods and provides a foundation for further exploration. 1

6 0.33614084 109 nips-2007-Kernels on Attributed Pointsets with Applications

7 0.32406551 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

8 0.28215426 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

9 0.26415846 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI

10 0.26370612 143 nips-2007-Object Recognition by Scene Alignment

11 0.25941178 56 nips-2007-Configuration Estimates Improve Pedestrian Finding

12 0.2448193 113 nips-2007-Learning Visual Attributes

13 0.22352025 3 nips-2007-A Bayesian Model of Conditioned Perception

14 0.22287799 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

15 0.22118515 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data

16 0.20132233 115 nips-2007-Learning the 2-D Topology of Images

17 0.19303684 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

18 0.1880458 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

19 0.18312247 193 nips-2007-The Distribution Family of Similarity Distances

20 0.18095198 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.014), (5, 0.039), (13, 0.03), (16, 0.02), (18, 0.041), (19, 0.026), (21, 0.052), (26, 0.021), (35, 0.028), (47, 0.067), (49, 0.011), (76, 0.011), (82, 0.284), (83, 0.145), (85, 0.019), (87, 0.03), (90, 0.074)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80190653 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch

Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1

2 0.69097531 158 nips-2007-Probabilistic Matrix Factorization

Author: Andriy Mnih, Ruslan Salakhutdinov

Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.

3 0.67544532 84 nips-2007-Expectation Maximization and Posterior Constraints

Author: Kuzman Ganchev, Ben Taskar, João Gama

Abstract: The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a-priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models. 1

4 0.55259293 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency

Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos

Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. 1

5 0.549905 63 nips-2007-Convex Relaxations of Latent Variable Training

Author: Yuhong Guo, Dale Schuurmans

Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1

6 0.54939044 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions

7 0.54801631 180 nips-2007-Sparse Feature Learning for Deep Belief Networks

8 0.54594374 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

9 0.54516625 125 nips-2007-Markov Chain Monte Carlo with People

10 0.54500866 156 nips-2007-Predictive Matrix-Variate t Models

11 0.54496408 46 nips-2007-Cluster Stability for Finite Samples

12 0.543706 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

13 0.54212928 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation

14 0.54132301 43 nips-2007-Catching Change-points with Lasso

15 0.54038656 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

16 0.54038173 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model

17 0.54024565 47 nips-2007-Collapsed Variational Inference for HDP

18 0.53978956 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

19 0.53907317 49 nips-2007-Colored Maximum Variance Unfolding

20 0.53896481 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data