nips nips2007 nips2007-202 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos
Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The discriminant center-surround hypothesis for bottom-up saliency Dashan Gao Vijay Mahadevan Nuno Vasconcelos Department of Electrical and Computer Engineering University of California, San Diego {dgao, vmahadev, nuno}@ucsd.edu [sent-1, score-1.246]
2 Abstract The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. [sent-2, score-1.865]
3 The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. [sent-3, score-1.286]
4 This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. [sent-4, score-1.341]
5 It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. [sent-5, score-1.99]
6 Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. [sent-6, score-2.149]
7 In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. [sent-8, score-1.317]
8 1 Introduction The psychophysics of visual saliency and attention have been extensively studied over the last few decades. [sent-9, score-0.998]
9 As a result of these studies, it is now well known that saliency mechanisms exist for a number of classes of visual stimuli, including color, orientation, depth, and motion, among others. [sent-10, score-0.942]
10 One approach that has become quite popular, both in the biological and computer vision communities, is to equate saliency with center-surround differencing. [sent-12, score-0.94]
11 It was initially proposed in [12], and has since been applied to saliency detection in both static imagery and motion analysis, as well as to computer vision problems such as robotics or video compression. [sent-13, score-1.116]
12 For example, it implies that visual perception relies on a linear measure of similarity (difference between feature responses in center and surround). [sent-16, score-0.109]
13 Second, the psychophysics of saliency offers strong evidence for the existence of both non-linearities and asymmetries which are not easily reconciled with this model. [sent-18, score-0.991]
14 Third, although the center-surround hypothesis intrinsically poses saliency as a classification problem (of distinguishing center from surround), there is little basis on which to justify difference-based measures as optimal in a classification sense. [sent-19, score-0.987]
15 An alternative hypothesis is that all saliency decisions are optimal in a decision-theoretic sense. [sent-21, score-0.957]
16 This hypothesis has been denoted as discriminant saliency in [6], where it was somewhat narrowly proposed as the justification for a top-down saliency algorithm. [sent-22, score-2.141]
17 This has motivated us to test its ability to explain the psychophysics of human saliency, which is better documented for the bottom-up neural pathway. [sent-24, score-0.139]
18 We start from the combined hypothesis that 1) bottom-up saliency is based on center-surround processing, and 2) this processing is optimal in a decision-theoretic sense. [sent-25, score-0.97]
19 In particular, it is hypothesized that, in the absence of high-level goals, the most salient locations of the visual field are those that enable the discrimination between center and surround with smallest expected probability of error. [sent-26, score-0.156]
20 This is referred to as the discriminant center-surround hypothesis and, by definition, produces saliency measures that are optimal in a classification sense. [sent-27, score-1.258]
21 In this work, we present the results of an experimental evaluation of the plausibility of the discriminant center-surround hypothesis. [sent-29, score-0.321]
22 Our study evaluates the ability of saliency algorithms that are optimal under this hypothesis to both • reproduce subject behavior in classical psychophysics experiments, and • solve saliency problems of practical significance, with respect to a number of classes of visual stimuli. [sent-30, score-1.917]
23 We derive decision-theoretic optimal center-surround algorithms for a number of saliency problems, ranging from static spatial saliency, to motion-based saliency in the presence of egomotion or even complex dynamic backgrounds. [sent-31, score-1.953]
24 With respect to practical saliency algorithms, they show that discriminant saliency is not only more accurate than difference-based methods in predicting human eye fixations, but actually produces background subtraction algorithms that outperform the state-of-the-art in computer vision. [sent-33, score-2.268]
25 2 Discriminant center-surround saliency A common hypothesis for bottom-up saliency is that the saliency of each location is determined by how distinct the stimulus at the location is from the stimuli in its surround (e. [sent-35, score-2.896]
26 This hypothesis is inspired by the ubiquity of “center-surround” mechanisms in the early stages of biological vision [10]. [sent-38, score-0.11]
27 The observed feature vector at any location j is denoted by x(j) = (x_1(j), \ldots, x_d(j)), with one coordinate per feature channel. [sent-44, score-0.06]
28 The saliency of location l, S(l), is quantified by the mutual information between features, X, and class label, Y: S(l) = I_l(X; Y) = \sum_c \int p_{X(l),Y(l)}(x, c) \log \frac{p_{X(l),Y(l)}(x, c)}{p_{X(l)}(x)\, p_{Y(l)}(c)} dx. [sent-50, score-0.943]
29 The function S(l) is referred to as the saliency map. [sent-52, score-0.895]
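For illustration only, the following sketch estimates this quantity at a single location from histogram estimates of the center and surround feature distributions; the square window radii, the bin count, and the equal class priors are assumptions made here for concreteness, not parameters taken from the paper.

import numpy as np

def discriminant_saliency(feature_map, loc, r_center=8, r_surround=32, n_bins=32):
    # Mutual information between a scalar feature and the center/surround class
    # label at one location, estimated with histograms. Window radii, bin count,
    # and the equal class priors below are illustrative assumptions.
    row, col = loc
    h, w = feature_map.shape
    rows, cols = np.ogrid[:h, :w]
    dist = np.maximum(np.abs(rows - row), np.abs(cols - col))   # square windows
    center = feature_map[dist <= r_center]
    surround = feature_map[(dist > r_center) & (dist <= r_surround)]

    edges = np.histogram_bin_edges(np.concatenate([center, surround]), bins=n_bins)
    p_c = np.histogram(center, bins=edges)[0].astype(float) + 1e-6    # p(x | center)
    p_s = np.histogram(surround, bins=edges)[0].astype(float) + 1e-6  # p(x | surround)
    p_c /= p_c.sum()
    p_s /= p_s.sum()

    priors = np.array([0.5, 0.5])             # p(Y = center), p(Y = surround): assumed equal
    p_x = priors[0] * p_c + priors[1] * p_s   # marginal p(x)
    # I(X;Y) = sum over classes of p(c) * KL( p(x|c) || p(x) )
    return float(priors[0] * np.sum(p_c * np.log(p_c / p_x)) +
                 priors[1] * np.sum(p_s * np.log(p_s / p_x)))

Evaluating this quantity at every location of each feature channel, and combining channels (for example by summing, if the features are treated as approximately independent), produces the saliency map S(l).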
30 3 Discriminant saliency detection in static imagery Since human saliency has been most thoroughly studied in the domain of static stimuli, we first derive the optimal solution for discriminant saliency in this domain. [sent-53, score-3.185]
31 We then study the ability of the discriminant center-surround saliency hypothesis to explain the fundamental properties of the psychophysics of pre-attentive vision. [sent-54, score-1.329]
32 1 Feature decomposition The building blocks of the static discriminant saliency detector are shown in Figure 1. [sent-56, score-1.305]
33 The first stage, feature decomposition, follows the proposal of [11], which closely mimics the earliest stages of biological visual processing. [sent-57, score-0.076]
34 The four color channels are, in turn, combined into two color opponent channels, R − G for red/green and B − Y for blue/yellow opponency. [sent-59, score-0.078]
35 The feature space X consists of these channels, plus a Gabor decomposition of the intensity map, implemented with a dictionary of zero-mean Gabor filters at 3 spatial scales (centered at frequencies of 0. [sent-64, score-0.078]
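The filter parameters are truncated in the sentence above; purely as a sketch under assumed values (three scales, four orientations, cosine-phase Gabors, and a crude opponency in place of the tuned channels of [11]), the feature decomposition could be set up along these lines:

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma, size=31):
    # Zero-mean, cosine-phase Gabor at spatial frequency `freq` (cycles/pixel)
    # and orientation `theta`; all numeric values here are placeholders.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2)) * np.cos(2.0 * np.pi * freq * x_theta)
    return g - g.mean()

def feature_decomposition(rgb):
    # Color-opponent channels plus a small Gabor decomposition of the intensity map.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    channels = {"RG": r - g, "BY": b - (r + g) / 2.0}     # simplified opponency (assumption)
    for s, freq in enumerate([0.25, 0.125, 0.0625]):      # 3 scales; frequencies assumed
        for k in range(4):                                 # 4 orientations; count assumed
            kern = gabor_kernel(freq, k * np.pi / 4.0, sigma=2.0 * (2 ** s))
            channels["gabor_s%d_o%d" % (s, k)] = fftconvolve(intensity, kern, mode="same")
    return channels

Each resulting channel can then be fed to the center-surround mutual information computation sketched above.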
36 The consistency of these feature dependencies suggests that they are, in general, not greatly informative about the image class [21, 2] and, in the particular case of saliency, about whether the observed feature vectors originate in the center or surround. [sent-73, score-0.132]
37 In this case, all computations have a simple closed form [4] and can be mapped into a neural network that replicates the standard architecture of V1: a cascade of linear filtering, divisive normalization, quadratic nonlinearity and spatial pooling [7]. [sent-79, score-0.068]
(a) Color (R/G, B/Y) feature maps and feature saliency maps (panel labels). [sent-81, score-0.991]
Figure 2: The nonlinearity of human saliency responses to orientation contrast [14] (a) is replicated by discriminant saliency (b), but not by the model of [11] (c). [sent-89, score-2.229]
40 In [7], we have shown that discriminant saliency reproduces the anecdotal properties of saliency - percept of pop-out for single feature search, disregard of feature conjunctions, and search asymmetries for feature presence vs. [sent-93, score-2.237]
41 absence - that have previously been shown possible to replicate with linear saliency models [11]. [sent-94, score-0.912]
42 Here, we focus on quantitative predictions of human performance, and compare the output of discriminant saliency with both human data and that of the difference-based center-surround saliency model [11]. [sent-95, score-2.226]
43 The first experiment tests the ability of the saliency models to predict a well-known nonlinearity of human saliency. [sent-96, score-0.981]
44 Nothdurft [14] has characterized the saliency of pop-out targets due to orientation contrast, by comparing the conspicuousness of orientation-defined targets and luminance-defined ones, and using luminance as a reference for relative target salience. [sent-97, score-1.034]
45 He showed that the saliency of a target increases with orientation contrast, but in a non-linear manner: 1) there exists a threshold below which the effect of pop-out vanishes, and 2) above this threshold saliency increases with contrast, saturating after some point. [sent-98, score-1.836]
46 The results of this experiment are illustrated in Figure 2, which presents plots of saliency strength vs. orientation contrast for human subjects [14] (in (a)), for discriminant saliency (in (b)), and for the difference-based model of [11] (in (c)). [sent-99, score-2.208]
47 Note that discriminant saliency closely predicts the strong threshold and saturation effects characteristic of subject performance, but the difference-based model shows no such compliance. [sent-100, score-1.196]
48 It replicates the experiment designed by Treisman [19] to show that the asymmetries of human saliency comply with Weber’s law. [sent-102, score-0.997]
49 Figure 3 (b) shows a scatter plot of the values of discriminant saliency obtained across the set of displays. [sent-104, score-1.208]
50 For comparison, Figure 3 (c) presents the corresponding scatter plot for the model of [11], which clearly does not replicate human performance. [sent-106, score-0.085]
51 4 Applications of discriminant saliency We have, so far, presented quantitative evidence in support of the hypothesis that pre-attentive vision implements decision-theoretic center-surround saliency. [sent-107, score-1.299]
52 Informal experimentation has shown that the saliency results are not substantively affected by variations around the parameter values adopted. [sent-109, score-0.895]
Figure 3: An example display (a) and performance of saliency detectors (discriminant saliency (b) and [11] (c)) on Weber’s law experiment. [sent-138, score-1.838]
54 Saliency model ROC area discriminant saliency Itti et al. [sent-139, score-1.217]
Figure 4: Average ROC area, as a function of inter-subject ROC area, for the saliency algorithms. [sent-147, score-0.895]
56 7547 Table 1: ROC areas for different saliency models with respect to all human fixations. [sent-152, score-0.964]
57 already mentioned one-to-one mapping between the discriminant saliency detector proposed above and the standard model for the neurophysiology of V1 [7]. [sent-153, score-1.246]
58 Another interesting property of discriminant saliency is that its optimality is independent of the stimulus dimension under consideration, or of specific feature sets. [sent-154, score-1.225]
59 In fact, (1) can be applied to any type of stimuli, and any type of features, as long as it is possible to estimate the required probability distributions from the center and surround neighborhoods. [sent-155, score-0.107]
60 This encouraged us to derive discriminant saliency detectors for various computer vision applications, ranging from the prediction of human eye fixations, to the detection of salient moving objects, to background subtraction in the context of highly dynamic scenes. [sent-156, score-1.539]
61 The outputs of these discriminant saliency detectors are next compared with either human performance or the state-of-the-art in computer vision for each application. [sent-157, score-1.317]
62 1 Prediction of eye fixations on natural images We start by using the static discriminant saliency detector of the previous section to predict human eye fixations. [sent-159, score-1.419]
63 For this, the saliency maps were compared to the eye fixations of human subjects in an image viewing task. [sent-160, score-1.08]
64 Under this protocol, all saliency maps are first quantized into a binary mask that classifies each image location as either a fixation or non-fixation [17]. [sent-162, score-1.002]
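A minimal version of this evaluation, with an arbitrary threshold grid and fixations given as pixel coordinates, might look as follows; it is a sketch of the general protocol rather than the exact procedure of [17]:

import numpy as np

def fixation_roc_area(saliency_map, fixations, n_thresholds=100):
    # Sweep a threshold over the normalized saliency map; at each threshold the
    # resulting binary mask is scored against the set of fixated pixels.
    s = saliency_map.astype(float)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)
    fixated = np.zeros(s.shape, dtype=bool)
    for r, c in fixations:
        fixated[r, c] = True

    tprs, fprs = [], []
    for t in np.linspace(0.0, 1.0, n_thresholds):
        predicted = s >= t
        tp = np.logical_and(predicted, fixated).sum()
        fp = np.logical_and(predicted, ~fixated).sum()
        tprs.append(tp / max(fixated.sum(), 1))
        fprs.append(fp / max((~fixated).sum(), 1))

    order = np.argsort(fprs)
    fpr = np.asarray(fprs)[order]
    tpr = np.asarray(tprs)[order]
    return float(np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2.0))   # trapezoidal ROC area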
65 The predictions of discriminant saliency are compared to those of the methods of [11] and [1]. [sent-166, score-1.196]
66 It is clear that discriminant saliency achieves the best performance among the three detectors. [sent-168, score-1.196]
67 Again, discriminant saliency exhibits the strongest correlation with human performance; this happens at all levels of inter-subject consistency, and the difference is largest when the latter is strong. [sent-170, score-1.252]
68 In this region, the performance of discriminant saliency (. [sent-171, score-1.196]
69 2 Discriminant saliency on motion fields Similarly to the static case, center-surround discriminant saliency can produce motion-based saliency maps if combined with motion features. [sent-176, score-3.224]
70 We have implemented a simple motion-based detector by computing a dense motion vector map (optical flow) between pairs of consecutive images, and using the magnitude of the motion vector at each location as the motion feature. [sent-177, score-0.285]
71 The probability distributions of this feature, within center and surround, were estimated with histograms, and the motion saliency maps computed with (2). [sent-178, score-1.041]
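Assuming the histogram-based discriminant_saliency sketch given earlier, and using OpenCV's Farneback estimator purely as a stand-in for whatever dense optical-flow method was actually employed, the motion saliency map can be computed roughly as follows:

import numpy as np
import cv2  # OpenCV, used here only as a convenient dense optical-flow estimator

def motion_saliency_map(prev_gray, next_gray, stride=8):
    # Motion feature = magnitude of the flow vector at each pixel; the saliency of
    # each (coarse) location is the center/surround mutual information of that
    # feature, via the discriminant_saliency() sketch above.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    h, w = magnitude.shape
    sal = np.zeros((h, w))
    for r in range(0, h, stride):          # coarse grid purely to keep the sketch fast
        for c in range(0, w, stride):
            sal[r:r + stride, c:c + stride] = discriminant_saliency(magnitude, (r, c))
    return sal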
Figure 5: Optical flow-based saliency in the presence of egomotion. [sent-179, score-0.91]
73 Despite the simplicity of our motion representation, the discriminant saliency detector exhibits interesting performance. [sent-180, score-1.314]
74 Figure 5 shows several frames (top row) from a video sequence, and their discriminant motion saliency maps (bottom row). [sent-181, score-1.345]
75 This results in significant variability of the background, due to egomotion, making the detection of foreground motion (the leopard) a non-trivial task. [sent-183, score-0.135]
76 As shown in the saliency maps, discriminant saliency successfully disregards the egomotion component of the optical flow, detecting the leopard as most salient. [sent-184, score-2.178]
77 3 Discriminant Saliency with dynamic background While the results of Figure 5 are probably within the reach of previously proposed saliency models, they illustrate the flexibility of discriminant saliency. [sent-186, score-1.282]
78 In this section we move to a domain where traditional saliency algorithms almost invariably fail. [sent-187, score-0.895]
79 This consists of videos of scenes with complex and dynamic backgrounds (e. [sent-188, score-0.097]
80 In order to capture the motion patterns characteristic of these backgrounds, it is necessary to rely on reasonably sophisticated probabilistic models, such as the dynamic texture model [5]. [sent-191, score-0.172]
81 difference-based, saliency frameworks but naturally compatible with the discriminant saliency hypothesis. [sent-194, score-2.091]
82 We next combine discriminant center-surround saliency with the dynamic texture model, to produce a background-subtraction algorithm for scenes with complex background dynamics. [sent-195, score-1.337]
83 While background subtraction is a classic problem in computer vision, there has been relatively little progress for these types of scenes (e. [sent-196, score-0.101]
84 A dynamic texture (DT) [5, 3] is an autoregressive, generative model for video. [sent-199, score-0.079]
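For reference, the dynamic texture of [5] is the linear state-space model (standard formulation; no equations for it are reproduced in this extract): x_{t+1} = A x_t + v_t with v_t ~ N(0, Q), and y_t = C x_t + w_t with w_t ~ N(0, R), where y_t is the vectorized frame (or patch) at time t, x_t is a lower-dimensional hidden state, and A and C are the state-transition and observation matrices.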
85 Given a sequence of images, the parameters of the dynamic texture can be learned for the center and surround regions at each image location, enabling a probabilistic description of the video, with which the mutual information of (2) can be evaluated. [sent-205, score-0.231]
86 We applied the dynamic texture-based discriminant saliency (DTDS) detector to three video sequences containing objects moving in water. [sent-206, score-1.349]
87 These sequences are more challenging, since the micro-texture of the water surface is superimposed on a lower frequency sweeping wave (Surfer) and interspersed with high frequency components due to turbulent wakes (created by the boat, surfer, and crest of the sweeping wave). [sent-209, score-0.12]
88 Figures 7(b), 8(b) and 9(b) show the saliency maps produced by discriminant saliency for the three sequences. [sent-210, score-2.139]
89 The DTDS detector performs surprisingly well, in all cases, at detecting the foreground objects while ignoring the movements of the background. [sent-211, score-0.103]
90 8 1 False positive rate (FPR) (c) Figure 6: Performance of background subtraction algorithms on: (a) Water-Bottle, (b) Boat, and (c) Surfer. [sent-237, score-0.083]
91 Figure 7: Results on Bottle: (a) original; (b) discriminant saliency with DT; and (c) GMM model of [16, 24]. [sent-238, score-1.196]
92 For comparison, we present the output of a state-of-the-art background subtraction algorithm, a Gaussian mixture model (GMM) [16, 24]. [sent-239, score-0.083]
93 As can be seen in Figures 7(c), 8(c) and 9(c), the resulting foreground detection is very noisy, and cannot adapt to the highly dynamic nature of the water surface. [sent-240, score-0.137]
94 Note, in particular, that the waves produced by the boat and surfer, as well as the sweeping wave crest, create serious difficulties for this algorithm. [sent-241, score-0.099]
95 Unlike the saliency maps of DTDS, the resulting foreground maps would be difficult to analyze by subsequent vision (e. [sent-242, score-1.062]
96 To produce a quantitative comparison of the saliency maps, these were thresholded over a wide range of values. [sent-245, score-0.918]
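A sketch of such a threshold sweep, assuming per-frame ground-truth foreground masks (boolean arrays) are available for scoring (the exact protocol is not spelled out in this extract), is:

import numpy as np

def foreground_roc_curve(saliency_maps, truth_masks, thresholds=None):
    # Detection rate vs. false-positive rate, pooled over frames, for a range of
    # thresholds applied to min-max normalized saliency maps.
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 50)
    tprs, fprs = [], []
    for t in thresholds:
        tp = fp = pos = neg = 0
        for sal, truth in zip(saliency_maps, truth_masks):
            s = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
            detected = s >= t
            tp += np.logical_and(detected, truth).sum()
            fp += np.logical_and(detected, ~truth).sum()
            pos += truth.sum()
            neg += (~truth).sum()
        tprs.append(tp / max(pos, 1))
        fprs.append(fp / max(neg, 1))
    return np.asarray(fprs), np.asarray(tprs)

Plotting the resulting (FPR, detection rate) pairs for each algorithm gives curves of the kind summarized in Figure 6.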
97 Modeling, clustering, and segmenting video with mixtures of dynamic textures. [sent-265, score-0.075]
98 Figure 8: Results on Boats: (a) original; (b) discriminant saliency with DT; and (c) GMM model of [16, 24]. [sent-321, score-1.196]
99 Figure 9: Results on Surfer: (a) original; (b) discriminant saliency with DT; and (c) GMM model of [16, 24]. [sent-322, score-1.196]
100 Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. [sent-398, score-0.139]
wordName wordTfidf (topN-words)
[('saliency', 0.895), ('discriminant', 0.301), ('surround', 0.077), ('psychophysics', 0.071), ('motion', 0.068), ('gmm', 0.067), ('xations', 0.061), ('roc', 0.058), ('human', 0.056), ('detector', 0.05), ('hypothesis', 0.05), ('dtds', 0.048), ('surfer', 0.048), ('maps', 0.048), ('orientation', 0.046), ('background', 0.044), ('dynamic', 0.042), ('static', 0.041), ('foreground', 0.041), ('egomotion', 0.039), ('subtraction', 0.039), ('eye', 0.038), ('texture', 0.037), ('detectors', 0.035), ('boat', 0.034), ('video', 0.033), ('visual', 0.032), ('location', 0.031), ('vision', 0.03), ('center', 0.03), ('feature', 0.029), ('fpr', 0.029), ('leopard', 0.029), ('treisman', 0.029), ('itti', 0.029), ('image', 0.028), ('water', 0.028), ('detection', 0.026), ('gao', 0.025), ('bruce', 0.025), ('weber', 0.025), ('backgrounds', 0.025), ('sweeping', 0.025), ('asymmetries', 0.025), ('xation', 0.025), ('wavelet', 0.024), ('pami', 0.024), ('quantitative', 0.023), ('wave', 0.023), ('imagery', 0.023), ('color', 0.022), ('stimuli', 0.022), ('replicates', 0.021), ('px', 0.021), ('area', 0.021), ('channels', 0.021), ('plausibility', 0.02), ('anecdotal', 0.019), ('bottle', 0.019), ('conspicuousness', 0.019), ('crest', 0.019), ('equates', 0.019), ('salliency', 0.019), ('optical', 0.019), ('dr', 0.019), ('gabor', 0.018), ('decomposition', 0.018), ('scenes', 0.018), ('nonlinearity', 0.018), ('responses', 0.018), ('replicate', 0.017), ('intensity', 0.017), ('nuno', 0.017), ('waves', 0.017), ('mutual', 0.017), ('salient', 0.017), ('consistency', 0.016), ('moving', 0.016), ('mechanisms', 0.015), ('presence', 0.015), ('ow', 0.015), ('biological', 0.015), ('subjects', 0.015), ('architecture', 0.015), ('dt', 0.015), ('deg', 0.014), ('luminance', 0.014), ('cvpr', 0.014), ('spatial', 0.014), ('koch', 0.014), ('areas', 0.013), ('combined', 0.013), ('display', 0.013), ('densities', 0.013), ('il', 0.012), ('scatter', 0.012), ('ability', 0.012), ('optimal', 0.012), ('objects', 0.012), ('videos', 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency
Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos
Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. 1
2 0.47753453 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch
Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1
3 0.24811758 85 nips-2007-Experience-Guided Search: A Theory of Attentional Control
Author: David Baldwin, Michael C. Mozer
Abstract: People perform a remarkable range of tasks that require search of the visual environment for a target item among distractors. The Guided Search model (Wolfe, 1994, 2007), or GS, is perhaps the best developed psychological account of human visual search. To prioritize search, GS assigns saliency to locations in the visual field. Saliency is a linear combination of activations from retinotopic maps representing primitive visual features. GS includes heuristics for setting the gain coefficient associated with each map. Variants of GS have formalized the notion of optimization as a principle of attentional control (e.g., Baldwin & Mozer, 2006; Cave, 1999; Navalpakkam & Itti, 2006; Rao et al., 2002), but every GS-like model must be ’dumbed down’ to match human data, e.g., by corrupting the saliency map with noise and by imposing arbitrary restrictions on gain modulation. We propose a principled probabilistic formulation of GS, called Experience-Guided Search (EGS), based on a generative model of the environment that makes three claims: (1) Feature detectors produce Poisson spike trains whose rates are conditioned on feature type and whether the feature belongs to a target or distractor; (2) the environment and/or task is nonstationary and can change over a sequence of trials; and (3) a prior specifies that features are more likely to be present for target than for distractors. Through experience, EGS infers latent environment variables that determine the gains for guiding search. Control is thus cast as probabilistic inference, not optimization. We show that EGS can replicate a range of human data from visual search, including data that GS does not address. 1
4 0.069239132 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
Author: Christoforos Christoforou, Paul Sajda, Lucas C. Parra
Abstract: Traditional analysis methods for single-trial classification of electroencephalography (EEG) focus on two types of paradigms: phase locked methods, in which the amplitude of the signal is used as the feature for classification, e.g. event related potentials; and second order methods, in which the feature of interest is the power of the signal, e.g. event related (de)synchronization. The procedure for deciding which paradigm to use is ad hoc and is typically driven by knowledge of the underlying neurophysiology. Here we propose a principled method, based on a bilinear model, in which the algorithm simultaneously learns the best first and second order spatial and temporal features for classification of EEG. The method is demonstrated on simulated data as well as on EEG taken from a benchmark data used to test classification algorithms for brain computer interfaces. 1 1.1
5 0.061545983 3 nips-2007-A Bayesian Model of Conditioned Perception
Author: Alan Stocker, Eero P. Simoncelli
Abstract: unkown-abstract
6 0.056534663 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
7 0.056175619 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
8 0.048071403 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
9 0.042875648 143 nips-2007-Object Recognition by Scene Alignment
10 0.041609734 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
11 0.0359983 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
12 0.034086846 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
13 0.033064231 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
14 0.03041946 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection
15 0.030153999 113 nips-2007-Learning Visual Attributes
16 0.030094361 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
17 0.028816162 125 nips-2007-Markov Chain Monte Carlo with People
18 0.028714921 182 nips-2007-Sparse deep belief net model for visual area V2
19 0.027549602 106 nips-2007-Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing
20 0.027492201 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
topicId topicWeight
[(0, -0.094), (1, 0.058), (2, 0.03), (3, -0.053), (4, 0.039), (5, 0.259), (6, 0.063), (7, 0.252), (8, -0.036), (9, -0.13), (10, -0.111), (11, -0.099), (12, -0.106), (13, 0.138), (14, -0.351), (15, -0.056), (16, 0.11), (17, 0.188), (18, -0.067), (19, -0.03), (20, -0.042), (21, -0.314), (22, -0.124), (23, -0.098), (24, 0.274), (25, 0.042), (26, 0.035), (27, 0.066), (28, -0.053), (29, -0.045), (30, 0.037), (31, -0.004), (32, 0.003), (33, -0.056), (34, -0.11), (35, -0.108), (36, 0.096), (37, 0.021), (38, 0.005), (39, -0.055), (40, 0.001), (41, -0.018), (42, -0.071), (43, -0.011), (44, -0.014), (45, 0.073), (46, -0.001), (47, 0.029), (48, -0.007), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.98068982 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency
Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos
Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. 1
2 0.84933054 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch
Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1
3 0.81470311 85 nips-2007-Experience-Guided Search: A Theory of Attentional Control
Author: David Baldwin, Michael C. Mozer
Abstract: People perform a remarkable range of tasks that require search of the visual environment for a target item among distractors. The Guided Search model (Wolfe, 1994, 2007), or GS, is perhaps the best developed psychological account of human visual search. To prioritize search, GS assigns saliency to locations in the visual field. Saliency is a linear combination of activations from retinotopic maps representing primitive visual features. GS includes heuristics for setting the gain coefficient associated with each map. Variants of GS have formalized the notion of optimization as a principle of attentional control (e.g., Baldwin & Mozer, 2006; Cave, 1999; Navalpakkam & Itti, 2006; Rao et al., 2002), but every GS-like model must be ’dumbed down’ to match human data, e.g., by corrupting the saliency map with noise and by imposing arbitrary restrictions on gain modulation. We propose a principled probabilistic formulation of GS, called Experience-Guided Search (EGS), based on a generative model of the environment that makes three claims: (1) Feature detectors produce Poisson spike trains whose rates are conditioned on feature type and whether the feature belongs to a target or distractor; (2) the environment and/or task is nonstationary and can change over a sequence of trials; and (3) a prior specifies that features are more likely to be present for target than for distractors. Through experience, EGS infers latent environment variables that determine the gains for guiding search. Control is thus cast as probabilistic inference, not optimization. We show that EGS can replicate a range of human data from visual search, including data that GS does not address. 1
4 0.33433467 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
Author: Robert Peters, Laurent Itti
Abstract: Current computational models of bottom-up and top-down components of attention are predictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, to date there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, and looking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibit reliable temporal signatures during critical event windows in the task sequence—for example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically, we find that an event detector based on fused behavioral and stimulus information (in the form of the model’s predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone (model prediction maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings. 1
5 0.2064043 3 nips-2007-A Bayesian Model of Conditioned Perception
Author: Alan Stocker, Eero P. Simoncelli
Abstract: unkown-abstract
6 0.20106192 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
7 0.14781862 109 nips-2007-Kernels on Attributed Pointsets with Applications
8 0.13895968 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
9 0.13551064 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection
10 0.12800917 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
11 0.12472767 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
12 0.11797742 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
13 0.10872368 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
14 0.10735356 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
15 0.10378792 143 nips-2007-Object Recognition by Scene Alignment
16 0.10322717 81 nips-2007-Estimating disparity with confidence from energy neurons
17 0.09964782 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
18 0.098840594 70 nips-2007-Discriminative K-means for Clustering
19 0.094781004 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
20 0.093327574 113 nips-2007-Learning Visual Attributes
topicId topicWeight
[(5, 0.049), (13, 0.025), (16, 0.029), (18, 0.04), (19, 0.016), (21, 0.053), (26, 0.014), (34, 0.018), (35, 0.011), (47, 0.068), (49, 0.014), (53, 0.012), (63, 0.015), (82, 0.018), (83, 0.081), (85, 0.012), (87, 0.029), (90, 0.103), (97, 0.28)]
simIndex simValue paperId paperTitle
1 0.80634224 35 nips-2007-Bayesian binning beats approximate alternatives: estimating peri-stimulus time histograms
Author: Dominik Endres, Mike Oram, Johannes Schindelin, Peter Foldiak
Abstract: The peristimulus time histogram (PSTH) and its more continuous cousin, the spike density function (SDF) are staples in the analytic toolkit of neurophysiologists. The former is usually obtained by binning spike trains, whereas the standard method for the latter is smoothing with a Gaussian kernel. Selection of a bin width or a kernel size is often done in an relatively arbitrary fashion, even though there have been recent attempts to remedy this situation [1, 2]. We develop an exact Bayesian, generative model approach to estimating PSTHs and demonstate its superiority to competing methods. Further advantages of our scheme include automatic complexity control and error bars on its predictions. 1
2 0.80625021 133 nips-2007-Modelling motion primitives and their timing in biologically executed movements
Author: Ben Williams, Marc Toussaint, Amos J. Storkey
Abstract: Biological movement is built up of sub-blocks or motion primitives. Such primitives provide a compact representation of movement which is also desirable in robotic control applications. We analyse handwriting data to gain a better understanding of primitives and their timings in biological movements. Inference of the shape and the timing of primitives can be done using a factorial HMM based model, allowing the handwriting to be represented in primitive timing space. This representation provides a distribution of spikes corresponding to the primitive activations, which can also be modelled using HMM architectures. We show how the coupling of the low level primitive model, and the higher level timing model during inference can produce good reconstructions of handwriting, with shared primitives for all characters modelled. This coupled model also captures the variance profile of the dataset which is accounted for by spike timing jitter. The timing code provides a compact representation of the movement while generating a movement without an explicit timing model produces a scribbling style of output. 1
same-paper 3 0.78591478 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency
Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos
Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). In result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision. 1
4 0.49701381 182 nips-2007-Sparse deep belief net model for visual area V2
Author: Honglak Lee, Chaitanya Ekanadham, Andrew Y. Ng
Abstract: Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or “deep,” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both colinear (“contour”) features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex “corner” features matches well with the results from the Ito & Komatsu’s study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling more higher-order features. 1
5 0.48988542 85 nips-2007-Experience-Guided Search: A Theory of Attentional Control
Author: David Baldwin, Michael C. Mozer
Abstract: People perform a remarkable range of tasks that require search of the visual environment for a target item among distractors. The Guided Search model (Wolfe, 1994, 2007), or GS, is perhaps the best developed psychological account of human visual search. To prioritize search, GS assigns saliency to locations in the visual field. Saliency is a linear combination of activations from retinotopic maps representing primitive visual features. GS includes heuristics for setting the gain coefficient associated with each map. Variants of GS have formalized the notion of optimization as a principle of attentional control (e.g., Baldwin & Mozer, 2006; Cave, 1999; Navalpakkam & Itti, 2006; Rao et al., 2002), but every GS-like model must be ’dumbed down’ to match human data, e.g., by corrupting the saliency map with noise and by imposing arbitrary restrictions on gain modulation. We propose a principled probabilistic formulation of GS, called Experience-Guided Search (EGS), based on a generative model of the environment that makes three claims: (1) Feature detectors produce Poisson spike trains whose rates are conditioned on feature type and whether the feature belongs to a target or distractor; (2) the environment and/or task is nonstationary and can change over a sequence of trials; and (3) a prior specifies that features are more likely to be present for target than for distractors. Through experience, EGS infers latent environment variables that determine the gains for guiding search. Control is thus cast as probabilistic inference, not optimization. We show that EGS can replicate a range of human data from visual search, including data that GS does not address. 1
6 0.47734505 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
7 0.47297484 8 nips-2007-A New View of Automatic Relevance Determination
8 0.47074801 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
9 0.47014618 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
10 0.46577576 184 nips-2007-Stability Bounds for Non-i.i.d. Processes
11 0.46219686 156 nips-2007-Predictive Matrix-Variate t Models
12 0.45998856 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI
13 0.45853522 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
14 0.45837423 47 nips-2007-Collapsed Variational Inference for HDP
15 0.45801044 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
16 0.4577083 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions
17 0.45532337 63 nips-2007-Convex Relaxations of Latent Variable Training
18 0.45405391 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
19 0.45376399 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression
20 0.45340484 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation