nips nips2006 nips2006-8 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf
Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. [sent-6, score-0.492]
2 Most existing computational models use a set of biologically plausible linear filters, e.g., [sent-7, score-0.208]
3 Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. [sent-9, score-0.267]
4 Here, we propose to learn a visual saliency model directly from human eye movement data. [sent-13, score-1.173]
5 The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. [sent-14, score-0.157]
6 Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. [sent-15, score-0.316]
7 In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. [sent-16, score-0.671]
8 1 Introduction The human visual system samples images through saccadic eye movements, which rapidly change the point of fixation. [sent-17, score-0.658]
9 …, to a system that responds to simple, and often local, image features, such as a bright spot in a dark scene. [sent-21, score-0.216]
10 During the past decade, several studies have explored which image features attract eye movements. [sent-22, score-0.452]
11 Parkhurst, Law, and Niebur [13] showed that a saliency map [9], computed by a model similar to the widely used framework by Itti, Koch and Niebur [3, 4], is significantly correlated with human fixation patterns. [sent-25, score-0.721]
12 Each of the above models is built on a particular choice of image features that are believed to be relevant to visual saliency. [sent-27, score-0.411]
13 A common approach is to compute several feature maps from linear filters that are biologically plausible, e.g., [sent-28, score-0.179]
14 Difference of Gaussians (DoG) or Gabor filters, and nonlinearly combine the feature maps into a single saliency map [1, 3, 4, 13, 16, 21]. [sent-30, score-0.738]
15 This makes it straightforward to construct complex models from simple, biologically plausible components. [sent-31, score-0.158]
16 As a consequence, any such model is biased towards certain image structures, and may therefore discriminate against features that do not seem plausible at first sight but may well play a significant role. [sent-33, score-0.293]
17 (Figure caption) (b) shows the top right image from (a), together with the recorded fixation locations from all 14 subjects. [sent-36, score-0.31]
18 Instead of using a predefined set of feature maps, our saliency model is learned directly from human eye movement data. [sent-44, score-1.112]
19 The model consists of a nonlinear mapping from an image patch to a real value, trained to yield positive outputs on fixated, and negative outputs on randomly selected image patches. [sent-45, score-0.641]
20 The main difference from previous models is that our saliency function is determined essentially by maximizing prediction performance on the observed data. [sent-46, score-0.595]
21 Below, we show that the prediction performance of our model is comparable to that of biologically motivated models. [sent-47, score-0.175]
22 They consist of 200 natural images (1024×768, 8-bit grayscale) and 18,065 fixation locations recorded from 14 naïve subjects. [sent-50, score-0.231]
23 The subjects freely viewed each image for about three seconds on a 19-inch CRT at full screen size and 60 cm distance, which corresponds to 37◦ × 27◦ of visual angle. [sent-51, score-0.353]
24 1 Below, we are going to formulate saliency learning as a classification problem. [sent-54, score-0.563]
25 As pointed out in [18, 21], care must be taken that no spurious differences in the local image statistics are generated by using different spatial distributions for positive and negative examples. [sent-58, score-0.197]
26 As an example, fixation locations are usually biased towards the center of the image, probably due to the reduced physical effort of looking straight ahead. [sent-59, score-0.197]
27 At the same time, it is known that local image statistics can be correlated with … (Footnote 1: In our initial study [8], these data were preprocessed further.) [sent-60, score-0.154]
28 If we sampled background locations uniformly over the image, our system might learn the difference between pixel statistics at the image center and towards the boundary, instead of the desired difference between fixated and non-fixated locations. [sent-69, score-0.532]
29 To avoid this effect, we use the 18,065 fixation locations to generate an equal number of background locations by using the same image coordinates, but with the corresponding image numbers shuffled. [sent-71, score-0.598]
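This shuffling scheme is simple enough to sketch in a few lines. The sketch below is ours, not the authors' code; the function and variable names (make_background_locations, fix_img_ids, fix_xy) are hypothetical:

```python
import numpy as np

def make_background_locations(fix_img_ids, fix_xy, seed=0):
    """Create negative examples with the same spatial distribution as the
    fixations: keep every fixation's (x, y) image coordinate, but pair it
    with a randomly shuffled image index. This removes any difference in
    spatial statistics (e.g., the center bias) between the two classes."""
    rng = np.random.default_rng(seed)
    shuffled_ids = rng.permutation(fix_img_ids)  # same multiset of images
    return shuffled_ids, fix_xy                  # same coordinates
```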
30 The proposed model computes saliency based on local image structure. [sent-73, score-0.759]
31 To represent fixations and background locations accordingly, we cut out a square image patch at each location and stored the pixel values in a feature vector xi, together with a label yi ∈ {+1, −1} indicating fixation or background. [sent-74, score-0.634]
32 Unfortunately, choosing an appropriate patch size and resolution is not straightforward, as there might be a wide range of reasonable values. [sent-75, score-0.224]
33 To remedy this, we follow the approach proposed in [8], which is a simple compromise between computational tractability and generality: we fix the resolution to 13 × 13 pixels, but leave the patch size d unspecified, i.e., we treat it as a free parameter to be set by model selection (Section 4.1). [sent-76, score-0.224]
34 For each image location, 11 patches were extracted, with sizes ranging between d = 0.47◦ [sent-80, score-0.336]
35 and d = 27◦ of visual angle, equally spaced on a logarithmic scale. [sent-81, score-0.215]
36 Each patch was subsampled to 13 × 13 pixels, after low-pass filtering to reduce aliasing effects. [sent-82, score-0.173]
37 The range of sizes was chosen such that pixels in the smallest patch correspond to image pixels at full screen resolution, and that the largest patch has full screen height. [sent-83, score-0.75]
38 Finally, for each patch we subtracted the mean intensity and stored the normalized pixel values in a 169-dimensional feature vector xi. [sent-84, score-0.312]
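The patch pipeline just described can be summarized in code. A minimal sketch under our reading of the text (the SciPy calls, the anti-aliasing filter width, and boundary handling are our choices, not the paper's):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

# 11 patch sizes, log-spaced between 0.47 and 27 degrees of visual angle;
# the screen spans 37 degrees over 1024 pixels (Section 2)
PATCH_SIZES_DEG = np.geomspace(0.47, 27.0, 11)
PIX_PER_DEG = 1024 / 37.0

def extract_patch(image, x, y, size_deg, out=13):
    """Cut a square patch centered at (x, y), low-pass filter it to reduce
    aliasing, subsample to out x out pixels, and subtract the mean."""
    half = int(round(size_deg * PIX_PER_DEG / 2))
    patch = image[y - half:y + half, x - half:x + half].astype(float)
    scale = out / patch.shape[0]
    if scale < 1.0:                                   # only blur when shrinking
        patch = gaussian_filter(patch, sigma=0.5 / scale)
    patch = zoom(patch, scale, order=1)[:out, :out]   # bilinear subsampling
    return (patch - patch.mean()).ravel()             # 169-dim feature vector
```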
39 This is necessary, since image patches from different locations can overlap, leading to a severe over-estimation of the generalization performance. [sent-93, score-0.458]
40 3 Model and Learning Method. From the eye movement data described in Section 2, we learn a bottom-up saliency map f(x) : R^169 → R using a support vector machine (SVM) [2]. [sent-94, score-0.94]
41 We model saliency as a linear combination of Gaussian radial basis functions (RBFs), centered at the training points x_i: f(x) = \sum_{i=1}^{m} \alpha_i y_i \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right). [sent-95, score-0.774]
42 …, this design parameter, together with the RBF bandwidth σ and the patch size d, is determined by maximizing the model's estimated prediction performance. [sent-100, score-0.205]
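In modern terms this is a standard soft-margin SVM with an RBF kernel. A hedged sketch using scikit-learn (our translation, not the authors' implementation; in particular, mapping the paper's regularization parameter to SVC's C as 1/lambda is our assumption):

```python
from sklearn.svm import SVC

def train_saliency_svm(X, y, sigma, lam):
    """Fit f(x) = sum_i alpha_i y_i exp(-||x - x_i||^2 / (2 sigma^2)) + b.
    SVC with an RBF kernel produces exactly this expansion; we map the
    bandwidth sigma to gamma = 1/(2 sigma^2) and guess C = 1/lam."""
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2), C=1.0 / lam)
    clf.fit(X, y)   # X: (n, 169) patch vectors, y in {+1, -1}
    return clf

def saliency(clf, patches):
    # decision_function returns the real-valued f(x), used as the saliency score
    return clf.decision_function(patches)
```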
43 Similar to most existing approaches, our model is based on linear filters whose outputs are nonlinearly combined into a real-valued saliency measure. [sent-102, score-0.775]
44 This is a common model for the early visual system, and receptive-field estimation techniques such as reverse-correlation usually make the same assumptions. [sent-103, score-0.237]
45 This implies that the system has no a priori preference for particular image structures. [sent-109, score-0.216]
46 4 Experiments. 4.1 Selection of d, σ, and λ. For fixing d, σ, and λ, we conducted an exhaustive search on an 11 × 9 × 13 grid, with the grid points equally spaced on a log scale such that d = 0.47◦, …, 27◦ (the 11 patch sizes from Section 2). [sent-131, score-0.173]
47 In order to make the search computationally tractable, we divided the training set (Section 2) into eight parts. [sent-144, score-0.149]
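A sketch of such a model-selection loop, assuming the eight parts are used for cross-validation (the excerpt does not spell out how the parts were used; the sigma and lambda grid endpoints below are placeholders, and train_saliency_svm is the sketch from above):

```python
import numpy as np
from itertools import product

d_grid = np.geomspace(0.47, 27.0, 11)     # the 11 patch sizes
sigma_grid = np.geomspace(0.1, 10.0, 9)   # placeholder endpoints
lam_grid = np.geomspace(1e-3, 1e3, 13)    # placeholder endpoints

def grid_search(X_by_scale, y, n_parts=8, seed=0):
    """X_by_scale[k] holds the (n, 169) patch matrix for patch size d_grid[k]."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), n_parts)
    best_params, best_score = None, -np.inf
    for k, sigma, lam in product(range(len(d_grid)), sigma_grid, lam_grid):
        scores = []
        for i in range(n_parts):
            test = parts[i]
            train = np.hstack(parts[:i] + parts[i + 1:])
            clf = train_saliency_svm(X_by_scale[k][train], y[train], sigma, lam)
            scores.append(clf.score(X_by_scale[k][test], y[test]))
        if np.mean(scores) > best_score:
            best_params, best_score = (d_grid[k], sigma, lam), np.mean(scores)
    return best_params, best_score
```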
48 Footnote 2: Due to the subsampling (Section 2), the optimal patch size of d = 5.4◦ [sent-163, score-0.173]
49 leads to an effective saliency map resolution of 89 × 66 (the original image is 1024 × 768), which corresponds to 2.… [sent-164, score-0.804]
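These numbers are consistent with a quick check (our arithmetic, not the paper's): a patch of d = 5.4◦ on a 37◦-wide, 1024-pixel screen covers about 5.4/37 × 1024 ≈ 149 pixels; sampled at 13 × 13, each output sample then covers 149/13 ≈ 11.5 image pixels, and 1024/11.5 ≈ 89 while 768/11.5 ≈ 67, matching the reported 89 × 66 map up to rounding.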
50 (Figure 3 caption) (a) shows a natural scene from our database, together with the recorded eye movements from all 14 subjects. [sent-168, score-0.432]
51 Itti's saliency map, using “standard” normalization, is shown in (b). [sent-169, score-0.563]
52 The picture in (c) shows our learned saliency map, which was rebuilt for this example with the image in (a) excluded from the training data. [sent-171, score-0.813]
53 For a qualitative comparison, Figure 3 shows our learned saliency map and Itti’s model evaluated on a sample image. [sent-202, score-0.693]
54 The main insight here is that our nonparametric model performs at the same level as existing, biologically motivated models, which implement plausible, multi-scale front-end filters, carefully designed non-linearities, and even global effects. [sent-204, score-0.196]
55 … that it has learned regularities in the data that are relevant to the human fixation selection mechanism. [sent-208, score-0.243]
56 To avoid this, we instead characterize the learned function by means of inputs x that are particularly excitatory or inhibitory to the entire system. [sent-214, score-0.285]
57 As a first test, we collected 20,000 image patches from random locations in natural scenes (not in the training set) and presented them to our system. [sent-215, score-0.582]
58 The top and bottom 100 patches sorted by model output, and a histogram over all 20,000 saliency values, are shown in Figure 4. [sent-216, score-0.863]
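Reproducing this ranking is one line of scoring plus a sort; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def rank_patches(clf, patches):
    """Score patches (n, 169) with the trained model and return the indices
    of the bottom/top 100 plus all saliency values for the histogram."""
    vals = clf.decision_function(patches)
    order = np.argsort(vals)
    return order[:100], order[-100:], vals
```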
59 Note that since our model is unbiased towards any particular image structure, the different patterns observed in high- and low-output patches are solely due to differences between pixel statistics at fixated and background regions. [sent-217, score-1.125] [sent-225, score-0.894]
Figure 4 caption: Natural image patches ranked by saliency according to our model.
60 The panels (a) and (b) show the bottom and top 100 of 20,000 patches, respectively (the dots in between denote the 18,800 patches which are not shown). [sent-218, score-0.182]
61 A histogram of all 20,000 saliency values is given on the lower right (axes: saliency, roughly −2 to 2, against frequency ×1000). [sent-219, score-0.609]
63 The high-output patches seem to have higher contrast, which is in agreement with previous results, e.g. … [sent-226, score-0.241]
64 Another result from [14, 18] is that in natural images the correlation between pixel values decays faster at fixated locations than at randomly chosen locations. [sent-234, score-0.189]
65 Figure 4 shows this trend as well: as we move away from the patch center, the pixels’ correlation with the center intensity decays faster for patches with high predicted salience. [sent-235, score-0.485]
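One way to quantify this decay, under our reading of the analysis in [14, 18] (the ring-averaging scheme and all names here are our choices):

```python
import numpy as np

def center_correlation_by_radius(patches):
    """For N flattened 13x13 patches, correlate each pixel with the center
    pixel across the set, then average correlations over rings of (rounded)
    equal distance from the center."""
    P = patches.reshape(-1, 13, 13)
    center = P[:, 6, 6]
    corr = np.empty((13, 13))
    for i in range(13):
        for j in range(13):
            corr[i, j] = np.corrcoef(center, P[:, i, j])[0, 1]
    yy, xx = np.mgrid[:13, :13]
    r = np.hypot(yy - 6, xx - 6).round().astype(int)
    return np.bincount(r.ravel(), weights=corr.ravel()) / np.bincount(r.ravel())
```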
66 Moreover, a study on bispectra at fixated image locations [10] suggested that “the saccadic selection system avoids image regions, which are dominated by a single oriented structure.” [sent-236, score-0.598]
67 A closer look at Figure 4 reveals that our model tends to attribute saliency not to contrast alone, but also to non-trivial image structure. [sent-238, score-0.759]
68 To further characterize the system, we explicitly computed the maximally excitatory and inhibitory stimuli. [sent-242, score-0.347]
69 The initial x were constructed by drawing 169 pixel values from a normal distribution with zero mean and then normalizing the patch standard deviation to 0.… [sent-248, score-0.246]
70 We also re-ran this experiment with natural image patches as starting values, with identical results. [sent-259, score-0.372]
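Since f is a finite RBF expansion, its gradient is available in closed form, and the search reduces to plain gradient ascent/descent. A sketch (step size, iteration count, and the initial standard deviation of 0.1 are our choices; the paper's value is truncated above; with scikit-learn, X_sv and coef would come from clf.support_vectors_ and clf.dual_coef_[0]):

```python
import numpy as np

def f_and_grad(x, X_sv, coef, sigma):
    """f(x) = sum_i coef_i exp(-||x - x_i||^2 / (2 sigma^2)), coef_i = alpha_i y_i,
    with gradient sum_i coef_i exp(.) (x_i - x) / sigma^2."""
    diff = X_sv - x                                             # (m, 169)
    w = coef * np.exp(-np.sum(diff**2, axis=1) / (2 * sigma**2))
    return w.sum(), (w[:, None] * diff).sum(axis=0) / sigma**2

def optimal_stimulus(X_sv, coef, sigma, sign=+1, lr=0.1, steps=2000, seed=0):
    """Gradient ascent (sign=+1, excitatory) or descent (sign=-1, inhibitory)
    from a random zero-mean patch with fixed standard deviation."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(169)
    x = (x - x.mean()) / x.std() * 0.1
    for _ in range(steps):
        _, g = f_and_grad(x, X_sv, coef, sigma)
        x = x + sign * lr * g
    return x
```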
71 This indicates that our saliency function has essentially two minima and two maxima in x. [sent-260, score-0.631]
72 The first two images (a) and (b) show the maximally inhibitory stimuli. [sent-262, score-0.258]
73 Figure 5 caption: Maximally inhibitory and excitatory stimuli of the learned model. [sent-267, score-0.409]
74 Note the large magnitude of the saliency values compared to the typical model output (cf. …). [sent-268, score-0.635]
75 (a) and (b): the two maximally inhibitory stimuli (lowest possible saliency). [sent-270, score-0.343]
76 (c) and (d): the two maximally excitatory stimuli (highest possible saliency); (e) and (f): the radial average of (c) and (d), respectively. [sent-271, score-0.427]
77 On the other hand, the maximally excitatory stimuli, denoted by (c) and (d), have center-surround structure. [sent-280, score-0.242]
78 All four stimuli have zero mean, which is not surprising since during gradient search, both the initial value and the step directions—which are linear combinations of the training data—have zero mean. [sent-281, score-0.168]
79 The optimal stimuli thus bear a close resemblance to receptive fields in the early visual system [11]. [sent-286, score-0.446]
80 To see that the optimal stimuli have in fact prototype character, note how the histogram in Figure 4 reflects the typical distribution of natural image patches along the learned saliency function. [sent-287, score-1.157]
81 It illustrates that the saliency values of unseen natural image patches usually lie between −2.… [sent-288, score-0.935]
82 In contrast, our optimal stimuli have saliency values of 5.… [sent-293, score-0.687]
83 …5, indicating that they represent the difference between fixated and background locations in a much more articulated way than any of the noisy measurements in our data set. [sent-295, score-0.168]
84 5 Discussion We have presented a nonparametric model for bottom-up visual saliency, trained on human eye movement data. [sent-296, score-0.663]
85 Nevertheless, we found that the prediction performance of our system is comparable to that of parametric, biologically motivated models. [sent-301, score-0.195]
86 Also, we found that the maximally excitatory stimuli of our system have center-surround structure, similar to DoG filters commonly used in early vision models [3, 13, 21]. [sent-307, score-0.538]
87 This is a nontrivial result, since our model has no preference for any particular image features, i.e., [sent-308, score-0.196]
88 a priori, any 13 × 13 image patch is equally likely to be an optimal stimulus. [sent-310, score-0.365]
89 Recently, several authors have explored whether oriented (Gabor) or center-surround (DoG) features are more relevant to human eye movements. [sent-311, score-0.411]
90 Footnote 3: Please note that the radial average curves in Figure 5 (e) and (f) do not necessarily sum to zero, since the patch area in (c) and (d) grows quadratically with its corresponding radius. [sent-314, score-0.234]
91 Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. [sent-330, score-0.945]
92 Quantitative modeling of perceptual salience at human eye position. [sent-334, score-0.386]
93 A model of saliency-based visual attention for rapid scene analysis. [sent-340, score-0.249]
94 A saliency-based search mechanism for overt and covert shifts of visual attention. [sent-345, score-0.222]
95 Learning an interest operator from human eye movements. [sent-361, score-0.338]
96 Shifts in selective visual attention: towards the underlying neural circuitry. [sent-366, score-0.177]
97 The relationship between the locations of spatial features and those of fixations made during visual examination of briefly presented images. [sent-388, score-0.352]
98 Modeling the role of salience in the allocation of overt visual attention. [sent-395, score-0.269]
99 Algorithms for defining visual regions-of-interest: Comparison with eye fixations. [sent-416, score-0.405]
100 Contrast statistics for foveated visual systems: Fixation selection by minimizing contrast entropy. [sent-426, score-0.231]
wordName wordTfidf (topN-words)
[('saliency', 0.563), ('eye', 0.258), ('fixated', 0.192), ('patches', 0.182), ('patch', 0.173), ('itti', 0.163), ('image', 0.154), ('visual', 0.147), ('fixations', 0.131), ('excitatory', 0.128), ('fixation', 0.128), ('stimuli', 0.124), ('locations', 0.122), ('maximally', 0.114), ('filters', 0.109), ('inhibitory', 0.105), ('biologically', 0.101), ('movement', 0.083), ('sem', 0.082), ('human', 0.08), ('koch', 0.077), ('pixel', 0.073), ('pixels', 0.073), ('parkhurst', 0.072), ('saccadic', 0.072), ('dog', 0.066), ('vision', 0.062), ('system', 0.062), ('nonlinearly', 0.061), ('radial', 0.061), ('scene', 0.06), ('outputs', 0.059), ('plausible', 0.057), ('kienzle', 0.055), ('plausibility', 0.055), ('reinagel', 0.055), ('nonparametric', 0.053), ('screen', 0.052), ('learned', 0.052), ('resolution', 0.051), ('existing', 0.05), ('contrast', 0.05), ('krieger', 0.048), ('salience', 0.048), ('wichmann', 0.048), ('early', 0.048), ('gabor', 0.047), ('background', 0.046), ('histogram', 0.046), ('center', 0.045), ('intensity', 0.044), ('fold', 0.044), ('movements', 0.044), ('scenes', 0.044), ('nonlinearities', 0.044), ('overt', 0.044), ('regularities', 0.044), ('svm', 0.044), ('maps', 0.044), ('training', 0.044), ('spatial', 0.043), ('model', 0.042), ('decays', 0.041), ('niebur', 0.041), ('simplistic', 0.041), ('eight', 0.041), ('features', 0.04), ('images', 0.039), ('gaze', 0.038), ('equally', 0.038), ('grid', 0.037), ('believed', 0.037), ('natural', 0.036), ('map', 0.036), ('resemblance', 0.035), ('feature', 0.034), ('minima', 0.034), ('recorded', 0.034), ('selection', 0.034), ('cognition', 0.034), ('maxima', 0.034), ('please', 0.034), ('yielded', 0.034), ('justified', 0.033), ('relevant', 0.033), ('divided', 0.033), ('xi', 0.032), ('findings', 0.032), ('prediction', 0.032), ('scores', 0.032), ('search', 0.031), ('interactions', 0.031), ('allocation', 0.03), ('receptive', 0.03), ('spaced', 0.03), ('output', 0.03), ('biological', 0.03), ('towards', 0.03), ('agreement', 0.029), ('axes', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf
Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
2 0.50377297 86 nips-2006-Graph-Based Visual Saliency
Author: Jonathan Harel, Christof Koch, Pietro Perona
Abstract: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%. 1
3 0.12902415 120 nips-2006-Learning to Traverse Image Manifolds
Author: Piotr DollĂĄr, Vincent Rabaud, Serge J. Belongie
Abstract: We present a new algorithm, Locally Smooth Manifold Learning (LSML), that learns a warping function from a point on a manifold to its neighbors. Important characteristics of LSML include the ability to recover the structure of the manifold in sparsely populated regions and beyond the support of the provided data. Applications of our proposed technique include embedding with a natural out-of-sample extension and tasks such as tangent distance estimation, frame rate up-conversion, video compression and motion transfer. 1
4 0.12871331 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
5 0.1257474 9 nips-2006-A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments
Author: Daniel J. Navarro, Thomas L. Griffiths
Abstract: The additive clustering model is widely used to infer the features of a set of stimuli from their similarities, on the assumption that similarity is a weighted linear function of common features. This paper develops a fully Bayesian formulation of the additive clustering model, using methods from nonparametric Bayesian statistics to allow the number of features to vary. We use this to explore several approaches to parameter estimation, showing that the nonparametric Bayesian approach provides a straightforward way to obtain estimates of both the number of features used in producing similarity judgments and their importance. 1
6 0.12489806 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
7 0.11483806 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
8 0.10431632 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
9 0.10243677 167 nips-2006-Recursive ICA
10 0.097021967 18 nips-2006-A selective attention multi--chip system with dynamic synapses and spiking neurons
11 0.096528798 31 nips-2006-Analysis of Contour Motions
12 0.093917407 42 nips-2006-Bayesian Image Super-resolution, Continued
13 0.092534877 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
14 0.091108024 66 nips-2006-Detecting Humans via Their Pose
15 0.086471923 50 nips-2006-Chained Boosting
16 0.082944289 16 nips-2006-A Theory of Retinal Population Coding
17 0.082584068 192 nips-2006-Theory and Dynamics of Perceptual Bistability
18 0.082118243 122 nips-2006-Learning to parse images of articulated bodies
19 0.078856915 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
20 0.077425204 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
topicId topicWeight
[(0, -0.255), (1, -0.051), (2, 0.217), (3, -0.085), (4, -0.007), (5, -0.147), (6, -0.21), (7, -0.07), (8, -0.003), (9, -0.013), (10, 0.301), (11, -0.006), (12, 0.014), (13, -0.19), (14, -0.174), (15, 0.104), (16, 0.111), (17, 0.001), (18, -0.106), (19, -0.079), (20, 0.297), (21, -0.099), (22, -0.246), (23, 0.184), (24, -0.184), (25, -0.072), (26, -0.044), (27, 0.007), (28, -0.085), (29, 0.128), (30, 0.084), (31, 0.011), (32, -0.098), (33, -0.002), (34, -0.027), (35, 0.049), (36, 0.067), (37, -0.052), (38, 0.079), (39, -0.03), (40, 0.026), (41, 0.021), (42, 0.023), (43, -0.03), (44, 0.04), (45, -0.042), (46, -0.028), (47, 0.01), (48, -0.025), (49, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.93871588 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf
Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
2 0.92329204 86 nips-2006-Graph-Based Visual Saliency
Author: Jonathan Harel, Christof Koch, Pietro Perona
Abstract: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%. 1
3 0.46417639 52 nips-2006-Clustering appearance and shape by learning jigsaws
Author: Anitha Kannan, John Winn, Carsten Rother
Abstract: Patch-based appearance models are used in a wide range of computer vision applications. To learn such models it has previously been necessary to specify a suitable set of patch sizes and shapes by hand. In the jigsaw model presented here, the shape, size and appearance of patches are learned automatically from the repeated structures in a set of training images. By learning such irregularly shaped ‘jigsaw pieces’, we are able to discover both the shape and the appearance of object parts without supervision. When applied to face images, for example, the learned jigsaw pieces are surprisingly strongly associated with face parts of different shapes and scales such as eyes, noses, eyebrows and cheeks, to name a few. We conclude that learning the shape of the patch not only improves the accuracy of appearance-based part detection but also allows for shape-based part detection. This enables parts of similar appearance but different shapes to be distinguished; for example, while foreheads and cheeks are both skin colored, they have markedly different shapes. 1
4 0.45593601 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
Author: Shai Avidan, Moshe Butman
Abstract: Bob offers a face-detection web service where clients can submit their images for analysis. Alice would very much like to use the service, but is reluctant to reveal the content of her images to Bob. Bob, for his part, is reluctant to release his face detector, as he spent a lot of time, energy and money constructing it. Secure MultiParty computations use cryptographic tools to solve this problem without leaking any information. Unfortunately, these methods are slow to compute and we introduce a couple of machine learning techniques that allow the parties to solve the problem while leaking a controlled amount of information. The first method is an information-bottleneck variant of AdaBoost that lets Bob find a subset of features that are enough for classifying an image patch, but not enough to actually reconstruct it. The second machine learning technique is active learning that allows Alice to construct an online classifier, based on a small number of calls to Bob’s face detector. She can then use her online classifier as a fast rejector before using a cryptographically secure classifier on the remaining image patches. 1
5 0.44876692 174 nips-2006-Similarity by Composition
Author: Oren Boiman, Michal Irani
Abstract: We propose a new approach for measuring similarity between two signals, which is applicable to many machine learning tasks, and to many signal types. We say that a signal S1 is “similar” to a signal S2 if it is “easy” to compose S1 from few large contiguous chunks of S2 . Obviously, if we use small enough pieces, then any signal can be composed of any other. Therefore, the larger those pieces are, the more similar S1 is to S2 . This induces a local similarity score at every point in the signal, based on the size of its supported surrounding region. These local scores can in turn be accumulated in a principled information-theoretic way into a global similarity score of the entire S1 to S2 . “Similarity by Composition” can be applied between pairs of signals, between groups of signals, and also between different portions of the same signal. It can therefore be employed in a wide variety of machine learning problems (clustering, classification, retrieval, segmentation, attention, saliency, labelling, etc.), and can be applied to a wide range of signal types (images, video, audio, biological data, etc.) We show a few such examples. 1
6 0.39173329 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
7 0.37466186 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
8 0.36416677 42 nips-2006-Bayesian Image Super-resolution, Continued
9 0.36050749 18 nips-2006-A selective attention multi--chip system with dynamic synapses and spiking neurons
10 0.35280594 122 nips-2006-Learning to parse images of articulated bodies
11 0.34322521 9 nips-2006-A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments
12 0.3354097 120 nips-2006-Learning to Traverse Image Manifolds
13 0.32888305 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
14 0.32453272 66 nips-2006-Detecting Humans via Their Pose
15 0.31751919 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
16 0.3073509 167 nips-2006-Recursive ICA
17 0.30605739 31 nips-2006-Analysis of Contour Motions
18 0.28857228 170 nips-2006-Robotic Grasping of Novel Objects
19 0.28737319 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
20 0.28423887 189 nips-2006-Temporal dynamics of information content carried by neurons in the primary visual cortex
topicId topicWeight
[(1, 0.121), (3, 0.05), (7, 0.082), (9, 0.046), (12, 0.013), (20, 0.031), (22, 0.046), (28, 0.088), (44, 0.047), (47, 0.014), (57, 0.125), (65, 0.052), (69, 0.042), (71, 0.041), (90, 0.024), (97, 0.083)]
simIndex simValue paperId paperTitle
same-paper 1 0.9232468 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf
Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
2 0.92241657 86 nips-2006-Graph-Based Visual Saliency
Author: Jonathan Harel, Christof Koch, Pietro Perona
Abstract: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%. 1
3 0.89230514 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
Author: Zhi-hua Zhou, Min-ling Zhang
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g. an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multiinstance learning and multi-label learning. Then, we propose the M IML B OOST and M IML S VM algorithms which achieve good performance in an application to scene classification. 1
4 0.88773966 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
Author: Kyung-ah Sohn, Eric P. Xing
Abstract: We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; and in addition it also infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations. 1
5 0.84090996 121 nips-2006-Learning to be Bayesian without Supervision
Author: Martin Raphan, Eero P. Simoncelli
Abstract: unkown-abstract
6 0.83644038 34 nips-2006-Approximate Correspondences in High Dimensions
7 0.829512 110 nips-2006-Learning Dense 3D Correspondence
8 0.82905227 42 nips-2006-Bayesian Image Super-resolution, Continued
9 0.82588899 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
10 0.82548147 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
11 0.82493776 154 nips-2006-Optimal Change-Detection and Spiking Neurons
12 0.82459086 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
13 0.82319427 167 nips-2006-Recursive ICA
14 0.82280594 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
15 0.82206416 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
16 0.82111704 185 nips-2006-Subordinate class recognition using relational object models
17 0.8210457 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
18 0.82061338 130 nips-2006-Max-margin classification of incomplete data
19 0.81918603 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
20 0.81527978 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo