nips nips2008 nips2008-66 knowledge-graph by maker-knowledge-mining

66 nips-2008-Dynamic visual attention: searching for coding length increments


Source: pdf

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Dynamic Visual Attention: Searching for coding length increments. Xiaodi Hou and Liqing Zhang, Department of Computer Science and Engineering, Shanghai Jiao Tong University. [sent-1, score-0.213]

2 Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. [sent-5, score-0.728]

3 In this paper, a dynamic visual attention model based on the rarity of features is proposed. [sent-6, score-0.636]

4 The objective of our model is to maximize the entropy of the sampled visual features. [sent-8, score-0.288]

5 In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. [sent-9, score-0.306]

6 By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. [sent-10, score-0.697]

7 We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. [sent-11, score-0.856]

8 Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. [sent-12, score-0.511]

9 1 Introduction Visual attention plays an important role in the human visual system. [sent-13, score-0.518]

10 This voluntary mechanism allows us to allocate our sensory and computational resources to the most valuable information embedded in the vast amount of incoming visual data. [sent-14, score-0.289]

11 In the past decade, we have witnessed the success of a number of computational models on visual attention (see [6] for a review). [sent-15, score-0.468]

12 Many of these models analyze static images, and output “saliency maps”, which indicate the probability of eye fixations. [sent-16, score-0.208]

13 Models such as [3] and [4] have tremendously boosted the correlation between eye fixation data and saliency maps. [sent-17, score-0.809]

14 Rather than contributing to the accuracy of saliency map generation, we instead consider alternative approaches to understand visual attention: is there a model that characterizes the ebbs and flows of visual attention? [sent-19, score-1.184]

15 Algorithms simulating saccades in some attention systems [23, 7] are designed for engineering expediency rather than scientific investigation. [sent-21, score-0.348]

16 These algorithms are not intended to cover the full spectrum of dynamic properties of attention, nor to provide a convincing explanation of the continuous nature of attention behaviors. [sent-22, score-0.398]

17 In this paper, we present a novel attention model that is intrinsically continuous. [sent-29, score-0.237]

18 Inspired by the principle of predictive coding [9], we use the concept of energy to explain saliency, feature response intensity, and the appropriation of computational resources in one unified framework. [sent-31, score-0.456]

19 The appropriation of energy is based on the Incremental Coding Length, which indicates the rarity of a feature. [sent-32, score-0.182]

20 1 Space and Feature Based Attention Many of the bottom-up visual attention models follow the Koch and Ullman framework [10]. [sent-36, score-0.468]

21 By analyzing feature maps that topographically encode the spatial homogeneity of features, an algorithm can detect the local irregularities of the visual input. [sent-37, score-0.368]

22 This paradigm explains the generation of attention from a one-shot observation of an image. [sent-38, score-0.279]

23 Second, there are attention mechanisms that operate after the generation of saliency, such as attentional modulation [19], and Inhibition of Return (IOR) [8]. [sent-44, score-0.335]

24 In addition to saliency based on local irregularity, recent investigations in V4 and MT cortical areas demonstrate that attention can also be elicited by particular features [13, 18]. [sent-46, score-0.97]

25 In the field of computational models, explorations that are biased by features are also used in task-dependent spatial saliency analysis [16]. [sent-47, score-0.774]

26 The emerging evidence in feature-driven attention has encouraged us to propose a pure feature-based attention model in parallel with the space-based feature map paradigm. [sent-48, score-0.571]

27 2 On the Cause of Attention Finding “irregular patterns” as a criterion for attention is widely used in computational models. [sent-50, score-0.237]

28 In a more rigid form, saliency can be defined by the residuals of Difference of Gaussian filter banks [7], regions with maximal self-information [3], or most discriminant center-surround composition [4]. [sent-51, score-0.686]

29 However, all of these principles do little to address the cause of saliency mechanisms in the brain. [sent-52, score-0.686]

30 At the level of computation, we cannot attribute the formation of attention to functional advantages such as foraging for foods [6]. [sent-53, score-0.237]

31 In this paper, we hypothesize that visual attention is driven by the predictive coding principle, that is, the optimization of metabolic energy consumption in the brain. [sent-54, score-0.786]

32 In our framework, the behavior of attention is explained as a consequence of an actively-searching observer who seeks a more economical neural code to represent the surrounding visual environment. [sent-55, score-0.468]

33 2 The Theory Motivated by the sparse coding strategy [15] discovered in primary visual cortex, we represent an image patch as a linear combination of sparse coding basis functions, which are referred to as features. [sent-56, score-0.75]

34 The activity ratio of a feature is its average response to image patches over time and space. [sent-57, score-0.284]

35 The ICL of the ith feature is defined as the ensemble's entropy gain during the activity increment of that feature. [sent-60, score-0.283]

36 In accordance with the general principle of predictive coding [17], we redistribute energy to features according to their ICL contribution: frequently activated features receive less energy than rarer features. [sent-61, score-0.601]

37 Finally, the saliency of a region is obtained by summing up the activity of all features at that region. [sent-62, score-0.8]

38 1 Sparse Feature Representation Experimental studies [15] have shown that the receptive fields of simple-cells in the primary visual cortex produce a sparse representation. [sent-64, score-0.3]

39 The sparse representation s of an image patch is its response to all filter functions. [sent-77, score-0.208]

40 Considering the energy consumed by neural activity in the brain, this sparse coding strategy is advantageous [11]. [sent-80, score-0.363]

41 In order to evaluate the immediate energy changes in the cortex, some previous work has analyzed the information representation and coding in the early visual system [20, 21, 1]. [sent-84, score-0.567]

42 Guided by the insights behind predictive coding [17], we propose the Incremental Coding Length (ICL) as a computational principle based on features. [sent-85, score-0.212]

43 This principle aims to optimize the immediate energy distribution in the system in order to achieve an energy-economic representation of its environment. [sent-86, score-0.226]

44 The activity ratio pi for the ith feature is defined as its relative response level over a sequence of samples. [sent-87, score-0.378]

45 Given sampled patches [x1 , x2 , . . . ], where xk is a vectorized image patch, we can compute the activity ratio pi as: pi = ( Σk |wi xk| ) / ( Σj Σk |wj xk| ). [sent-94, score-0.582]
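
As a rough illustration (not code from the paper), the activity ratio can be computed from the rectified responses of a sparse-coding filter bank; the filter matrix W, the patch matrix X, and the function name below are hypothetical placeholders, and NumPy is assumed.

    import numpy as np

    def activity_ratio(W, X):
        # W: (n_features, patch_dim) sparse-coding filters w_i (assumed to be learned already)
        # X: (patch_dim, n_patches) vectorized image patches x_k
        response = np.abs(W @ X)                 # |w_i x_k| for every feature/patch pair
        per_feature = response.sum(axis=1)       # sum the rectified responses over the sampled patches
        return per_feature / per_feature.sum()   # normalize so the ratios form a distribution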

46 Note that the activity ratio and the energy are abstract values that reflect the statistics of features. [sent-99, score-0.209]

47 Since the visual information is jointly encoded by all features, the most efficient coding strategy should make equal use of all possible feature response levels. [sent-103, score-0.472]

48 We consider a new excitation to feature i, which will add a variation ε to pi and change the whole distribution. [sent-107, score-0.228]

49 The new distribution p̂ is: p̂j = (pj + ε)/(1 + ε) for j = i, and p̂j = pj /(1 + ε) for j ≠ i. (Figure: feature distribution and the corresponding Incremental Coding Length.) [sent-108, score-0.291]

50 ∂H(p)/∂pi = −H(p) − pi − log pi − pi log pi (2). Energy Redistribution: We define the salient feature set S as: S = {i | ICL(pi) > 0}. [sent-115, score-0.818]
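
A minimal sketch of Eq. 2 and the salient feature set, assuming the activity ratios p sum to one; the function and variable names are ours, not the authors'.

    import numpy as np

    def icl(p, eps=1e-12):
        # Incremental Coding Length of each feature: entropy gain per unit of added activity (Eq. 2)
        p = np.clip(p, eps, None)
        H = -np.sum(p * np.log(p))               # entropy of the current activity ratios
        return -H - p - np.log(p) - p * np.log(p)

    def salient_set(p):
        # features whose further activation still offers entropy gain to the ensemble
        return np.where(icl(p) > 0)[0]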

51 In the context of visual attention, the intuition behind the salient feature set is straightforward: A feature is salient only when succeeding activations of that feature can offer entropy gain to the system. [sent-117, score-0.672]

52 The amount of energy received by each feature is denoted di . [sent-119, score-0.176]

53 For features in the salient feature set, let: di = ICL(pi) / Σj∈S ICL(pj), (if i ∈ S). [sent-121, score-0.197]
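
Continuing the sketch, energy can be redistributed over the salient set as in Eq. 3; this reuses the hypothetical icl helper above and guards the case where no feature is salient.

    import numpy as np

    def energy_distribution(p):
        # d_i = ICL(p_i) / sum_{j in S} ICL(p_j) for salient features, zero otherwise
        gains = icl(p)                           # icl() is the sketch from the previous block
        d = np.zeros_like(gains)
        S = gains > 0
        if S.any():
            d[S] = gains[S] / gains[S].sum()
        return d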

54 Given the sampled patches [x1 , . . . , xn ], we can quantify the saliency map M = [m1 , m2 , . . . , mn ]. [sent-125, score-0.722]
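
The per-patch saliency can then be assembled by weighting feature activity with the redistributed energy; whether the paper uses signed or rectified responses in Eq. 4 is not visible in this extract, so rectified responses are assumed in this sketch.

    import numpy as np

    def saliency_map(W, X, d):
        # m_k: energy-weighted activity of all features at patch x_k (sketch of Eq. 4)
        response = np.abs(W @ X)                 # feature activity at every patch
        return d @ response                      # sum_i d_i * activity_i(x_k) for each patch k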

55 From Eq. 4, we notice that the saliency of a patch is not constant. [sent-130, score-0.74]

56 From Eq. 4, we notice that the saliency of a patch may vary over time and space. [sent-133, score-0.74]

57 3 The Experiment We proposed a framework that explains dynamic visual attention as a process that spends limited available energy preferentially on rarely-seen features. [sent-135, score-0.674]

58 In this section, we examine experimentally the behavior of our attention model. [sent-136, score-0.237]

59 1 Static Saliency Map Generation By sequentially sampling over all possible image patches, we calculate the feature distribution of a static image and generate the corresponding saliency map. [sent-138, score-1.002]
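
Tying the earlier sketches together, a static saliency map could be generated roughly as follows; patch extraction, the learned filter bank W, and the patch grid shape are all assumed inputs, and the helpers come from the previous sketches.

    import numpy as np

    def static_saliency(W, X, patch_grid_shape):
        # activity ratios -> ICL -> energy -> per-patch saliency, reshaped to the patch grid
        p = activity_ratio(W, X)                 # feature distribution of this image
        d = energy_distribution(p)               # energy flows to rarely activated features
        m = saliency_map(W, X, d)                # saliency value of every sampled patch
        return m.reshape(patch_grid_shape)       # grid shape must match the number of patches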

60 These maps are then compared with records of eye fixations of human subjects. [sent-139, score-0.239]

61 Even though it is not designed for static saliency map generation, our model achieves the best performance among mainstream approaches. [sent-149, score-0.856]

62 Table 1: Performance on static image saliency, compared against Itti et al. and other mainstream models. [sent-150, score-0.841]

63 Accordingly, at the tth frame, the cumulative activity ratio distribution pt yields: pt = (1/Z) Σ_{τ=0}^{t−1} exp((τ − t)/λ) · p̂τ, where λ is the half life. [sent-164, score-0.232]
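
A sketch of this temporal update (Eq. 5), assuming per-frame activity ratio vectors are available; the normalization constant Z is realized here by renormalizing the weighted sum.

    import numpy as np

    def cumulative_ratio(p_history, lam):
        # p_history: list of per-frame activity ratio vectors [p_0, ..., p_{t-1}]
        # lam: half-life controlling how quickly older frames are forgotten
        t = len(p_history)
        weights = np.exp((np.arange(t) - t) / lam)            # exp((tau - t) / lambda)
        p_t = sum(w * p for w, p in zip(weights, p_history))
        return p_t / p_t.sum()                                 # the 1/Z normalization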

64 In video saliency analysis, one of the potential challenges comes from simultaneous movements of the targets and self-movements of the observer. [sent-167, score-0.878]

65 Since our model is feature-based, spatial movements of an object or changing perspectives will not dramatically affect the generation of saliency maps. [sent-168, score-0.808]

66 In order to evaluate the detection accuracy of our approach under a changing environment, we compare the dynamic visual attention model with the models proposed in [7] and [5]. [sent-169, score-0.559]

67 The efficacy of the saliency maps for a video clip is determined by comparing the response intensities at saccadic locations and at random locations. [sent-171, score-0.848]

68 Ideally, an effective saliency algorithm would have high output at locations gazed at by observers, and would tend not to respond at most of the randomly chosen locations. [sent-172, score-0.757]

69 To quantify this tendency of selectivity, we first compute the distribution of saliency values at human saccadic locations, qs, and the distribution at random locations, qr. [sent-173, score-0.872]

70 The higher the KL divergence, the more easily a model can discriminate human saccadic locations in the image. [sent-175, score-0.183]
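
This evaluation metric can be sketched as the KL divergence between saliency histograms collected at fixated and at random locations; the number of bins and the exact histogram construction used in the paper are not given in this extract and are assumptions here.

    import numpy as np

    def kl_score(saliency_at_saccades, saliency_at_random, bins=20, eps=1e-12):
        # larger scores mean the model separates human fixations from chance more easily
        lo = min(saliency_at_saccades.min(), saliency_at_random.min())
        hi = max(saliency_at_saccades.max(), saliency_at_random.max())
        qs, _ = np.histogram(saliency_at_saccades, bins=bins, range=(lo, hi))
        qr, _ = np.histogram(saliency_at_random, bins=bins, range=(lo, hi))
        qs = qs / qs.sum() + eps                 # histogram at human saccadic locations
        qr = qr / qr.sum() + eps                 # histogram at random locations
        return np.sum(qs * np.log(qs / qr))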

71 Given the sequence of generated saliency maps, we can obtain the saliency distribution at human saccade locations (narrow blue bars), and random locations (wide green bars). [sent-194, score-1.534]

72 As reported in neurobiological experiments, an inhibitory effect arises after sustained attention [12]. [sent-198, score-0.299]

73 Research on the cumulative effects of attention [24] has suggested that the dynamics of visual search have broad implications for scene perception, perceptual learning, automaticity, and short term memory. [sent-200, score-0.497]

74 Spontaneous shifts of attention to new visual cues, as well as the "refusal of perception" behavior, arise naturally as consequences of our active search model. [sent-203, score-0.532]

75 In order to overcome the physical limitations of the retina, an overt eye movement is made so that the desired visual stimuli can be mapped onto the foveal region. [sent-209, score-0.455]

76 Similar to the computational approximations in [14], we consider the fovea sampling bias as a weighted mask W over the reconstructed saliency map. [sent-210, score-0.753]

77 Let the fovea be located at (x0 , y0 ); the saliency at (x, y) is weighted by W(x, y): W(x, y) = exp(−[(x − x0)² + (y − y0)²]/2) + ξ. [sent-211, score-0.723]
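
A sketch of this foveated weighting mask; the extracted formula carries no explicit width, so a sigma parameter is added here as an assumed free parameter, and xi keeps peripheral saliency from being erased entirely.

    import numpy as np

    def foveal_mask(shape, fovea, sigma=30.0, xi=0.1):
        # shape: (height, width) of the saliency map; fovea: (x0, y0) current fixation
        ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
        x0, y0 = fovea
        dist2 = (xs - x0) ** 2 + (ys - y0) ** 2
        return np.exp(-0.5 * dist2 / sigma ** 2) + xi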

78 2 Overt Eye Movements towards Saliency Targets with Inhibition of Return In the incremental perception of one static image, our dynamic visual system is guided by two factors. [sent-215, score-0.57]

79 The second factor is a foveal structure that allows the system to bias its sampling via overt eye movements. [sent-217, score-0.283]

80 The interplay of these two factors leads to an active visual search behavior that moves towards a maximum entropy equilibrium in the feature distribution. [sent-218, score-0.401]

81 A recently attended visual region is not likely to regain eye fixation within a short interval because of the foveated weighting. [sent-220, score-0.423]

82 An implementation of our dynamic visual search is shown in the algorithm box. [sent-222, score-0.351]

83 Given current eye fixation, generate a saliency map with foveal bias. [sent-225, score-0.901]

84 By a saccade, move the eye to the global maximum of the saliency map. [sent-227, score-0.809]
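
One iteration of the search loop in the algorithm box might look like the following sketch, reusing the hypothetical foveal_mask helper above; updating the feature statistics from the newly fixated region, which drives the ICL part of the loop, is omitted here.

    import numpy as np

    def visual_search_step(saliency, fixation, sigma=30.0, xi=0.1):
        # weight the saliency map by the current foveal mask, then saccade to its global maximum
        weighted = saliency * foveal_mask(saliency.shape, fixation, sigma=sigma, xi=xi)
        row, col = np.unravel_index(np.argmax(weighted), weighted.shape)
        return (col, row), weighted              # new fixation as (x0, y0), plus the biased map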

85 It is also worth noting that, when run on the images provided by [3], our dynamic visual attention algorithm demonstrates especially pronounced saccades when multiple salient regions are presented in the same image. [sent-233, score-0.791]

86 Figure 5: Results on dynamic visual search. 4 Discussions A novel dynamic model of visual attention is described in this paper. [sent-235, score-0.91]

87 We have proposed Incremental Coding Length as a general principle by which to distribute energy in the attention system. [sent-236, score-0.391]

88 In this principle, the salient visual cues correspond to unexpected features - according to the definition of ICL, these features may elicit entropy gain in the perception state and are therefore assigned high energy. [sent-237, score-0.594]

89 To validate this theoretical framework, we have examined experimentally various aspects of visual attention. [sent-238, score-0.231]

90 In experiments on static saliency maps, our model predicted saccades more accurately than other mainstream models did. [sent-239, score-0.931]

91 Because the model updates its state in an online manner, we can consider the statistics of a temporal sequence and our model achieved strong results in video saliency generation. [sent-240, score-0.735]

92 Finally, when feature-based ICL is combined with foveated sampling, our model provides a coherent mechanism for dynamic visual search with inhibition of return. [sent-241, score-0.498]

93 1) In addition to spatial continuity cues, which are demonstrated in other literature, saliency can also be measured using features. [sent-243, score-0.727]

94 2) By incorporating temporal dynamics, a visual attention system can capture a broad range of novel behaviors that have not successfully been explained by saliency map analysis. [sent-244, score-1.267]

95 And 3) dynamic attention behaviors might be quantitatively explained and simulated by the pursuit of a maximum entropy equilibrium in the state of perception. [sent-245, score-0.456]

96 A model of saliency-based visual attention for rapid scene analysis. [sent-287, score-0.468]

97 Shifts in selective visual attention: towards the underlying neural circuitry. [sent-301, score-0.259]

98 Predictive coding in the visual cortex: a functional interpretation of some extraclassical receptive-field effects. [sent-335, score-0.38]

99 Attentional modulation of visual motion processing in cortical areas MT and MST. [sent-346, score-0.231]

100 Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. [sent-369, score-0.468]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('saliency', 0.686), ('icl', 0.278), ('attention', 0.237), ('visual', 0.231), ('pi', 0.167), ('coding', 0.149), ('eye', 0.123), ('energy', 0.115), ('ior', 0.111), ('saccades', 0.111), ('pj', 0.097), ('dynamic', 0.091), ('salient', 0.089), ('static', 0.085), ('itti', 0.081), ('xation', 0.079), ('inhibition', 0.076), ('image', 0.07), ('pt', 0.069), ('incremental', 0.069), ('activity', 0.067), ('bruce', 0.065), ('perception', 0.065), ('koch', 0.063), ('feature', 0.061), ('entropy', 0.057), ('foveal', 0.056), ('ons', 0.056), ('saccadic', 0.056), ('attentional', 0.056), ('patch', 0.054), ('human', 0.05), ('video', 0.049), ('mainstream', 0.049), ('behaviors', 0.048), ('features', 0.047), ('overt', 0.045), ('generation', 0.042), ('spatial', 0.041), ('locations', 0.04), ('principle', 0.039), ('movements', 0.039), ('xa', 0.038), ('appropriation', 0.037), ('convincing', 0.037), ('divergency', 0.037), ('fovea', 0.037), ('foveated', 0.037), ('walther', 0.037), ('xhou', 0.037), ('kl', 0.037), ('cortex', 0.037), ('length', 0.036), ('inhibitory', 0.036), ('map', 0.036), ('maps', 0.035), ('targets', 0.035), ('shifts', 0.035), ('cues', 0.035), ('mechanism', 0.034), ('activated', 0.033), ('china', 0.033), ('basis', 0.033), ('nature', 0.033), ('attended', 0.032), ('redistribute', 0.032), ('saccade', 0.032), ('sparse', 0.032), ('images', 0.032), ('response', 0.031), ('records', 0.031), ('lter', 0.03), ('sampling', 0.03), ('retina', 0.03), ('consumption', 0.03), ('rarity', 0.03), ('vectorized', 0.03), ('search', 0.029), ('system', 0.029), ('selective', 0.028), ('increments', 0.028), ('swing', 0.028), ('patches', 0.028), ('xk', 0.027), ('ratio', 0.027), ('sustained', 0.026), ('gao', 0.026), ('increment', 0.025), ('ith', 0.025), ('sensory', 0.024), ('predictive', 0.024), ('vision', 0.024), ('equilibrium', 0.023), ('selectivity', 0.023), ('neuroscience', 0.023), ('gain', 0.023), ('immediate', 0.022), ('mt', 0.021), ('representation', 0.021), ('return', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

2 0.1095089 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

3 0.10500816 62 nips-2008-Differentiable Sparse Coding

Author: J. A. Bagnell, David M. Bradley

Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1

4 0.079382278 124 nips-2008-Load and Attentional Bayes

Author: Peter Dayan

Abstract: Selective attention is a most intensively studied psychological phenomenon, rife with theoretical suggestions and schisms. A critical idea is that of limited capacity, the allocation of which has produced continual conflict about such phenomena as early and late selection. An influential resolution of this debate is based on the notion of perceptual load (Lavie, 2005), which suggests that low-load, easy tasks, because they underuse the total capacity of attention, mandatorily lead to the processing of stimuli that are irrelevant to the current attentional set; whereas high-load, difficult tasks grab all resources for themselves, leaving distractors high and dry. We argue that this theory presents a challenge to Bayesian theories of attention, and suggest an alternative, statistical, account of key supporting data. 1

5 0.069692306 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation

Author: Adrian Haith, Carl P. Jackson, R. C. Miall, Sethu Vijayakumar

Abstract: Adaptation of visually guided reaching movements in novel visuomotor environments (e.g. wearing prism goggles) comprises not only motor adaptation but also substantial sensory adaptation, corresponding to shifts in the perceived spatial location of visual and proprioceptive cues. Previous computational models of the sensory component of visuomotor adaptation have assumed that it is driven purely by the discrepancy introduced between visual and proprioceptive estimates of hand position and is independent of any motor component of adaptation. We instead propose a unified model in which sensory and motor adaptation are jointly driven by optimal Bayesian estimation of the sensory and motor contributions to perceived errors. Our model is able to account for patterns of performance errors during visuomotor adaptation as well as the subsequent perceptual aftereffects. This unified model also makes the surprising prediction that force field adaptation will elicit similar perceptual shifts, even though there is never any discrepancy between visual and proprioceptive observations. We confirm this prediction with an experiment. 1

6 0.065306239 38 nips-2008-Bio-inspired Real Time Sensory Map Realignment in a Robotic Barn Owl

7 0.061928775 119 nips-2008-Learning a discriminative hidden part model for human action recognition

8 0.060224202 109 nips-2008-Interpreting the neural code with Formal Concept Analysis

9 0.05873942 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

10 0.058474742 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

11 0.05589522 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

12 0.054867037 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words

13 0.05478251 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

14 0.0525213 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction

15 0.05199505 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

16 0.049388219 23 nips-2008-An ideal observer model of infant object perception

17 0.049245127 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition

18 0.048884016 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

19 0.048432074 231 nips-2008-Temporal Dynamics of Cognitive Control

20 0.047471814 206 nips-2008-Sequential effects: Superstition or rational behavior?


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.151), (1, -0.008), (2, 0.113), (3, -0.023), (4, 0.002), (5, 0.013), (6, -0.071), (7, -0.04), (8, 0.082), (9, 0.062), (10, -0.015), (11, 0.02), (12, -0.038), (13, 0.062), (14, -0.017), (15, -0.033), (16, 0.001), (17, -0.007), (18, -0.038), (19, -0.035), (20, -0.011), (21, -0.0), (22, -0.055), (23, -0.037), (24, -0.049), (25, -0.059), (26, 0.065), (27, -0.034), (28, -0.06), (29, -0.066), (30, -0.028), (31, 0.042), (32, 0.002), (33, 0.037), (34, 0.013), (35, -0.023), (36, -0.049), (37, 0.025), (38, 0.061), (39, -0.057), (40, 0.067), (41, -0.049), (42, -0.01), (43, 0.004), (44, -0.013), (45, -0.044), (46, -0.146), (47, 0.175), (48, 0.166), (49, -0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93998712 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

2 0.578996 38 nips-2008-Bio-inspired Real Time Sensory Map Realignment in a Robotic Barn Owl

Author: Juan Huo, Zhijun Yang, Alan F. Murray

Abstract: The visual and auditory map alignment in the Superior Colliculus (SC) of barn owl is important for its accurate localization for prey behavior. Prism learning or Blindness may interfere this alignment and cause loss of the capability of accurate prey. However, juvenile barn owl could recover its sensory map alignment by shifting its auditory map. The adaptation of this map alignment is believed based on activity dependent axon developing in Inferior Colliculus (IC). A model is built to explore this mechanism. In this model, axon growing process is instructed by an inhibitory network in SC while the strength of the inhibition adjusted by Spike Timing Dependent Plasticity (STDP). We test and analyze this mechanism by application of the neural structures involved in spatial localization in a robotic system. 1

3 0.5718807 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

4 0.56766605 124 nips-2008-Load and Attentional Bayes

Author: Peter Dayan

Abstract: Selective attention is a most intensively studied psychological phenomenon, rife with theoretical suggestions and schisms. A critical idea is that of limited capacity, the allocation of which has produced continual conflict about such phenomena as early and late selection. An influential resolution of this debate is based on the notion of perceptual load (Lavie, 2005), which suggests that low-load, easy tasks, because they underuse the total capacity of attention, mandatorily lead to the processing of stimuli that are irrelevant to the current attentional set; whereas high-load, difficult tasks grab all resources for themselves, leaving distractors high and dry. We argue that this theory presents a challenge to Bayesian theories of attention, and suggest an alternative, statistical, account of key supporting data. 1

5 0.47432709 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues

Author: Rama Natarajan, Iain Murray, Ladan Shams, Richard S. Zemel

Abstract: We explore a recently proposed mixture model approach to understanding interactions between conflicting sensory cues. Alternative model formulations, differing in their sensory noise models and inference methods, are compared based on their fit to experimental data. Heavy-tailed sensory likelihoods yield a better description of the subjects’ response behavior than standard Gaussian noise models. We study the underlying cause for this result, and then present several testable predictions of these models. 1

6 0.47197539 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words

7 0.46722883 148 nips-2008-Natural Image Denoising with Convolutional Networks

8 0.42503908 109 nips-2008-Interpreting the neural code with Formal Concept Analysis

9 0.40456936 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

10 0.40260184 23 nips-2008-An ideal observer model of infant object perception

11 0.396189 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

12 0.39417791 62 nips-2008-Differentiable Sparse Coding

13 0.39000422 119 nips-2008-Learning a discriminative hidden part model for human action recognition

14 0.38463321 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns

15 0.37438536 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation

16 0.3717545 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

17 0.35946694 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

18 0.35797706 7 nips-2008-A computational model of hippocampal function in trace conditioning

19 0.35368195 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

20 0.34663725 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.019), (5, 0.232), (6, 0.051), (7, 0.094), (12, 0.057), (15, 0.018), (28, 0.146), (57, 0.105), (59, 0.029), (63, 0.023), (71, 0.024), (77, 0.032), (78, 0.025), (83, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82622212 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

2 0.80811483 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method

Author: Dongryeol Lee, Alexander G. Gray

Abstract: We propose a new fast Gaussian summation algorithm for high-dimensional datasets with high accuracy. First, we extend the original fast multipole-type methods to use approximation schemes with both hard and probabilistic error. Second, we utilize a new data structure called subspace tree which maps each data point in the node to its lower dimensional mapping as determined by any linear dimension reduction method such as PCA. This new data structure is suitable for reducing the cost of each pairwise distance computation, the most dominant cost in many kernel methods. Our algorithm guarantees probabilistic relative error on each kernel sum, and can be applied to high-dimensional Gaussian summations which are ubiquitous inside many kernel methods as the key computational bottleneck. We provide empirical speedup results on low to high-dimensional datasets up to 89 dimensions. 1

3 0.71861905 112 nips-2008-Kernel Measures of Independence for non-iid Data

Author: Xinhua Zhang, Le Song, Arthur Gretton, Alex J. Smola

Abstract: Many machine learning algorithms can be formulated in terms of measures of statistical independence such as the Hilbert-Schmidt Independence Criterion. In this paper, we extend this criterion to deal with structured and interdependent observations. This is achieved by modeling the structures using undirected graphical models and comparing the Hilbert space embeddings of distributions. We apply this new criterion to independent component analysis and sequence clustering. 1
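For readers unfamiliar with the criterion mentioned above, the following Python sketch computes the standard biased empirical HSIC for i.i.d. samples, trace(KHLH)/(n-1)^2 with centering matrix H. It does not implement the paper's extension to structured, interdependent observations; the Gaussian kernel and its bandwidth are illustrative choices.

import numpy as np

def gaussian_kernel(X, bandwidth=1.0):
    # Pairwise Gaussian kernel matrix for the rows of X.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * bandwidth**2))

def hsic_biased(X, Y, bandwidth=1.0):
    # Biased empirical HSIC for i.i.d. samples: trace(K H L H) / (n - 1)^2.
    n = X.shape[0]
    K = gaussian_kernel(X, bandwidth)
    L = gaussian_kernel(Y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    Y_dep = X[:, :1] + 0.1 * rng.normal(size=(200, 1))   # dependent on X
    Y_ind = rng.normal(size=(200, 1))                     # independent of X
    print(hsic_biased(X, Y_dep), hsic_biased(X, Y_ind))   # dependent pair scores higher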

4 0.65931815 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

Author: Mehmet K. Muezzinoglu, Alexander Vergara, Ramon Huerta, Thomas Nowotny, Nikolai Rulkov, Henry Abarbanel, Allen Selverston, Mikhail Rabinovich

Abstract: The odor transduction process has a large time constant and is susceptible to various types of noise. Therefore, the olfactory code at the sensor/receptor level is in general a slow and highly variable indicator of the input odor in both natural and artificial situations. Insects overcome this problem by using a neuronal device in their Antennal Lobe (AL), which transforms the identity code of olfactory receptors to a spatio-temporal code. This transformation improves the decision of the Mushroom Bodies (MBs), the subsequent classifier, in both speed and accuracy. Here we propose a rate model based on two intrinsic mechanisms in the insect AL, namely integration and inhibition. We then present an MB classifier model that resembles the sparse and random structure of the insect MB. A local Hebbian learning procedure governs the plasticity in the model. These formulations not only help to understand the signal conditioning and classification methods of insect olfactory systems, but can also be leveraged in synthetic problems. Among them, we consider here the discrimination of odor mixtures from pure odors. We show on a set of records from metal-oxide gas sensors that the cascade of these two new models facilitates fast and accurate discrimination of even highly imbalanced mixtures from pure odors. 1
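The second stage described above (a sparse, randomly connected MB-like layer with a local Hebbian readout) can be caricatured with the toy Python sketch below. Everything here is an illustrative assumption rather than the authors' model: the layer sizes, connection density, top-k thresholding, and the specific Hebbian update are all hypothetical, and the AL-like temporal transformation is omitted entirely.

import numpy as np

rng = np.random.default_rng(1)

N_IN, N_MB, N_CLASSES = 16, 2000, 2    # hypothetical layer sizes
SPARSITY = 0.05                        # fraction of MB-like units active per input

# Fixed, sparse random connectivity from the input layer to the MB-like layer.
W_in = (rng.random((N_MB, N_IN)) < 0.1) * rng.normal(size=(N_MB, N_IN))

def mb_code(x):
    # Sparse code: keep only the most strongly driven fraction of units.
    h = W_in @ x
    k = max(1, int(SPARSITY * N_MB))
    code = np.zeros(N_MB)
    code[np.argsort(h)[-k:]] = 1.0
    return code

# Readout weights trained with a purely local Hebbian rule: a synapse is
# strengthened only when its MB unit is active for an input of that class.
W_out = np.zeros((N_CLASSES, N_MB))

def hebbian_train(X, labels, lr=0.01):
    for x, y in zip(X, labels):
        W_out[y] += lr * mb_code(x)

def predict(x):
    return int(np.argmax(W_out @ mb_code(x)))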

5 0.65871626 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

6 0.65811872 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

7 0.65640616 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

8 0.65580809 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

9 0.65537941 200 nips-2008-Robust Kernel Principal Component Analysis

10 0.65478015 62 nips-2008-Differentiable Sparse Coding

11 0.6532107 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

12 0.6531533 4 nips-2008-A Scalable Hierarchical Distributed Language Model

13 0.65282851 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

14 0.65059578 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

15 0.65011513 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

16 0.65009403 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC

17 0.64715827 138 nips-2008-Modeling human function learning with Gaussian processes

18 0.6471293 219 nips-2008-Spectral Hashing

19 0.64598358 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction

20 0.64594924 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks