nips nips2005 nips2005-130 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lei Zhang, Dimitris Samaras, Nelly Alia-klein, Nora Volkow, Rita Goldstein
Abstract: Functional Magnetic Resonance Imaging (fMRI) has enabled scientists to look into the active brain. However, interactivity between functional brain regions, is still little studied. In this paper, we contribute a novel framework for modeling the interactions between multiple active brain regions, using Dynamic Bayesian Networks (DBNs) as generative models for brain activation patterns. This framework is applied to modeling of neuronal circuits associated with reward. The novelty of our framework from a Machine Learning perspective lies in the use of DBNs to reveal the brain connectivity and interactivity. Such interactivity models which are derived from fMRI data are then validated through a group classification task. We employ and compare four different types of DBNs: Parallel Hidden Markov Models, Coupled Hidden Markov Models, Fully-linked Hidden Markov Models and Dynamically MultiLinked HMMs (DML-HMM). Moreover, we propose and compare two schemes of learning DML-HMMs. Experimental results show that by using DBNs, group classification can be performed even if the DBNs are constructed from as few as 5 brain regions. We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug addicted subjects vs. control subjects. This finding provides an independent test for the effect of psychopathology on brain function. In general, we demonstrate that incorporation of computer science principles into functional neuroimaging clinical studies provides a novel approach for probing human brain function.
Reference: text
sentIndex sentText sentNum sentScore
1 However, interactivity between functional brain regions, is still little studied. [sent-2, score-0.607]
2 In this paper, we contribute a novel framework for modeling the interactions between multiple active brain regions, using Dynamic Bayesian Networks (DBNs) as generative models for brain activation patterns. [sent-3, score-0.675]
3 This framework is applied to modeling of neuronal circuits associated with reward. [sent-4, score-0.182]
4 The novelty of our framework from a Machine Learning perspective lies in the use of DBNs to reveal the brain connectivity and interactivity. [sent-5, score-0.374]
5 Such interactivity models which are derived from fMRI data are then validated through a group classification task. [sent-6, score-0.417]
6 Experimental results show that by using DBNs, group classification can be performed even if the DBNs are constructed from as few as 5 brain regions. [sent-9, score-0.458]
7 We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug addicted subjects vs. [sent-10, score-0.509]
8 This finding provides an independent test for the effect of psychopathology on brain function. [sent-12, score-0.368]
9 In general, we demonstrate that incorporation of computer science principles into functional neuroimaging clinical studies provides a novel approach for probing human brain function. [sent-13, score-0.522]
10 Introduction Functional Magnetic Resonance Imaging (fMRI) has enabled scientists to look into the active human brain [1] by providing sequences of 3D brain images with intensities representing blood oxygenation level dependent (BOLD) regional activations. [sent-15, score-0.67]
11 This has revealed exciting insights into the spatial and temporal changes underlying a broad range of brain functions, such as how we see, feel, move, understand each other and lay down memories. [sent-16, score-0.295]
12 This fMRI technology offers further promise by imaging the dynamic aspects of the functioning human brain. [sent-17, score-0.209]
13 Indeed, fMRI has encouraged a growing interest in revealing brain connectivity and interactivity within the neuroscience community. [sent-18, score-0.628]
14 It is for example understood that a dynamically managed goal directed behavior requires neural control mechanisms orchestrated to select the appropriate and task-relevant responses while inhibiting irrelevant or inappropriate processes [12]. [sent-19, score-0.124]
15 To date, the analyses and interpretation of fMRI data that are most commonly employed by neuroscientists depend on the cognitive-behavioral probes that are developed to tap regional brain function. [sent-20, score-0.295]
16 Thus, brain responses are a-priori labeled based on the putative underlying task condition and are then used to separate a priori defined groups of subjects. [sent-21, score-0.295]
17 However, in these approaches information on the connectivity and interactivity between brain voxels is discarded and brain voxels are assumed to be independent, which is an inaccurate assumption (see use of statistical maps [3][19] or the mean of each fMRI time interval[13]). [sent-23, score-1.057]
18 In this paper, we exploit Dynamic Bayesian Networks for modeling dynamic (i. [sent-24, score-0.167]
19 We suggest that through incorporation of graphical models into functional neuroimaging studies we will be able to identify neuronal patterns of connectivity and interactivity that will provide invaluable insights into basic emotional and cognitive neuroscience constructs. [sent-27, score-0.617]
20 We further propose that this interscientific incorporation may provide a valid tool where objective brain imaging data are used for the clinical purpose of diagnosis of psychopathology. [sent-28, score-0.474]
21 Specifically, in our case study we will model neuronal circuits associated with reward processing in drug addiction. [sent-29, score-0.383]
22 We have previously shown loss of sensitivity to the relative value of money in cocaine users [9]. [sent-30, score-0.306]
23 It has also been previously highlighted that the complex mechanism of drug addiction requires the connectivity and interactivity between regions comprising the mesocorticolimbic circuit [12][8]. [sent-31, score-0.783]
24 However, although advancements have been made in studying this circuit’s role in inhibitory control and reward processing, inference about the connectivity and interactivity of these regions is at best indirect. [sent-32, score-0.563]
25 Compared with dynamic causal models, DBNs admit a class of nonlinear continuous-time interactions among the hidden states and model both causal relationships between brain regions and temporal correlations among multiple processes, useful for both classification and prediction purposes. [sent-34, score-0.88]
26 Probabilistic graphical models [14][11] are graphs in which nodes represent random variables, and the (lack of) arcs represent conditional independence assumptions. [sent-35, score-0.275]
27 In our case, interconnected brain regions can be considered as nodes of a probabilistic graphical model and interactivity relationships between regions are modeled by probability values on the arcs (or the lack of) between these nodes. [sent-36, score-1.022]
28 However, the major challenge in such a machine learning approach is the choice of a particular structure that models connectivity and interactivity between brain regions in an accurate and efficient manner. [sent-37, score-0.814]
29 More specifically, instead of modeling each brain region in isolation, we aim to model the interactive pattern of multiple brain regions. [sent-39, score-0.713]
30 Furthermore, the revealed functional information is validated through a group classification case study: separating drug addicted subjects from healthy non-drug-using controls based on trained Dynamic Bayesian Networks. [sent-40, score-0.613]
31 Both conventional BBNs and HMMs are unsuitable for modeling activities underpinned not only by causal but also by clear temporal correlations among multiple processes [10], and Dynamic Bayesian Networks [5][7] are required. [sent-41, score-0.202]
32 Since the state of each brain region is not known (only observations of activation exist), it can be thought of as a hidden variable[15]. [sent-42, score-0.485]
33 [17] proposed Parallel Hidden Markov Models (PaHMMs) that factorize state space into multiple independent temporal processes without causal connections in-between. [sent-45, score-0.116]
34 [10] developed a Dynamically Multi-Linked Hidden Markov Model (DMLHMM) for the recognition of group activities involving multiple different object events in a noisy outdoor scene. [sent-49, score-0.193]
35 This model is the only one of those models that learns both the structure and parameters of the graphical model, instead of presuming a structure (possibly inaccurate) given the lack of knowledge of human brain connectivity. [sent-50, score-0.477]
36 In order to model the dynamic neuronal circuits underlying reward processing in the human brains, we explore and compare the above DBNs. [sent-51, score-0.378]
37 We propose and compare two learning schemes of DML-HMMs, one is greedy structure search (Hill-Climbing) and the other is Structural Expectation-Maximization (SEM). [sent-52, score-0.142]
38 To our knowledge, this is the first time that Dynamic Bayesian Networks are exploited in modeling the connectivity and interactivity among brain regions activated during a fMRI study. [sent-53, score-0.844]
39 Our current experimental classification results show that by using DBNs, group classification can be performed even if the DBNs are constructed from as few as 5 brain regions. [sent-54, score-0.458]
40 We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug addicted subjects vs. [sent-55, score-0.509]
41 control subjects which provides an independent test for the effects of psychopathology on brain function. [sent-56, score-0.476]
42 From the machine learning point of view, this paper provides an innovative application of Dynamic Bayesian Networks in modeling dynamic neuronal circuits. [sent-57, score-0.277]
43 Furthermore, since the structures to be explored are exclusively represented by hidden (cannot be observed directly) states and their interconnecting arcs, the structure learning of DML-HMMs poses a greater challenge than other DBNs [5]. [sent-58, score-0.441]
44 From the neuroscientific point of view, drug addiction is a complex disorder characterized by compromised inhibitory control and reward processing. [sent-59, score-0.44]
45 However, individuals with compromised mechanisms of control and reward are difficult to identify unless they are directly subjected to challenging conditions. [sent-60, score-0.167]
46 Modeling the interactive brain patterns is therefore essential since such patterns may be unique to a certain psychopathology and could hence be used for improving diagnosis and prevention efforts (e. [sent-61, score-0.513]
47 , diagnosis of drug addiction, prevention of relapse or craving). [sent-63, score-0.242]
48 In addition, the development of this framework can be applied to further our understanding of other human disorders and states such as those impacting insight and awareness, that similarly to drug addiction are currently identified based mostly on subjective criteria and self-report. [sent-64, score-0.365]
49 More specifically, the ith hidden state variable and the jth observation variable at time instance t are denoted as (i) (j) St and Ot with i ∈ {1, . [sent-70, score-0.195]
50 , No }, Nh and No are the number of hidden state variables and observation variables respectively. [sent-76, score-0.195]
51 The E step, which involves the inference of hidden states given parameters, can be implemented using an exact inference algorithm such as the junction tree algorithm. [sent-88, score-0.205]
52 In [10], the DML-HMM was selected from a set of candidate structures, however the selection of candidate structure is non-trivial for most applications including brain region connectivity. [sent-90, score-0.347]
53 For a DML-HMM with N hidden nodes, the total number of different struc2 tures is 2N −N , thus it is impossible to conduct an exhaustive search in most cases. [sent-91, score-0.161]
54 The structure learning of DML-HMMs is more challenging since the structures to be explored are exclusively represented by the hidden states and none of them can be directly observed. [sent-93, score-0.441]
55 Then we perform parameter learning for each of the neighbors and go to the neighbor with the minimum score until there is no neighbor whose score is higher than the current DBN. [sent-98, score-0.181]
56 However, as we described above, the structure of DML-HMMs are represented by the hidden states which can not be observed directly. [sent-101, score-0.257]
57 The MPE thus provides a complete estimation of the hidden states and a complete-data structural search [4] is then performed to find the best structure. [sent-103, score-0.281]
58 In this scheme, the structural search is performed in the inner loop thus making the learning more efficient. [sent-105, score-0.162]
59 To guarantee halting of the algorithm, a loop detector can be added so that, once any structure is selected in a second time, we stop the learning and select the structure with the minimum score visited during the searching. [sent-110, score-0.226]
60 Modeling Reward Neuronal Circuits: A Case Study In this section, we will describe our case study of modeling Reward Neuronal Circuits: by using DBNs, we aim to model the interactive pattern of multiple brain regions for the neuropsychological problem of sensitivity to the relative value of money. [sent-126, score-0.59]
61 Furthermore, we will examine the revealed functional information encapsulated in the trained DBNs through a group classification study: separating drug addicted subjects from healthy non-drug-using controls based on trained DBNs. [sent-127, score-0.644]
62 Data Collection and Preprocess In our experiments, data were collected to study the neuropsychological problem of loss of sensitivity to the relative value of money in cocaine users[9]. [sent-130, score-0.342]
63 They received a monetary reward if they performed correctly. [sent-133, score-0.22]
64 Specifically, three runs were repeated twice (T1, T2, T3; and T1R, T2R, T3R) and in each run, there were three monetary conditions (high money, low money, no money) and a baseline condition; the order of monetary conditions was pseudo-randomized and identical for all participants. [sent-134, score-0.254]
65 Participants were informed about the monetary condition by a 3-sec instruction slide, presenting the stimuli: $0. [sent-135, score-0.127]
66 Feedback for correct responses in each condition consisted of the respective numeral designating the amount of money the subject has earned if correct or the symbol (X) otherwise. [sent-139, score-0.142]
67 16 cocaine dependent individuals, 18-55 years of age, in good health, were matched with 12 non-drug-using controls on sex, race, education and general intellectual functioning. [sent-141, score-0.127]
68 Prior to training the DBN, we selected 5 brain regions: Left Inferior Frontal Gyrus (Left IFG), Prefrontal Cortex (PFC, including lateral and medial dorsolateral PFC and the anterior cingulate), Midbrain (including substantia nigra), Thalamus and Cerebellum. [sent-148, score-0.365]
69 These regions were selected based on prior SPM analyses random-effects analyses (ANOVA) where the goal was to differentiate effect of money (high, low, no) from the effect of group (cocaine, Figure 2: Learning processes and learned structures from two algorithms. [sent-149, score-0.606]
70 The leftmost column demonstrates two (superimposed) learned structures where light gray dashed arcs (long dash) are learned from DML-HMM-HC, dark gray dashed arcs (short dash) from DML-HMM-SEM and black solid arcs from both. [sent-150, score-0.882]
71 The right columns shows the transient structures of the learning processes of two algorithms where black represents existence of arc and white represents no arc. [sent-151, score-0.179]
72 control) on all regions that were activated to monetary reward in all subjects. [sent-152, score-0.351]
73 In all these five regions, the monetary main effect was significant as evidenced by region of interest follow-up analyses. [sent-153, score-0.127]
74 Of note is the fact that these five regions are part of the mesocorticolimbic reward circuit, previously implicated in addiction. [sent-154, score-0.228]
75 Each of the above brain regions is presented by a k-D feature vector where k is the number of brain voxels selected in this brain region (i. [sent-155, score-1.051]
76 After feature selection, a DML-HMM with 5 hidden nodes can be learned as described in Sec. [sent-158, score-0.259]
77 The causal relationships discovered among different brain regions are embodied in the topology of the DML-HMM. [sent-162, score-0.479]
78 Each of the five hidden variables has two states (activated or not) and each continuous observation variable (given by a k-D feature vector) represents the observed activation of each brain region. [sent-163, score-0.563]
79 Figure 3: Left three images shows the structures learned from the 3 subsets of Group C and the right three images shows those learned from subsets of Group S. [sent-165, score-0.293]
80 Experiments and Results We collected fMRI data of 16 drug addicted subjects and 12 control subjects, 6 runs per participant. [sent-168, score-0.399]
81 In our experiments, we used a total of 152 fMRI sequences (87 scans per sequence) with 86 sequences for the drug addicted subjects (Group S) and 66 for control subjects (Group C). [sent-170, score-0.533]
82 2 demonstrates the learning process (initialized with the FHMM) for drug addicted subjects. [sent-174, score-0.326]
83 The leftmost column shows two learned structures where red arcs are learned from DMLHMM-HC, green arcs from DML-HMM-SEM and black arcs from both. [sent-175, score-0.882]
84 Since in DML-HMMSEM, structure learning is in the inner loop, the learning process is much faster than that of DML-HMM-HC. [sent-177, score-0.122]
85 We also compared the BIC scores of the learned structures and we found DML-HMM-SEM selected better structures than DML-HMM-HC. [sent-178, score-0.284]
86 It is also very interesting to examine the structure learning processes by using different training data. [sent-179, score-0.156]
87 3 where the left three images show the structures learned from the 3 subsets of Group C and the right three images show those learned from subsets of Group S. [sent-181, score-0.293]
88 3, we found the learned structures of each group are similar. [sent-183, score-0.334]
89 The second set of experiments was to apply the trained DBNs for group classification. [sent-190, score-0.194]
90 During training, the described four DBN type were employed using the training set while during the learning of DML-HMMs, different initial structures (PaHMM, CHMM, FHMM) were used and the structure with the minimum BIC score was selected from the three learned DML-HMMs. [sent-193, score-0.332]
91 Furthermore, we proposed and compared two structural learning schemes of DML-HMMs and applied the DBNs to a group classification problem. [sent-207, score-0.329]
92 To our knowledge, this is the first time that Dynamic Bayesian Networks are exploited in modeling the connectivity and interactivity among brain voxels from fMRI data. [sent-208, score-0.78]
93 This framework of exploring functional information of fMRI data provides a novel approach of revealing brain connectivity and interactivity and provides an independent test for the effect of psychopathology on brain function. [sent-209, score-1.054]
94 Currently, DBNs use independently pre-selected brain regions, thus some other important interactivity information may have been discarded in the feature selection step. [sent-210, score-0.549]
95 Our future work will focus on developing a dynamic neuronal circuit modeling framework performing feature selection and DBN learning simultaneously. [sent-211, score-0.319]
96 Due to computational limits and for clarity purposes, we explored only 5 brain regions and thus another direction of future work is to develop a hierarchical DBN topology to comprehensively model all implicated brain regions efficiently. [sent-212, score-0.824]
97 Brain activity underlying emotional valence and arousal: A response-related fmri study. [sent-219, score-0.385]
98 A modified role for the orbitofrontal cortex in attribution of salience to monetary reward in cocaine addiction: an fmri study at 4t. [sent-270, score-0.696]
99 Training fmri classifiers to detect cognitive states across multiple human subjects. [sent-333, score-0.441]
100 Machine learning for clinical diagnosis from functional magnetic resonance imaging. [sent-341, score-0.255]
wordName wordTfidf (topN-words)
[('dbns', 0.388), ('fmri', 0.349), ('brain', 0.295), ('interactivity', 0.254), ('arcs', 0.205), ('dbn', 0.173), ('drug', 0.164), ('group', 0.163), ('hidden', 0.161), ('bi', 0.147), ('money', 0.142), ('addicted', 0.127), ('cocaine', 0.127), ('monetary', 0.127), ('structures', 0.113), ('dynamic', 0.111), ('addiction', 0.109), ('regions', 0.099), ('reward', 0.093), ('causal', 0.085), ('connectivity', 0.079), ('fhmm', 0.079), ('structural', 0.076), ('neuronal', 0.075), ('pahmm', 0.073), ('psychopathology', 0.073), ('subjects', 0.07), ('interactive', 0.067), ('voxels', 0.067), ('bayesian', 0.067), ('chmm', 0.063), ('bic', 0.063), ('learned', 0.058), ('functional', 0.058), ('modeling', 0.056), ('markov', 0.056), ('schemes', 0.055), ('dynamically', 0.055), ('arameter', 0.054), ('bs', 0.054), ('otest', 0.054), ('pfc', 0.054), ('structure', 0.052), ('loop', 0.051), ('circuits', 0.051), ('ot', 0.051), ('imaging', 0.05), ('networks', 0.049), ('human', 0.048), ('incorporation', 0.047), ('classi', 0.047), ('diagnosis', 0.046), ('states', 0.044), ('goldstein', 0.043), ('circuit', 0.042), ('coupled', 0.041), ('resonance', 0.04), ('magnetic', 0.04), ('nodes', 0.04), ('neuroimaging', 0.038), ('leftmost', 0.038), ('control', 0.038), ('marked', 0.038), ('training', 0.038), ('neighbor', 0.037), ('sensitivity', 0.037), ('compromised', 0.036), ('emotional', 0.036), ('ifg', 0.036), ('mesocorticolimbic', 0.036), ('neuropsychological', 0.036), ('odel', 0.036), ('ps', 0.036), ('spm', 0.036), ('vogler', 0.036), ('volkow', 0.036), ('clinical', 0.036), ('score', 0.036), ('explored', 0.036), ('pc', 0.036), ('learning', 0.035), ('bc', 0.035), ('observation', 0.034), ('st', 0.033), ('likelihoods', 0.032), ('sequences', 0.032), ('subsets', 0.032), ('activated', 0.032), ('cingulate', 0.032), ('prevention', 0.032), ('dorsolateral', 0.032), ('trained', 0.031), ('cation', 0.031), ('processes', 0.031), ('activities', 0.03), ('graphical', 0.03), ('activation', 0.029), ('exploited', 0.029), ('hutchinson', 0.029), ('samaras', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks
Author: Lei Zhang, Dimitris Samaras, Nelly Alia-klein, Nora Volkow, Rita Goldstein
Abstract: Functional Magnetic Resonance Imaging (fMRI) has enabled scientists to look into the active brain. However, interactivity between functional brain regions, is still little studied. In this paper, we contribute a novel framework for modeling the interactions between multiple active brain regions, using Dynamic Bayesian Networks (DBNs) as generative models for brain activation patterns. This framework is applied to modeling of neuronal circuits associated with reward. The novelty of our framework from a Machine Learning perspective lies in the use of DBNs to reveal the brain connectivity and interactivity. Such interactivity models which are derived from fMRI data are then validated through a group classification task. We employ and compare four different types of DBNs: Parallel Hidden Markov Models, Coupled Hidden Markov Models, Fully-linked Hidden Markov Models and Dynamically MultiLinked HMMs (DML-HMM). Moreover, we propose and compare two schemes of learning DML-HMMs. Experimental results show that by using DBNs, group classification can be performed even if the DBNs are constructed from as few as 5 brain regions. We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug addicted subjects vs. control subjects. This finding provides an independent test for the effect of psychopathology on brain function. In general, we demonstrate that incorporation of computer science principles into functional neuroimaging clinical studies provides a novel approach for probing human brain function.
2 0.16013867 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
Author: Rory Sayres, David Ress, Kalanit Grill-spector
Abstract: The category of visual stimuli has been reliably decoded from patterns of neural activity in extrastriate visual cortex [1]. It has yet to be seen whether object identity can be inferred from this activity. We present fMRI data measuring responses in human extrastriate cortex to a set of 12 distinct object images. We use a simple winner-take-all classifier, using half the data from each recording session as a training set, to evaluate encoding of object identity across fMRI voxels. Since this approach is sensitive to the inclusion of noisy voxels, we describe two methods for identifying subsets of voxels in the data which optimally distinguish object identity. One method characterizes the reliability of each voxel within subsets of the data, while another estimates the mutual information of each voxel with the stimulus set. We find that both metrics can identify subsets of the data which reliably encode object identity, even when noisy measurements are artificially added to the data. The mutual information metric is less efficient at this task, likely due to constraints in fMRI data. 1
3 0.085748799 29 nips-2005-Analyzing Coupled Brain Sources: Distinguishing True from Spurious Interaction
Author: Guido Nolte, Andreas Ziehe, Frank Meinecke, Klaus-Robert Müller
Abstract: When trying to understand the brain, it is of fundamental importance to analyse (e.g. from EEG/MEG measurements) what parts of the cortex interact with each other in order to infer more accurate models of brain activity. Common techniques like Blind Source Separation (BSS) can estimate brain sources and single out artifacts by using the underlying assumption of source signal independence. However, physiologically interesting brain sources typically interact, so BSS will—by construction— fail to characterize them properly. Noting that there are truly interacting sources and signals that only seemingly interact due to effects of volume conduction, this work aims to contribute by distinguishing these effects. For this a new BSS technique is proposed that uses anti-symmetrized cross-correlation matrices and subsequent diagonalization. The resulting decomposition consists of the truly interacting brain sources and suppresses any spurious interaction stemming from volume conduction. Our new concept of interacting source analysis (ISA) is successfully demonstrated on MEG data. 1
4 0.085591681 145 nips-2005-On Local Rewards and Scaling Distributed Reinforcement Learning
Author: Drew Bagnell, Andrew Y. Ng
Abstract: We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the the number of agents n. We prove a worstcase lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: They require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n. 1
5 0.076798171 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
Author: Jian Zhang, Zoubin Ghahramani, Yiming Yang
Abstract: We propose a probabilistic model based on Independent Component Analysis for learning multiple related tasks. In our model the task parameters are assumed to be generated from independent sources which account for the relatedness of the tasks. We use Laplace distributions to model hidden sources which makes it possible to identify the hidden, independent components instead of just modeling correlations. Furthermore, our model enjoys a sparsity property which makes it both parsimonious and robust. We also propose efficient algorithms for both empirical Bayes method and point estimation. Our experimental results on two multi-label text classification data sets show that the proposed approach is promising.
6 0.072315261 87 nips-2005-Goal-Based Imitation as Probabilistic Inference over Graphical Models
7 0.06946326 50 nips-2005-Convex Neural Networks
8 0.066440463 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks
9 0.065733701 64 nips-2005-Efficient estimation of hidden state dynamics from spike trains
10 0.063270368 150 nips-2005-Optimizing spatio-temporal filters for improving Brain-Computer Interfacing
11 0.062369581 108 nips-2005-Layered Dynamic Textures
12 0.06203099 153 nips-2005-Policy-Gradient Methods for Planning
13 0.061875593 89 nips-2005-Group and Topic Discovery from Relations and Their Attributes
14 0.061424062 111 nips-2005-Learning Influence among Interacting Markov Chains
15 0.061082646 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models
16 0.059695333 135 nips-2005-Neuronal Fiber Delineation in Area of Edema from Diffusion Weighted MRI
17 0.059279852 78 nips-2005-From Weighted Classification to Policy Search
18 0.058189958 4 nips-2005-A Bayesian Spatial Scan Statistic
19 0.056934655 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
20 0.056451872 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection
topicId topicWeight
[(0, 0.191), (1, -0.055), (2, 0.097), (3, 0.068), (4, -0.003), (5, 0.005), (6, 0.02), (7, -0.01), (8, -0.008), (9, 0.037), (10, -0.019), (11, -0.132), (12, 0.009), (13, -0.139), (14, 0.002), (15, 0.004), (16, 0.002), (17, -0.008), (18, 0.055), (19, 0.071), (20, -0.037), (21, 0.073), (22, 0.046), (23, -0.059), (24, -0.178), (25, -0.076), (26, -0.064), (27, 0.061), (28, -0.047), (29, -0.012), (30, 0.032), (31, 0.133), (32, -0.074), (33, -0.116), (34, 0.007), (35, 0.045), (36, 0.159), (37, -0.145), (38, -0.025), (39, 0.077), (40, 0.001), (41, 0.103), (42, -0.101), (43, 0.062), (44, 0.191), (45, 0.047), (46, -0.007), (47, -0.028), (48, -0.147), (49, 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.946603 130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks
Author: Lei Zhang, Dimitris Samaras, Nelly Alia-klein, Nora Volkow, Rita Goldstein
Abstract: Functional Magnetic Resonance Imaging (fMRI) has enabled scientists to look into the active brain. However, interactivity between functional brain regions, is still little studied. In this paper, we contribute a novel framework for modeling the interactions between multiple active brain regions, using Dynamic Bayesian Networks (DBNs) as generative models for brain activation patterns. This framework is applied to modeling of neuronal circuits associated with reward. The novelty of our framework from a Machine Learning perspective lies in the use of DBNs to reveal the brain connectivity and interactivity. Such interactivity models which are derived from fMRI data are then validated through a group classification task. We employ and compare four different types of DBNs: Parallel Hidden Markov Models, Coupled Hidden Markov Models, Fully-linked Hidden Markov Models and Dynamically MultiLinked HMMs (DML-HMM). Moreover, we propose and compare two schemes of learning DML-HMMs. Experimental results show that by using DBNs, group classification can be performed even if the DBNs are constructed from as few as 5 brain regions. We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug addicted subjects vs. control subjects. This finding provides an independent test for the effect of psychopathology on brain function. In general, we demonstrate that incorporation of computer science principles into functional neuroimaging clinical studies provides a novel approach for probing human brain function.
2 0.57445896 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
Author: Rory Sayres, David Ress, Kalanit Grill-spector
Abstract: The category of visual stimuli has been reliably decoded from patterns of neural activity in extrastriate visual cortex [1]. It has yet to be seen whether object identity can be inferred from this activity. We present fMRI data measuring responses in human extrastriate cortex to a set of 12 distinct object images. We use a simple winner-take-all classifier, using half the data from each recording session as a training set, to evaluate encoding of object identity across fMRI voxels. Since this approach is sensitive to the inclusion of noisy voxels, we describe two methods for identifying subsets of voxels in the data which optimally distinguish object identity. One method characterizes the reliability of each voxel within subsets of the data, while another estimates the mutual information of each voxel with the stimulus set. We find that both metrics can identify subsets of the data which reliably encode object identity, even when noisy measurements are artificially added to the data. The mutual information metric is less efficient at this task, likely due to constraints in fMRI data. 1
3 0.53664744 4 nips-2005-A Bayesian Spatial Scan Statistic
Author: Daniel B. Neill, Andrew W. Moore, Gregory F. Cooper
Abstract: We propose a new Bayesian method for spatial cluster detection, the “Bayesian spatial scan statistic,” and compare this method to the standard (frequentist) scan statistic approach. We demonstrate that the Bayesian statistic has several advantages over the frequentist approach, including increased power to detect clusters and (since randomization testing is unnecessary) much faster runtime. We evaluate the Bayesian and frequentist methods on the task of prospective disease surveillance: detecting spatial clusters of disease cases resulting from emerging disease outbreaks. We demonstrate that our Bayesian methods are successful in rapidly detecting outbreaks while keeping number of false positives low. 1
4 0.4748612 29 nips-2005-Analyzing Coupled Brain Sources: Distinguishing True from Spurious Interaction
Author: Guido Nolte, Andreas Ziehe, Frank Meinecke, Klaus-Robert Müller
Abstract: When trying to understand the brain, it is of fundamental importance to analyse (e.g. from EEG/MEG measurements) what parts of the cortex interact with each other in order to infer more accurate models of brain activity. Common techniques like Blind Source Separation (BSS) can estimate brain sources and single out artifacts by using the underlying assumption of source signal independence. However, physiologically interesting brain sources typically interact, so BSS will—by construction— fail to characterize them properly. Noting that there are truly interacting sources and signals that only seemingly interact due to effects of volume conduction, this work aims to contribute by distinguishing these effects. For this a new BSS technique is proposed that uses anti-symmetrized cross-correlation matrices and subsequent diagonalization. The resulting decomposition consists of the truly interacting brain sources and suppresses any spurious interaction stemming from volume conduction. Our new concept of interacting source analysis (ISA) is successfully demonstrated on MEG data. 1
5 0.45146048 35 nips-2005-Bayesian model learning in human visual perception
Author: Gergő Orbán, Jozsef Fiser, Richard N. Aslin, Máté Lengyel
Abstract: Humans make optimal perceptual decisions in noisy and ambiguous conditions. Computations underlying such optimal behavior have been shown to rely on probabilistic inference according to generative models whose structure is usually taken to be known a priori. We argue that Bayesian model selection is ideal for inferring similar and even more complex model structures from experience. We find in experiments that humans learn subtle statistical properties of visual scenes in a completely unsupervised manner. We show that these findings are well captured by Bayesian model learning within a class of models that seek to explain observed variables by independent hidden causes. 1
6 0.42441463 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
7 0.41871884 108 nips-2005-Layered Dynamic Textures
8 0.39949194 68 nips-2005-Factorial Switching Kalman Filters for Condition Monitoring in Neonatal Intensive Care
9 0.3948994 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks
10 0.35887772 135 nips-2005-Neuronal Fiber Delineation in Area of Edema from Diffusion Weighted MRI
11 0.34902889 203 nips-2005-Visual Encoding with Jittering Eyes
12 0.34205812 183 nips-2005-Stimulus Evoked Independent Factor Analysis of MEG Data with Large Background Activity
13 0.33314246 195 nips-2005-Transfer learning for text classification
14 0.32783192 150 nips-2005-Optimizing spatio-temporal filters for improving Brain-Computer Interfacing
15 0.31480709 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions
16 0.31477886 59 nips-2005-Dual-Tree Fast Gauss Transforms
17 0.31195742 34 nips-2005-Bayesian Surprise Attracts Human Attention
18 0.30555925 50 nips-2005-Convex Neural Networks
19 0.3052569 26 nips-2005-An exploration-exploitation model based on norepinepherine and dopamine activity
20 0.30139992 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
topicId topicWeight
[(3, 0.034), (10, 0.03), (12, 0.019), (27, 0.039), (31, 0.075), (34, 0.054), (39, 0.025), (41, 0.013), (44, 0.011), (55, 0.033), (60, 0.011), (65, 0.015), (69, 0.067), (73, 0.037), (87, 0.334), (88, 0.101), (91, 0.03)]
simIndex simValue paperId paperTitle
1 0.76652968 34 nips-2005-Bayesian Surprise Attracts Human Attention
Author: Laurent Itti, Pierre F. Baldi
Abstract: The concept of surprise is central to sensory processing, adaptation, learning, and attention. Yet, no widely-accepted mathematical theory currently exists to quantitatively characterize surprise elicited by a stimulus or event, for observers that range from single neurons to complex natural or engineered systems. We describe a formal Bayesian definition of surprise that is the only consistent formulation under minimal axiomatic assumptions. Surprise quantifies how data affects a natural or artificial observer, by measuring the difference between posterior and prior beliefs of the observer. Using this framework we measure the extent to which humans direct their gaze towards surprising items while watching television and video games. We find that subjects are strongly attracted towards surprising locations, with 72% of all human gaze shifts directed towards locations more surprising than the average, a figure which rises to 84% when considering only gaze targets simultaneously selected by all subjects. The resulting theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction. Life is full of surprises, ranging from a great christmas gift or a new magic trick, to wardrobe malfunctions, reckless drivers, terrorist attacks, and tsunami waves. Key to survival is our ability to rapidly attend to, identify, and learn from surprising events, to decide on present and future courses of action [1]. Yet, little theoretical and computational understanding exists of the very essence of surprise, as evidenced by the absence from our everyday vocabulary of a quantitative unit of surprise: Qualities such as the “wow factor” have remained vague and elusive to mathematical analysis. Informal correlates of surprise exist at nearly all stages of neural processing. In sensory neuroscience, it has been suggested that only the unexpected at one stage is transmitted to the next stage [2]. Hence, sensory cortex may have evolved to adapt to, to predict, and to quiet down the expected statistical regularities of the world [3, 4, 5, 6], focusing instead on events that are unpredictable or surprising. Electrophysiological evidence for this early sensory emphasis onto surprising stimuli exists from studies of adaptation in visual [7, 8, 4, 9], olfactory [10, 11], and auditory cortices [12], subcortical structures like the LGN [13], and even retinal ganglion cells [14, 15] and cochlear hair cells [16]: neural response greatly attenuates with repeated or prolonged exposure to an initially novel stimulus. Surprise and novelty are also central to learning and memory formation [1], to the point that surprise is believed to be a necessary trigger for associative learning [17, 18], as supported by mounting evidence for a role of the hippocampus as a novelty detector [19, 20, 21]. Finally, seeking novelty is a well-identified human character trait, with possible association with the dopamine D4 receptor gene [22, 23, 24]. In the Bayesian framework, we develop the only consistent theory of surprise, in terms of the difference between the posterior and prior distributions of beliefs of an observer over the available class of models or hypotheses about the world. We show that this definition derived from first principles presents key advantages over more ad-hoc formulations, typically relying on detecting outlier stimuli. Armed with this new framework, we provide direct experimental evidence that surprise best characterizes what attracts human gaze in large amounts of natural video stimuli. We here extend a recent pilot study [25], adding more comprehensive theory, large-scale human data collection, and additional analysis. 1 Theory Bayesian Definition of Surprise. We propose that surprise is a general concept, which can be derived from first principles and formalized across spatio-temporal scales, sensory modalities, and, more generally, data types and data sources. Two elements are essential for a principled definition of surprise. First, surprise can exist only in the presence of uncertainty, which can arise from intrinsic stochasticity, missing information, or limited computing resources. A world that is purely deterministic and predictable in real-time for a given observer contains no surprises. Second, surprise can only be defined in a relative, subjective, manner and is related to the expectations of the observer, be it a single synapse, neuronal circuit, organism, or computer device. The same data may carry different amount of surprise for different observers, or even for the same observer taken at different times. In probability and decision theory it can be shown that the only consistent and optimal way for modeling and reasoning about uncertainty is provided by the Bayesian theory of probability [26, 27, 28]. Furthermore, in the Bayesian framework, probabilities correspond to subjective degrees of beliefs in hypotheses or models which are updated, as data is acquired, using Bayes’ theorem as the fundamental tool for transforming prior belief distributions into posterior belief distributions. Therefore, within the same optimal framework, the only consistent definition of surprise must involve: (1) probabilistic concepts to cope with uncertainty; and (2) prior and posterior distributions to capture subjective expectations. Consistently with this Bayesian approach, the background information of an observer is captured by his/her/its prior probability distribution {P (M )}M ∈M over the hypotheses or models M in a model space M. Given this prior distribution of beliefs, the fundamental effect of a new data observation D on the observer is to change the prior distribution {P (M )}M ∈M into the posterior distribution {P (M |D)}M ∈M via Bayes theorem, whereby P (D|M ) ∀M ∈ M, P (M |D) = P (M ). (1) P (D) In this framework, the new data observation D carries no surprise if it leaves the observer beliefs unaffected, that is, if the posterior is identical to the prior; conversely, D is surprising if the posterior distribution resulting from observing D significantly differs from the prior distribution. Therefore we formally measure surprise elicited by data as some distance measure between the posterior and prior distributions. This is best done using the relative entropy or Kullback-Leibler (KL) divergence [29]. Thus, surprise is defined by the average of the log-odd ratio: P (M |D) S(D, M) = KL(P (M |D), P (M )) = P (M |D) log dM (2) P (M ) M taken with respect to the posterior distribution over the model class M. Note that KL is not symmetric but has well-known theoretical advantages, including invariance with respect to Figure 1: Computing surprise in early sensory neurons. (a) Prior data observations, tuning preferences, and top-down influences contribute to shaping a set of “prior beliefs” a neuron may have over a class of internal models or hypotheses about the world. For instance, M may be a set of Poisson processes parameterized by the rate λ, with {P (M )}M ∈M = {P (λ)}λ∈I +∗ the prior distribution R of beliefs about which Poisson models well describe the world as sensed by the neuron. New data D updates the prior into the posterior using Bayes’ theorem. Surprise quantifies the difference between the posterior and prior distributions over the model class M. The remaining panels detail how surprise differs from conventional model fitting and outlier-based novelty. (b) In standard iterative Bayesian model fitting, at every iteration N , incoming data DN is used to update the prior {P (M |D1 , D2 , ..., DN −1 )}M ∈M into the posterior {P (M |D1 , D2 , ..., DN )}M ∈M . Freezing this learning at a given iteration, one then picks the currently best model, usually using either a maximum likelihood criterion, or a maximum a posteriori one (yielding MM AP shown). (c) This best model is used for a number of tasks at the current iteration, including outlier-based novelty detection. New data is then considered novel at that instant if it has low likelihood for the best model b a (e.g., DN is more novel than DN ). This focus onto the single best model presents obvious limitations, especially in situations where other models are nearly as good (e.g., M∗ in panel (b) is entirely ignored during standard novelty computation). One palliative solution is to consider mixture models, or simply P (D), but this just amounts to shifting the problem into a different model class. (d) Surprise directly addresses this problem by simultaneously considering all models and by measuring how data changes the observer’s distribution of beliefs from {P (M |D1 , D2 , ..., DN −1 )}M ∈M to {P (M |D1 , D2 , ..., DN )}M ∈M over the entire model class M (orange shaded area). reparameterizations. A unit of surprise — a “wow” — may then be defined for a single model M as the amount of surprise corresponding to a two-fold variation between P (M |D) and P (M ), i.e., as log P (M |D)/P (M ) (with log taken in base 2), with the total number of wows experienced for all models obtained through the integration in eq. 2. Surprise and outlier detection. Outlier detection based on the likelihood P (D|M best ) of D given a single best model Mbest is at best an approximation to surprise and, in some cases, is misleading. Consider, for instance, a case where D has very small probability both for a model or hypothesis M and for a single alternative hypothesis M. Although D is a strong outlier, it carries very little information regarding whether M or M is the better model, and therefore very little surprise. Thus an outlier detection method would strongly focus attentional resources onto D, although D is a false positive, in the sense that it carries no useful information for discriminating between the two alternative hypotheses M and M. Figure 1 further illustrates this disconnect between outlier detection and surprise. 2 Human experiments To test the surprise hypothesis — that surprise attracts human attention and gaze in natural scenes — we recorded eye movements from eight na¨ve observers (three females and ı five males, ages 23-32, normal or corrected-to-normal vision). Each watched a subset from 50 videoclips totaling over 25 minutes of playtime (46,489 video frames, 640 × 480, 60.27 Hz, mean screen luminance 30 cd/m2 , room 4 cd/m2 , viewing distance 80cm, field of view 28◦ × 21◦ ). Clips comprised outdoors daytime and nighttime scenes of crowded environments, video games, and television broadcast including news, sports, and commercials. Right-eye position was tracked with a 240 Hz video-based device (ISCAN RK-464), with methods as previously [30]. Two hundred calibrated eye movement traces (10,192 saccades) were analyzed, corresponding to four distinct observers for each of the 50 clips. Figure 2 shows sample scanpaths for one videoclip. To characterize image regions selected by participants, we process videoclips through computational metrics that output a topographic dynamic master response map, assigning in real-time a response value to every input location. A good master map would highlight, more than expected by chance, locations gazed to by observers. To score each metric we hence sample, at onset of every human saccade, master map activity around the saccade’s future endpoint, and around a uniformly random endpoint (random sampling was repeated 100 times to evaluate variability). We quantify differences between histograms of master Figure 2: (a) Sample eye movement traces from four observers (squares denote saccade endpoints). (b) Our data exhibits high inter-individual overlap, shown here with the locations where one human saccade endpoint was nearby (≈ 5◦ ) one (white squares), two (cyan squares), or all three (black squares) other humans. (c) A metric where the master map was created from the three eye movement traces other than that being tested yields an upper-bound KL score, computed by comparing the histograms of metric values at human (narrow blue bars) and random (wider green bars) saccade targets. Indeed, this metric’s map was very sparse (many random saccades landing on locations with nearzero response), yet humans preferentially saccaded towards the three active hotspots corresponding to the eye positions of three other humans (many human saccades landing on locations with near-unity responses). map samples collected from human and random saccades using again the Kullback-Leibler (KL) distance: metrics which better predict human scanpaths exhibit higher distances from random as, typically, observers non-uniformly gaze towards a minority of regions with highest metric responses while avoiding a majority of regions with low metric responses. This approach presents several advantages over simpler scoring schemes [31, 32], including agnosticity to putative mechanisms for generating saccades and the fact that applying any continuous nonlinearity to master map values would not affect scoring. Experimental results. We test six computational metrics, encompassing and extending the state-of-the-art found in previous studies. The first three quantify static image properties (local intensity variance in 16 × 16 image patches [31]; local oriented edge density as measured with Gabor filters [33]; and local Shannon entropy in 16 × 16 image patches [34]). The remaining three metrics are more sensitive to dynamic events (local motion [33]; outlier-based saliency [33]; and surprise [25]). For all metrics, we find that humans are significantly attracted by image regions with higher metric responses. However, the static metrics typically respond vigorously at numerous visual locations (Figure 3), hence they are poorly specific and yield relatively low KL scores between humans and random. The metrics sensitive to motion, outliers, and surprising events, in comparison, yield sparser maps and higher KL scores. The surprise metric of interest here quantifies low-level surprise in image patches over space and time, and at this point does not account for high-level or cognitive beliefs of our human observers. Rather, it assumes a family of simple models for image patches, each processed through 72 early feature detectors sensitive to color, orientation, motion, etc., and computes surprise from shifts in the distribution of beliefs about which models better describe the patches (see [25] and [35] for details). We find that the surprise metric significantly outperforms all other computational metrics (p < 10−100 or better on t-tests for equality of KL scores), scoring nearly 20% better than the second-best metric (saliency) and 60% better than the best static metric (entropy). Surprising stimuli often substantially differ from simple feature outliers; for example, a continually blinking light on a static background elicits sustained flicker due to its locally outlier temporal dynamics but is only surprising for a moment. Similarly, a shower of randomly-colored pixels continually excites all low-level feature detectors but rapidly becomes unsurprising. Strongest attractors of human attention. Clearly, in our and previous eye-tracking experiments, in some situations potentially interesting targets were more numerous than in others. With many possible targets, different observers may orient towards different locations, making it more difficult for a single metric to accurately predict all observers. Hence we consider (Figure 4) subsets of human saccades where at least two, three, or all four observers simultaneously agreed on a gaze target. Observers could have agreed based on bottom-up factors (e.g., only one location had interesting visual appearance at that time), top-down factors (e.g., only one object was of current cognitive interest), or both (e.g., a single cognitively interesting object was present which also had distinctive appearance). Irrespectively of the cause for agreement, it indicates consolidated belief that a location was attractive. While the KL scores of all metrics improved when progressively focusing onto only those locations, dynamic metrics improved more steeply, indicating that stimuli which more reliably attracted all observers carried more motion, saliency, and surprise. Surprise remained significantly the best metric to characterize these agreed-upon attractors of human gaze (p < 10−100 or better on t-tests for equality of KL scores). Overall, surprise explained the greatest fraction of human saccades, indicating that humans are significantly attracted towards surprising locations in video displays. Over 72% of all human saccades were targeted to locations predicted to be more surprising than on average. When only considering saccades where two, three, or four observers agreed on a common gaze target, this figure rose to 76%, 80%, and 84%, respectively. Figure 3: (a) Sample video frames, with corresponding human saccades and predictions from the entropy, surprise, and human-derived metrics. Entropy maps, like intensity variance and orientation maps, exhibited many locations with high responses, hence had low specificity and were poorly discriminative. In contrast, motion, saliency, and surprise maps were much sparser and more specific, with surprise significantly more often on target. For three example frames (first column), saccades from one subject are shown (arrows) with corresponding apertures over which master map activity at the saccade endpoint was sampled (circles). (b) KL scores for these metrics indicate significantly different performance levels, and a strict ranking of variance < orientation < entropy < motion < saliency < surprise < human-derived. KL scores were computed by comparing the number of human saccades landing onto each given range of master map values (narrow blue bars) to the number of random saccades hitting the same range (wider green bars). A score of zero would indicate equality between the human and random histograms, i.e., humans did not tend to hit various master map values any differently from expected by chance, or, the master map could not predict human saccades better than random saccades. Among the six computational metrics tested in total, surprise performed best, in that surprising locations were relatively few yet reliably gazed to by humans. Figure 4: KL scores when considering only saccades where at least one (all 10,192 saccades), two (7,948 saccades), three (5,565 saccades), or all four (2,951 saccades) humans agreed on a common gaze location, for the static (a) and dynamic metrics (b). Static metrics improved substantially when progressively focusing onto saccades with stronger inter-observer agreement (average slope 0.56 ± 0.37 percent KL score units per 1,000 pruned saccades). Hence, when humans agreed on a location, they also tended to be more reliably predicted by the metrics. Furthermore, dynamic metrics improved 4.5 times more steeply (slope 2.44 ± 0.37), suggesting a stronger role of dynamic events in attracting human attention. Surprising events were significantly the strongest (t-tests for equality of KL scores between surprise and other metrics, p < 10−100 ). 3 Discussion While previous research has shown with either static scenes or dynamic synthetic stimuli that humans preferentially fixate regions of high entropy [34], contrast [31], saliency [32], flicker [36], or motion [37], our data provides direct experimental evidence that humans fixate surprising locations even more reliably. These conclusions were made possible by developing new tools to quantify what attracts human gaze over space and time in dynamic natural scenes. Surprise explained best where humans look when considering all saccades, and even more so when restricting the analysis to only those saccades for which human observers tended to agree. Surprise hence represents an inexpensive, easily computable approximation to human attentional allocation. In the absence of quantitative tools to measure surprise, most experimental and modeling work to date has adopted the approximation that novel events are surprising, and has focused on experimental scenarios which are simple enough to ensure an overlap between informal notions of novelty and surprise: for example, a stimulus is novel during testing if it has not been seen during training [9]. Our definition opens new avenues for more sophisticated experiments, where surprise elicited by different stimuli can be precisely compared and calibrated, yielding predictions at the single-unit as well as behavioral levels. The definition of surprise — as the distance between the posterior and prior distributions of beliefs over models — is entirely general and readily applicable to the analysis of auditory, olfactory, gustatory, or somatosensory data. While here we have focused on behavior rather than detailed biophysical implementation, it is worth noting that detecting surprise in neural spike trains does not require semantic understanding of the data carried by the spike trains, and thus could provide guiding signals during self-organization and development of sensory areas. At higher processing levels, top-down cues and task demands are known to combine with stimulus novelty in capturing attention and triggering learning [1, 38], ideas which may now be formalized and quantified in terms of priors, posteriors, and surprise. Surprise, indeed, inherently depends on uncertainty and on prior beliefs. Hence surprise theory can further be tested and utilized in experiments where the prior is biased, for ex- ample by top-down instructions or prior exposures to stimuli [38]. In addition, simple surprise-based behavioral measures such as the eye-tracking one used here may prove useful for early diagnostic of human conditions including autism and attention-deficit hyperactive disorder, as well as for quantitative comparison between humans and animals which may have lower or different priors, including monkeys, frogs, and flies. Beyond sensory biology, computable surprise could guide the development of data mining and compression systems (giving more bits to surprising regions of interest), to find surprising agents in crowds, surprising sentences in books or speeches, surprising sequences in genomes, surprising medical symptoms, surprising odors in airport luggage racks, surprising documents on the world-wide-web, or to design surprising advertisements. Acknowledgments: Supported by HFSP, NSF and NGA (L.I.), NIH and NSF (P.B.). We thank UCI’s Institute for Genomics and Bioinformatics and USC’s Center High Performance Computing and Communications (www.usc.edu/hpcc) for access to their computing clusters. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] Ranganath, C. & Rainer, G. Nat Rev Neurosci 4, 193–202 (2003). Rao, R. P. & Ballard, D. H. Nat Neurosci 2, 79–87 (1999). Olshausen, B. A. & Field, D. J. Nature 381, 607–609 (1996). M¨ ller, J. R., Metha, A. B., Krauskopf, J. & Lennie, P. Science 285, 1405–1408 (1999). u Dragoi, V., Sharma, J., Miller, E. K. & Sur, M. Nat Neurosci 5, 883–891 (2002). David, S. V., Vinje, W. E. & Gallant, J. L. J Neurosci 24, 6991–7006 (2004). Maffei, L., Fiorentini, A. & Bisti, S. Science 182, 1036–1038 (1973). Movshon, J. A. & Lennie, P. Nature 278, 850–852 (1979). Fecteau, J. H. & Munoz, D. P. Nat Rev Neurosci 4, 435–443 (2003). Kurahashi, T. & Menini, A. Nature 385, 725–729 (1997). Bradley, J., Bonigk, W., Yau, K. W. & Frings, S. Nat Neurosci 7, 705–710 (2004). Ulanovsky, N., Las, L. & Nelken, I. Nat Neurosci 6, 391–398 (2003). Solomon, S. G., Peirce, J. W., Dhruv, N. T. & Lennie, P. Neuron 42, 155–162 (2004). Smirnakis, S. M., Berry, M. J. & et al. Nature 386, 69–73 (1997). Brown, S. P. & Masland, R. H. Nat Neurosci 4, 44–51 (2001). Kennedy, H. J., Evans, M. G. & et al. Nat Neurosci 6, 832–836 (2003). Schultz, W. & Dickinson, A. Annu Rev Neurosci 23, 473–500 (2000). Fletcher, P. C., Anderson, J. M., Shanks, D. R. et al. Nat Neurosci 4, 1043–1048 (2001). Knight, R. Nature 383, 256–259 (1996). Stern, C. E., Corkin, S., Gonzalez, R. G. et al. Proc Natl Acad Sci U S A 93, 8660–8665 (1996). Li, S., Cullen, W. K., Anwyl, R. & Rowan, M. J. Nat Neurosci 6, 526–531 (2003). Ebstein, R. P., Novick, O., Umansky, R. et al. Nat Genet 12, 78–80 (1996). Benjamin, J., Li, L. & et al. Nat Genet 12, 81–84 (1996). Lusher, J. M., Chandler, C. & Ball, D. Mol Psychiatry 6, 497–499 (2001). Itti, L. & Baldi, P. In Proc. IEEE CVPR. San Siego, CA (2005 in press). Cox, R. T. Am. J. Phys. 14, 1–13 (1964). Savage, L. J. The foundations of statistics (Dover, New York, 1972). (First Edition in 1954). Jaynes, E. T. Probability Theory. The Logic of Science (Cambridge University Press, 2003). Kullback, S. Information Theory and Statistics (Wiley, New York:New York, 1959). Itti, L. Visual Cognition (2005 in press). Reinagel, P. & Zador, A. M. Network 10, 341–350 (1999). Parkhurst, D., Law, K. & Niebur, E. Vision Res 42, 107–123 (2002). Itti, L. & Koch, C. Nat Rev Neurosci 2, 194–203 (2001). Privitera, C. M. & Stark, L. W. IEEE Trans Patt Anal Mach Intell 22, 970–982 (2000). All source code for all metrics is freely available at http://iLab.usc.edu/toolkit/. Theeuwes, J. Percept Psychophys 57, 637–644 (1995). Abrams, R. A. & Christ, S. E. Psychol Sci 14, 427–432 (2003). Wolfe, J. M. & Horowitz, T. S. Nat Rev Neurosci 5, 495–501 (2004).
same-paper 2 0.74157751 130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks
Author: Lei Zhang, Dimitris Samaras, Nelly Alia-klein, Nora Volkow, Rita Goldstein
Abstract: Functional Magnetic Resonance Imaging (fMRI) has enabled scientists to look into the active brain. However, interactivity between functional brain regions, is still little studied. In this paper, we contribute a novel framework for modeling the interactions between multiple active brain regions, using Dynamic Bayesian Networks (DBNs) as generative models for brain activation patterns. This framework is applied to modeling of neuronal circuits associated with reward. The novelty of our framework from a Machine Learning perspective lies in the use of DBNs to reveal the brain connectivity and interactivity. Such interactivity models which are derived from fMRI data are then validated through a group classification task. We employ and compare four different types of DBNs: Parallel Hidden Markov Models, Coupled Hidden Markov Models, Fully-linked Hidden Markov Models and Dynamically MultiLinked HMMs (DML-HMM). Moreover, we propose and compare two schemes of learning DML-HMMs. Experimental results show that by using DBNs, group classification can be performed even if the DBNs are constructed from as few as 5 brain regions. We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug addicted subjects vs. control subjects. This finding provides an independent test for the effect of psychopathology on brain function. In general, we demonstrate that incorporation of computer science principles into functional neuroimaging clinical studies provides a novel approach for probing human brain function.
3 0.67514861 205 nips-2005-Worst-Case Bounds for Gaussian Process Models
Author: Sham M. Kakade, Matthias W. Seeger, Dean P. Foster
Abstract: We present a competitive analysis of some non-parametric Bayesian algorithms in a worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) and provide bounds on the regret (under the log loss) for commonly used non-parametric Bayesian algorithms — including Gaussian regression and logistic regression — which show how these algorithms can perform favorably under rather general conditions. These bounds explicitly handle the infinite dimensionality of these non-parametric classes in a natural way. We also make formal connections to the minimax and minimum description length (MDL) framework. Here, we show precisely how Bayesian Gaussian regression is a minimax strategy. 1
4 0.46158773 94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex
Author: Rory Sayres, David Ress, Kalanit Grill-spector
Abstract: The category of visual stimuli has been reliably decoded from patterns of neural activity in extrastriate visual cortex [1]. It has yet to be seen whether object identity can be inferred from this activity. We present fMRI data measuring responses in human extrastriate cortex to a set of 12 distinct object images. We use a simple winner-take-all classifier, using half the data from each recording session as a training set, to evaluate encoding of object identity across fMRI voxels. Since this approach is sensitive to the inclusion of noisy voxels, we describe two methods for identifying subsets of voxels in the data which optimally distinguish object identity. One method characterizes the reliability of each voxel within subsets of the data, while another estimates the mutual information of each voxel with the stimulus set. We find that both metrics can identify subsets of the data which reliably encode object identity, even when noisy measurements are artificially added to the data. The mutual information metric is less efficient at this task, likely due to constraints in fMRI data. 1
5 0.42968109 45 nips-2005-Conditional Visual Tracking in Kernel Space
Author: Cristian Sminchisescu, Atul Kanujia, Zhiguo Li, Dimitris Metaxas
Abstract: We present a conditional temporal probabilistic framework for reconstructing 3D human motion in monocular video based on descriptors encoding image silhouette observations. For computational efÄ?Ĺš ciency we restrict visual inference to low-dimensional kernel induced non-linear state spaces. Our methodology (kBME) combines kernel PCA-based non-linear dimensionality reduction (kPCA) and Conditional Bayesian Mixture of Experts (BME) in order to learn complex multivalued predictors between observations and model hidden states. This is necessary for accurate, inverse, visual perception inferences, where several probable, distant 3D solutions exist due to noise or the uncertainty of monocular perspective projection. Low-dimensional models are appropriate because many visual processes exhibit strong non-linear correlations in both the image observations and the target, hidden state variables. The learned predictors are temporally combined within a conditional graphical model in order to allow a principled propagation of uncertainty. We study several predictors and empirically show that the proposed algorithm positively compares with techniques based on regression, Kernel Dependency Estimation (KDE) or PCA alone, and gives results competitive to those of high-dimensional mixture predictors at a fraction of their computational cost. We show that the method successfully reconstructs the complex 3D motion of humans in real monocular video sequences. 1 Introduction and Related Work We consider the problem of inferring 3D articulated human motion from monocular video. This research topic has applications for scene understanding including human-computer interfaces, markerless human motion capture, entertainment and surveillance. A monocular approach is relevant because in real-world settings the human body parts are rarely completely observed even when using multiple cameras. This is due to occlusions form other people or objects in the scene. A robust system has to necessarily deal with incomplete, ambiguous and uncertain measurements. Methods for 3D human motion reconstruction can be classiÄ?Ĺš ed as generative and discriminative. They both require a state representation, namely a 3D human model with kinematics (joint angles) or shape (surfaces or joint positions) and they both use a set of image features as observations for state inference. The computational goal in both cases is the conditional distribution for the model state given image observations. Generative model-based approaches [6, 16, 14, 13] have been demonstrated to Ä?Ĺš‚exibly reconstruct complex unknown human motions and to naturally handle problem constraints. However it is difÄ?Ĺš cult to construct reliable observation likelihoods due to the complexity of modeling human appearance. This varies widely due to different clothing and deformation, body proportions or lighting conditions. Besides being somewhat indirect, the generative approach further imposes strict conditional independence assumptions on the temporal observations given the states in order to ensure computational tractability. Due to these factors inference is expensive and produces highly multimodal state distributions [6, 16, 13]. Generative inference algorithms require complex annealing schedules [6, 13] or systematic non-linear search for local optima [16] in order to ensure continuing tracking. These difÄ?Ĺš culties motivate the advent of a complementary class of discriminative algorithms [10, 12, 18, 2], that approximate the state conditional directly, in order to simplify inference. However, inverse, observation-to-state multivalued mappings are difÄ?Ĺš cult to learn (see e.g. Ä?Ĺš g. 1a) and a probabilistic temporal setting is necessary. In an earlier paper [15] we introduced a probabilistic discriminative framework for human motion reconstruction. Because the method operates in the originally selected state and observation spaces that can be task generic, therefore redundant and often high-dimensional, inference is more expensive and can be less robust. To summarize, reconstructing 3D human motion in a Figure 1: (a, Left) Example of 180o ambiguity in predicting 3D human poses from silhouette image features (center). It is essential that multiple plausible solutions (e.g. F 1 and F2 ) are correctly represented and tracked over time. A single state predictor will either average the distant solutions or zig-zag between them, see also tables 1 and 2. (b, Right) A conditional chain model. The local distributions p(yt |yt−1 , zt ) or p(yt |zt ) are learned as in Ä?Ĺš g. 2. For inference, the predicted local state conditional is recursively combined with the Ä?Ĺš ltered prior c.f . (1). conditional temporal framework poses the following difÄ?Ĺš culties: (i) The mapping between temporal observations and states is multivalued (i.e. the local conditional distributions to be learned are multimodal), therefore it cannot be accurately represented using global function approximations. (ii) Human models have multivariate, high-dimensional continuous states of 50 or more human joint angles. The temporal state conditionals are multimodal which makes efÄ?Ĺš cient Kalman Ä?Ĺš ltering algorithms inapplicable. General inference methods (particle Ä?Ĺš lters, mixtures) have to be used instead, but these are expensive for high-dimensional models (e.g. when reconstructing the motion of several people that operate in a joint state space). (iii) The components of the human state and of the silhouette observation vector exhibit strong correlations, because many repetitive human activities like walking or running have low intrinsic dimensionality. It appears wasteful to work with high-dimensional states of 50+ joint angles. Even if the space were truly high-dimensional, predicting correlated state dimensions independently may still be suboptimal. In this paper we present a conditional temporal estimation algorithm that restricts visual inference to low-dimensional, kernel induced state spaces. To exploit correlations among observations and among state variables, we model the local, temporal conditional distributions using ideas from Kernel PCA [11, 19] and conditional mixture modeling [7, 5], here adapted to produce multiple probabilistic predictions. The corresponding predictor is referred to as a Conditional Bayesian Mixture of Low-dimensional Kernel-Induced Experts (kBME). By integrating it within a conditional graphical model framework (Ä?Ĺš g. 1b), we can exploit temporal constraints probabilistically. We demonstrate that this methodology is effective for reconstructing the 3D motion of multiple people in monocular video. Our contribution w.r.t. [15] is a probabilistic conditional inference framework that operates over a non-linear, kernel-induced low-dimensional state spaces, and a set of experiments (on both real and artiÄ?Ĺš cial image sequences) that show how the proposed framework positively compares with powerful predictors based on KDE, PCA, or with the high-dimensional models of [15] at a fraction of their cost. 2 Probabilistic Inference in a Kernel Induced State Space We work with conditional graphical models with a chain structure [9], as shown in Ä?Ĺš g. 1b, These have continuous temporal states yt , t = 1 . . . T , observations zt . For compactness, we denote joint states Yt = (y1 , y2 , . . . , yt ) or joint observations Zt = (z1 , . . . , zt ). Learning and inference are based on local conditionals: p(yt |zt ) and p(yt |yt−1 , zt ), with yt and zt being low-dimensional, kernel induced representations of some initial model having state xt and observation rt . We obtain zt , yt from rt , xt using kernel PCA [11, 19]. Inference is performed in a low-dimensional, non-linear, kernel induced latent state space (see Ä?Ĺš g. 1b and Ä?Ĺš g. 2 and (1)). For display or error reporting, we compute the original conditional p(x|r), or a temporally Ä?Ĺš ltered version p(xt |Rt ), Rt = (r1 , r2 , . . . , rt ), using a learned pre-image state map [3]. 2.1 Density Propagation for Continuous Conditional Chains For online Ä?Ĺš ltering, we compute the optimal distribution p(yt |Zt ) for the state yt , conditioned by observations Zt up to time t. The Ä?Ĺš ltered density can be recursively derived as: p(yt |Zt ) = p(yt |yt−1 , zt )p(yt−1 |Zt−1 ) (1) yt−1 We compute using a conditional mixture for p(yt |yt−1 , zt ) (a Bayesian mixture of experts c.f . §2.2) and the prior p(yt−1 |Zt−1 ), each having, say M components. We integrate M 2 pairwise products of Gaussians analytically. The means of the expanded posterior are clustered and the centers are used to initialize a reduced M -component Kullback-Leibler approximation that is reÄ?Ĺš ned using gradient descent [15]. The propagation rule (1) is similar to the one used for discrete state labels [9], but here we work with multivariate continuous state spaces and represent the local multimodal state conditionals using kBME (Ä?Ĺš g. 2), and not log-linear models [9] (these would require intractable normalization). This complex continuous model rules out inference based on Kalman Ä?Ĺš ltering or dynamic programming [9]. 2.2 Learning Bayesian Mixtures over Kernel Induced State Spaces (kBME) In order to model conditional mappings between low-dimensional non-linear spaces we rely on kernel dimensionality reduction and conditional mixture predictors. The authors of KDE [19] propose a powerful structured unimodal predictor. This works by decorrelating the output using kernel PCA and learning a ridge regressor between the input and each decorrelated output dimension. Our procedure is also based on kernel PCA but takes into account the structure of the studied visual problem where both inputs and outputs are likely to be low-dimensional and the mapping between them multivalued. The output variables xi are projected onto the column vectors of the principal space in order to obtain their principal coordinates y i . A z ∈ P(Fr ) O p(y|z) kP CA ĂŽĹšr (r) ⊂ Fr O / y ∈ P(Fx ) O QQQ QQQ QQQ kP CA QQQ Q( ĂŽĹšx (x) ⊂ Fx x ≈ PreImage(y) O ĂŽĹšr ĂŽĹšx r ∈ R ⊂ Rr x ∈ X ⊂ Rx p(x|r) ≈ p(x|y) Figure 2: The learned low-dimensional predictor, kBME, for computing p(x|r) â‰Ä„ p(xt |rt ), ∀t. (We similarly learn p(xt |xt−1 , rt ), with input (x, r) instead of r – here we illustrate only p(x|r) for clarity.) The input r and the output x are decorrelated using Kernel PCA to obtain z and y respectively. The kernels used for the input and output are ĂŽĹš r and ĂŽĹšx , with induced feature spaces Fr and Fx , respectively. Their principal subspaces obtained by kernel PCA are denoted by P(Fr ) and P(Fx ), respectively. A conditional Bayesian mixture of experts p(y|z) is learned using the low-dimensional representation (z, y). Using learned local conditionals of the form p(yt |zt ) or p(yt |yt−1 , zt ), temporal inference can be efÄ?Ĺš ciently performed in a low-dimensional kernel induced state space (see e.g. (1) and Ä?Ĺš g. 1b). For visualization and error measurement, the Ä?Ĺš ltered density, e.g. p(yt |Zt ), can be mapped back to p(xt |Rt ) using the pre-image c.f . (3). similar procedure is performed on the inputs ri to obtain zi . In order to relate the reduced feature spaces of z and y (P(Fr ) and P(Fx )), we estimate a probability distribution over mappings from training pairs (zi , yi ). We use a conditional Bayesian mixture of experts (BME) [7, 5] in order to account for ambiguity when mapping similar, possibly identical reduced feature inputs to very different feature outputs, as common in our problem (Ä?Ĺš g. 1a). This gives a model that is a conditional mixture of low-dimensional kernel-induced experts (kBME): M g(z|δ j )N (y|Wj z, ĂŽĹ j ) p(y|z) = (2) j=1 where g(z|δ j ) is a softmax function parameterized by δ j and (Wj , ĂŽĹ j ) are the parameters and the output covariance of expert j, here a linear regressor. As in many Bayesian settings [17, 5], the weights of the experts and of the gates, Wj and δ j , are controlled by hierarchical priors, typically Gaussians with 0 mean, and having inverse variance hyperparameters controlled by a second level of Gamma distributions. We learn this model using a double-loop EM and employ ML-II type approximations [8, 17] with greedy (weight) subset selection [17, 15]. Finally, the kBME algorithm requires the computation of pre-images in order to recover the state distribution x from it’s image y ∈ P(Fx ). This is a closed form computation for polynomial kernels of odd degree. For more general kernels optimization or learning (regression based) methods are necessary [3]. Following [3, 19], we use a sparse Bayesian kernel regressor to learn the pre-image. This is based on training data (xi , yi ): p(x|y) = N (x|AĂŽĹšy (y), â„Ĺš) (3) with parameters and covariances (A, â„Ĺš). Since temporal inference is performed in the low-dimensional kernel induced state space, the pre-image function needs to be calculated only for visualizing results or for the purpose of error reporting. Propagating the result from the reduced feature space P(Fx ) to the output space X pro- duces a Gaussian mixture with M elements, having coefÄ?Ĺš cients g(z|δ j ) and components N (x|AĂŽĹšy (Wj z), AJĂŽĹšy ĂŽĹ j JĂŽĹšy A + â„Ĺš), where JĂŽĹšy is the Jacobian of the mapping ĂŽĹšy . 3 Experiments We run experiments on both real image sequences (Ä?Ĺš g. 5 and Ä?Ĺš g. 6) and on sequences where silhouettes were artiÄ?Ĺš cially rendered. The prediction error is reported in degrees (for mixture of experts, this is w.r.t. the most probable one, but see also Ä?Ĺš g. 4a), and normalized per joint angle, per frame. The models are learned using standard cross-validation. Pre-images are learned using kernel regressors and have average error 1.7o . Training Set and Model State Representation: For training we gather pairs of 3D human poses together with their image projections, here silhouettes, using the graphics package Maya. We use realistically rendered computer graphics human surface models which we animate using human motion capture [1]. Our original human representation (x) is based on articulated skeletons with spherical joints and has 56 skeletal d.o.f. including global translation. The database consists of 8000 samples of human activities including walking, running, turns, jumps, gestures in conversations, quarreling and pantomime. Image Descriptors: We work with image silhouettes obtained using statistical background subtraction (with foreground and background models). Silhouettes are informative for pose estimation although prone to ambiguities (e.g. the left / right limb assignment in side views) or occasional lack of observability of some of the d.o.f. (e.g. 180o ambiguities in the global azimuthal orientation for frontal views, e.g. Ä?Ĺš g. 1a). These are multiplied by intrinsic forward / backward monocular ambiguities [16]. As observations r, we use shape contexts extracted on the silhouette [4] (5 radial, 12 angular bins, size range 1/8 to 3 on log scale). The features are computed at different scales and sizes for points sampled on the silhouette. To work in a common coordinate system, we cluster all features in the training set into K = 50 clusters. To compute the representation of a new shape feature (a point on the silhouette), we ‘project’ onto the common basis by (inverse distance) weighted voting into the cluster centers. To obtain the representation (r) for a new silhouette we regularly sample 200 points on it and add all their feature vectors into a feature histogram. Because the representation uses overlapping features of the observation the elements of the descriptor are not independent. However, a conditional temporal framework (Ä?Ĺš g. 1b) Ä?Ĺš‚exibly accommodates this. For experiments, we use Gaussian kernels for the joint angle feature space and dot product kernels for the observation feature space. We learn state conditionals for p(yt |zt ) and p(yt |yt−1 , zt ) using 6 dimensions for the joint angle kernel induced state space and 25 dimensions for the observation induced feature space, respectively. In Ä?Ĺš g. 3b) we show an evaluation of the efÄ?Ĺš cacy of our kBME predictor for different dimensions in the joint angle kernel induced state space (the observation feature space dimension is here 50). On the analyzed dancing sequence, that involves complex motions of the arms and the legs, the non-linear model signiÄ?Ĺš cantly outperforms alternative PCA methods and gives good predictions for compact, low-dimensional models.1 In tables 1 and 2, as well as Ä?Ĺš g. 4, we perform quantitative experiments on artiÄ?Ĺš cially rendered silhouettes. 3D ground truth joint angles are available and this allows a more 1 Running times: On a Pentium 4 PC (3 GHz, 2 GB RAM), a full dimensional BME model with 5 experts takes 802s to train p(xt |xt−1 , rt ), whereas a kBME (including the pre-image) takes 95s to train p(yt |yt−1 , zt ). The prediction time is 13.7s for BME and 8.7s (including the pre-image cost 1.04s) for kBME. The integration in (1) takes 2.67s for BME and 0.31s for kBME. The speed-up for kBME is signiÄ?Ĺš cant and likely to increase with original models having higher dimensionality. Prediction Error Number of Clusters 100 1000 100 10 1 1 2 3 4 5 6 7 8 Degree of Multimodality kBME KDE_RVM PCA_BME PCA_RVM 10 1 0 20 40 Number of Dimensions 60 Figure 3: (a, Left) Analysis of ‘multimodality’ for a training set. The input zt dimension is 25, the output yt dimension is 6, both reduced using kPCA. We cluster independently in (yt−1 , zt ) and yt using many clusters (2100) to simulate small input perturbations and we histogram the yt clusters falling within each cluster in (yt−1 , zt ). This gives intuition on the degree of ambiguity in modeling p(yt |yt−1 , zt ), for small perturbations in the input. (b, Right) Evaluation of dimensionality reduction methods for an artiÄ?Ĺš cial dancing sequence (models trained on 300 samples). The kBME is our model §2.2, whereas the KDE-RVM is a KDE model learned with a Relevance Vector Machine (RVM) [17] feature space map. PCA-BME and PCA-RVM are models where the mappings between feature spaces (obtained using PCA) is learned using a BME and a RVM. The non-linearity is signiÄ?Ĺš cant. Kernel-based methods outperform PCA and give low prediction error for 5-6d models. systematic evaluation. Notice that the kernelized low-dimensional models generally outperform the PCA ones. At the same time, they give results competitive to the ones of high-dimensional BME predictors, while being lower-dimensional and therefore signiÄ?Ĺš cantly less expensive for inference, e.g. the integral in (1). In Ä?Ĺš g. 5 and Ä?Ĺš g. 6 we show human motion reconstruction results for two real image sequences. Fig. 5 shows the good quality reconstruction of a person performing an agile jump. (Given the missing observations in a side view, 3D inference for the occluded body parts would not be possible without using prior knowledge!) For this sequence we do inference using conditionals having 5 modes and reduced 6d states. We initialize tracking using p(yt |zt ), whereas for inference we use p(yt |yt−1 , zt ) within (1). In the second sequence in Ä?Ĺš g. 6, we simultaneously reconstruct the motion of two people mimicking domestic activities, namely washing a window and picking an object. Here we do inference over a product, 12-dimensional state space consisting of the joint 6d state of each person. We obtain good 3D reconstruction results, using only 5 hypotheses. Notice however, that the results are not perfect, there are small errors in the elbow and the bending of the knee for the subject at the l.h.s., and in the different wrist orientations for the subject at the r.h.s. This reÄ?Ĺš‚ects the bias of our training set. Walk and turn Conversation Run and turn left KDE-RR 10.46 7.95 5.22 RVM 4.95 4.96 5.02 KDE-RVM 7.57 6.31 6.25 BME 4.27 4.15 5.01 kBME 4.69 4.79 4.92 Table 1: Comparison of average joint angle prediction error for different models. All kPCA-based models use 6 output dimensions. Testing is done on 100 video frames for each sequence, the inputs are artiÄ?Ĺš cially generated silhouettes, not in the training set. 3D joint angle ground truth is used for evaluation. KDE-RR is a KDE model with ridge regression (RR) for the feature space mapping, KDE-RVM uses an RVM. BME uses a Bayesian mixture of experts with no dimensionality reduction. kBME is our proposed model. kPCAbased methods use kernel regressors to compute pre-images. Expert Prediction Frequency − Closest to Ground truth Frequency − Close to ground truth 30 25 20 15 10 5 0 1 2 3 4 Expert Number 14 10 8 6 4 2 0 5 1st Probable Prev Output 2nd Probable Prev Output 3rd Probable Prev Output 4th Probable Prev Output 5th Probable Prev Output 12 1 2 3 4 Current Expert 5 Figure 4: (a, Left) Histogram showing the accuracy of various expert predictors: how many times the expert ranked as the k-th most probable by the model (horizontal axis) is closest to the ground truth. The model is consistent (the most probable expert indeed is the most accurate most frequently), but occasionally less probable experts are better. (b, Right) Histograms show the dynamics of p(yt |yt−1 , zt ), i.e. how the probability mass is redistributed among experts between two successive time steps, in a conversation sequence. Walk and turn back Run and turn KDE-RR 7.59 17.7 RVM 6.9 16.8 KDE-RVM 7.15 16.08 BME 3.6 8.2 kBME 3.72 8.01 Table 2: Joint angle prediction error computed for two complex sequences with walks, runs and turns, thus more ambiguity (100 frames). Models have 6 state dimensions. Unimodal predictors average competing solutions. kBME has signiÄ?Ĺš cantly lower error. Figure 5: Reconstruction of a jump (selected frames). Top: original image sequence. Middle: extracted silhouettes. Bottom: 3D reconstruction seen from a synthetic viewpoint. 4 Conclusion We have presented a probabilistic framework for conditional inference in latent kernelinduced low-dimensional state spaces. Our approach has the following properties: (a) Figure 6: Reconstructing the activities of 2 people operating in an 12-d state space (each person has its own 6d state). Top: original image sequence. Bottom: 3D reconstruction seen from a synthetic viewpoint. Accounts for non-linear correlations among input or output variables, by using kernel nonlinear dimensionality reduction (kPCA); (b) Learns probability distributions over mappings between low-dimensional state spaces using conditional Bayesian mixture of experts, as required for accurate prediction. In the resulting low-dimensional kBME predictor ambiguities and multiple solutions common in visual, inverse perception problems are accurately represented. (c) Works in a continuous, conditional temporal probabilistic setting and offers a formal management of uncertainty. We show comparisons that demonstrate how the proposed approach outperforms regression, PCA or KDE alone for reconstructing the 3D human motion in monocular video. Future work we will investigate scaling aspects for large training sets and alternative structured prediction methods. References [1] CMU Human Motion DataBase. Online at http://mocap.cs.cmu.edu/search.html, 2003. [2] A. Agarwal and B. Triggs. 3d human pose from silhouettes by Relevance Vector Regression. In CVPR, 2004. [3] G. Bakir, J. Weston, and B. Scholkopf. Learning to Ä?Ĺš nd pre-images. In NIPS, 2004. [4] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24, 2002. [5] C. Bishop and M. Svensen. Bayesian mixtures of experts. In UAI, 2003. [6] J. Deutscher, A. Blake, and I. Reid. Articulated Body Motion Capture by Annealed Particle Filtering. In CVPR, 2000. [7] M. Jordan and R. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, (6):181–214, 1994. [8] D. Mackay. Bayesian interpolation. Neural Computation, 4(5):720–736, 1992. [9] A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for information extraction and segmentation. In ICML, 2000. [10] R. Rosales and S. Sclaroff. Learning Body Pose Via Specialized Maps. In NIPS, 2002. [11] B. Sch¨ lkopf, A. Smola, and K. M¨ ller. Nonlinear component analysis as a kernel eigenvalue o u problem. Neural Computation, 10:1299–1319, 1998. [12] G. Shakhnarovich, P. Viola, and T. Darrell. Fast Pose Estimation with Parameter Sensitive Hashing. In ICCV, 2003. [13] L. Sigal, S. Bhatia, S. Roth, M. Black, and M. Isard. Tracking Loose-limbed People. In CVPR, 2004. [14] C. Sminchisescu and A. Jepson. Generative Modeling for Continuous Non-Linearly Embedded Visual Inference. In ICML, pages 759–766, Banff, 2004. [15] C. Sminchisescu, A. Kanaujia, Z. Li, and D. Metaxas. Discriminative Density Propagation for 3D Human Motion Estimation. In CVPR, 2005. [16] C. Sminchisescu and B. Triggs. Kinematic Jump Processes for Monocular 3D Human Tracking. In CVPR, volume 1, pages 69–76, Madison, 2003. [17] M. Tipping. Sparse Bayesian learning and the Relevance Vector Machine. JMLR, 2001. [18] C. Tomasi, S. Petrov, and A. Sastry. 3d tracking = classiÄ?Ĺš cation + interpolation. In ICCV, 2003. [19] J. Weston, O. Chapelle, A. Elisseeff, B. Scholkopf, and V. Vapnik. Kernel dependency estimation. In NIPS, 2002.
6 0.42709371 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
7 0.42522365 30 nips-2005-Assessing Approximations for Gaussian Process Classification
8 0.42447934 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks
9 0.42161006 78 nips-2005-From Weighted Classification to Policy Search
10 0.42105088 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
11 0.42098907 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
12 0.42057425 153 nips-2005-Policy-Gradient Methods for Planning
13 0.42011431 144 nips-2005-Off-policy Learning with Options and Recognizers
14 0.4197526 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories
15 0.41842932 21 nips-2005-An Alternative Infinite Mixture Of Gaussian Process Experts
16 0.41753039 136 nips-2005-Noise and the two-thirds power Law
17 0.41588849 48 nips-2005-Context as Filtering
18 0.41464767 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
19 0.41355446 171 nips-2005-Searching for Character Models
20 0.41272745 36 nips-2005-Bayesian models of human action understanding