nips nips2005 knowledge-graph by maker-knowledge-mining

NIPS 2005 knowledge graph


Similar papers computed by TF-IDF model


Similar papers computed by LSI model


Similar papers computed by LDA model


Papers list:

1 nips-2005-AER Building Blocks for Multi-Layer Multi-Chip Neuromorphic Vision Systems

Author: R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gomez-Rodriguez, H. Kolle Riis, T. Delbruck, S. C. Liu, S. Zahnd, A. M. Whatley, R. Douglas, P. Hafliger, G. Jimenez-Moreno, A. Civit, T. Serrano-Gotarredona, A. Acosta-Jimenez, B. Linares-Barranco

Abstract: A 5-layer neuromorphic vision processor whose components communicate spike events asynchronously using the address-event representation (AER) is demonstrated. The system includes a retina chip, two convolution chips, a 2D winner-take-all chip, a delay line chip, a learning classifier chip, and a set of PCBs for computer interfacing and address space remappings. The components use a mixture of analog and digital computation and will learn to classify trajectories of a moving object. A complete experimental setup and measurement results are shown.

2 nips-2005-A Bayes Rule for Density Matrices

Author: Manfred K. Warmuth

Abstract: The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. The classical Bayes rule is retained as the special case when the matrices are diagonal. In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. In the generalized setting, this is replaced by an expected variance calculation, where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). The variance along any direction is determined by the covariance matrix. Curiously enough, this expected variance calculation is a quantum measurement, where the covariance matrix specifies the instrument and the prior density matrix the mixture state of the particle. We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullback-Leibler version gives the classical Bayes rule and Umegaki's quantum relative entropy the new Bayes rule for density matrices.
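A minimal numerical sketch of the expected-variance calculation the abstract describes (the variable names and example matrices are ours): the generalized data probability is the variance of the covariance matrix W along each eigenvector of the prior density matrix D, averaged under D's eigenvalues, which collapses to tr(DW) and reduces to the classical expected likelihood when both matrices are diagonal.

```python
import numpy as np

# Prior as a density matrix D (symmetric positive definite, trace one)
# and data likelihood as a covariance matrix W; both are illustrative.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
D = A @ A.T
D /= np.trace(D)                     # normalize to trace one
W = np.diag([0.5, 1.0, 2.0])         # covariance matrix: the "instrument"

# Expected variance: variance of W along each eigenvector of D,
# averaged under D's eigenvalues (which form a probability vector).
lam, U = np.linalg.eigh(D)
expected_var = sum(l * (u @ W @ u) for l, u in zip(lam, U.T))

# The same quantity in quantum-measurement (Born-rule) form: tr(D W).
assert np.isclose(expected_var, np.trace(D @ W))

# Diagonal special case: eigenvalues act as prior probabilities and the
# diagonal of W as likelihoods, recovering the classical expected likelihood.
p, lik = np.array([0.2, 0.3, 0.5]), np.array([0.5, 1.0, 2.0])
assert np.isclose(np.trace(np.diag(p) @ np.diag(lik)), p @ lik)
```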

3 nips-2005-A Bayesian Framework for Tilt Perception and Confidence

Author: Odelia Schwartz, Peter Dayan, Terrence J. Sejnowski

Abstract: The misjudgement of tilt in images lies at the heart of entertaining visual illusions and rigorous perceptual psychophysics. A wealth of findings has attracted many mechanistic models, but few clear computational principles. We adopt a Bayesian approach to perceptual tilt estimation, showing how a smoothness prior offers a powerful way of addressing much confusing data. In particular, we faithfully model recent results showing that confidence in estimation can be systematically affected by the same aspects of images that affect bias. Confidence is central to Bayesian modeling approaches, and is applicable in many other perceptual domains.

Perceptual anomalies and illusions, such as the misjudgements of motion and tilt evident in so many psychophysical experiments, have intrigued researchers for decades.1–3 A Bayesian view4–8 has been particularly influential in models of motion processing, treating such anomalies as the normative product of prior information (often statistically codifying Gestalt laws) with likelihood information from the actual scenes presented. Here, we expand the range of statistically normative accounts to tilt estimation, for which there are classes of results (on estimation confidence) that are so far not available for motion. The tilt illusion arises when the perceived tilt of a center target is misjudged (ie bias) in the presence of flankers. Another phenomenon, called Crowding, refers to a loss in the confidence (ie sensitivity) of perceived target tilt in the presence of flankers. Attempts have been made to formalize these phenomena quantitatively. Crowding has been modeled as compulsory feature pooling (ie averaging of orientations), ignoring spatial positions.9, 10 The tilt illusion has been explained by lateral interactions11, 12 in populations of orientation-tuned units; and by calibration.13 However, most models of this form cannot explain a number of crucial aspects of the data. First, the geometry of the positional arrangement of the stimuli affects attraction versus repulsion in bias, as emphasized by Kapadia et al14 (figure 1A), and others.15, 16 Second, Solomon et al. recently measured bias and sensitivity simultaneously.11 The rich and surprising range of sensitivities, far from flat as a function of flanker angles (figure 1B), is outside the reach of standard models. Moreover, current explanations do not offer a computational account of tilt perception as the outcome of a normative inference process. Here, we demonstrate that a Bayesian framework for orientation estimation, with a prior favoring smoothness, can naturally explain a range of seemingly puzzling tilt data. We explicitly consider both the geometry of the stimuli, and the issue of confidence in the estimation.

Figure 1 (plots omitted): Tilt biases and sensitivities in visual perception. (A) Kapadia et al demonstrated the importance of geometry on tilt bias, with bar stimuli in the fovea (and similar results in the periphery). When 5 degrees clockwise flankers are arranged colinearly, the center target appears attracted in the direction of the flankers; when flankers are lateral, the target appears repulsed. Data are an average of 5 subjects.14 (B) Solomon et al measured both biases and sensitivities for gratings in the visual periphery.11 On the top are example stimuli, with flankers tilted 22.5 degrees clockwise.
This constitutes the classic tilt illusion, with a repulsive bias percept. In addition, sensitivities vary as a function of flanker angles, in a systematic way (even in cases when there are no biases at all). Sensitivities are given in units of the inverse of standard deviation of the tilt estimate. More detailed data for both experiments are shown in the results section.

Bayesian analyses have most frequently been applied to bias. Much less attention has been paid to the equally important phenomenon of sensitivity. This aspect of our model should be applicable to other perceptual domains. In section 1 we formulate the Bayesian model. The prior is determined by the principle of creating a smooth contour between the target and flankers. We describe how to extract the bias and sensitivity. In section 2 we show experimental data of Kapadia et al and Solomon et al, alongside the model simulations, and demonstrate that the model can account for both geometry, and bias and sensitivity measurements in the data. Our results suggest a more unified, rational, approach to understanding tilt perception.

1 Bayesian model

Under our Bayesian model, inference is controlled by the posterior distribution over the tilt of the target element. This comes from the combination of a prior favoring smooth configurations of the flankers and target, and the likelihood associated with the actual scene. A complete distribution would consider all possible angles and relative spatial positions of the bars, and marginalize the posterior over all but the tilt of the central element. For simplicity, we make two benign approximations: conditionalizing over (ie clamping) the angles of the flankers, and exploring only a small neighborhood of their positions. We now describe the steps of inference.

Smoothness prior: Under these approximations, we consider a given actual configuration (see fig 2A) of flankers f1 = (φ1, x1), f2 = (φ2, x2) and center target c = (φc, xc), arranged from top to bottom. We have to generate a prior over φc and δ1 = x1 − xc and δ2 = x2 − xc based on the principle of smoothness. As a less benign approximation, we do this in two stages: articulating a principle that determines a single optimal configuration; and generating a prior as a mixture of a Gaussian about this optimum and a uniform distribution, with the mixing proportion of the latter being determined by the smoothness of the optimum. Smoothness has been extensively studied in the computer vision literature.17–20

Figure 2 (plots omitted): Geometry and smoothness for flankers, f1 and f2, and center target, c. (A) Example actual configuration of flankers and target, aligned along the y axis from top to bottom. (B) The elastica procedure can rotate the target angle (to Φc) and shift the relative flanker and target positions on the x axis (to δ1 and δ2) in its search for the maximally smooth solution. Small spatial shifts (up to 1/15 the size of R) of positions are allowed, but positional shift is overemphasized in the figure for visibility. (C) Top: center tilt that results in maximal smoothness, as a function of flanker tilt. Boxed cartoons show examples, for given flanker tilts, of the optimally smooth configuration.
Note attraction of target towards flankers for small flanker angles; here flankers and target are positioned in a nearly colinear arrangement. Note also repulsion of target away from flankers for intermediate flanker angles. Bottom: P[c, f1, f2] for the center tilt that yields maximal smoothness. The y axis is normalized between 0 and 1.

One widely used principle, elastica, known even to Euler, has been applied to contour completion21 and other computer vision applications.17 The basic idea is to find the curve with minimum energy (ie, square of curvature). Sharon et al19 showed that the elastica function can be well approximated by a number of simpler forms. We adopt a version that Leung and Malik18 adopted from Sharon et al.19 We assume that the probability for completing a smooth curve can be factorized into two terms:

P[c, f1, f2] = G(c, f1) G(c, f2)    (1)

with the term G(c, f1) (and similarly, G(c, f2)) written as:

G(c, f1) = exp(−R/σR − Dβ/σβ),  where  Dβ = β1² + βc² − β1βc    (2)

and β1 (and similarly, βc) is the angle between the orientation at f1 and the line joining f1 and c. The distance between the centers of f1 and c is given by R. The two constants, σβ and σR, control the relative contribution to smoothness of the angle versus the spatial distance. Here, we set σβ = 1, and σR = 1.5. Figure 2B illustrates an example geometry, in which φc, δ1, and δ2 have been shifted from the actual scene (of figure 2A).

We now estimate the smoothest solution for given configurations. Figure 2C shows, for given flanker tilts, the center tilt that yields maximal smoothness, and the corresponding probability of smoothness. For near vertical flankers, the spatial lability leads to very weak attraction and high probability of smoothness. As the flanker angle deviates farther from vertical, there is a large repulsion, but also lower probability of smoothness. These observations are key to our model: the maximally smooth center tilt will influence attractive and repulsive interactions of tilt estimation; the probability of smoothness will influence the relative weighting of the prior versus the likelihood.

From the smoothness principle, we construct a two dimensional prior (figure 3A). One dimension represents tilt, the other dimension, the overall positional shift between target and flankers (called 'position').
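The smoothness terms above translate directly into a few lines of code. This is a sketch under the paper's stated settings (σβ = 1, σR = 1.5); the function names and example configurations are ours:

```python
import numpy as np

def G(beta_f, beta_c, R, sigma_beta=1.0, sigma_R=1.5):
    """Pairwise smoothness term of equation 2. beta_f, beta_c: angles
    (radians) between each element's orientation and the line joining
    flanker and center; R: distance between their centers."""
    D_beta = beta_f**2 + beta_c**2 - beta_f * beta_c
    return np.exp(-R / sigma_R - D_beta / sigma_beta)

def smoothness(b1, bc1, R1, b2, bc2, R2):
    """P[c, f1, f2] = G(c, f1) G(c, f2) of equation 1."""
    return G(b1, bc1, R1) * G(b2, bc2, R2)

# Nearly colinear flankers: small angles, high probability of smoothness.
print(smoothness(0.05, 0.05, 1.0, 0.05, 0.05, 1.0))
# Strongly tilted flankers: large angles, low probability of smoothness.
print(smoothness(1.0, 0.8, 1.0, 1.0, 0.8, 1.0))
```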
Figure 3 (plots omitted): Bayes model for example flankers and target. (A) Prior 2D distribution for flankers set at 22.5 degrees (note repulsive preference for -5.5 degrees). (B) Likelihood 2D distribution for a target tilt of 3 degrees; (C) Posterior 2D distribution. All 2D distributions are drawn on the same grayscale range, and the presence of a larger baseline in the prior causes it to appear more dimmed. (D) Marginalized posterior, resulting in 1D distribution over tilt. Dashed line represents the mean, with slight preference for negative angle. (E) For this target tilt, we calculate probability clockwise, and obtain one point on the psychometric curve.

The prior is a 2D Gaussian distribution, sat upon a constant baseline.22 The Gaussian is centered at the estimated smoothest target angle and relative position, and the baseline is determined by the probability of smoothness. The baseline, and its dependence on the flanker orientation, is a key difference from Weiss et al's Gaussian prior for smooth, slow motion. It can be seen as a mechanism to allow segmentation (see Posterior description below). The standard deviation of the Gaussian is a free parameter.

Likelihood: The likelihood over tilt and position (figure 3B) is determined by a 2D Gaussian distribution with an added baseline.22 The Gaussian is centered at the actual target tilt; and at a position taken as zero, since this is the actual position, to which the prior is compared. The standard deviation and baseline constant are free parameters.

Posterior and marginalization: The posterior comes from multiplying likelihood and prior (figure 3C) and then marginalizing over position to obtain a 1D distribution over tilt. Figure 3D shows an example in which this distribution is bimodal. Other likelihoods, with closer agreement between target and smooth prior, give unimodal distributions. Note that the bimodality is a direct consequence of having an added baseline to the prior and likelihood (if these were Gaussian without a baseline, the posterior would always be Gaussian). The viewer is effectively assessing whether the target is associated with the same object as the flankers, and this is reflected in the baseline, and consequently, in the bimodality, and confidence estimate. We define α as the mean angle of the 1D posterior distribution (eg, value of dashed line on the x axis), and β as the height of the probability distribution at that mean angle (eg, height of dashed line). The term β is an indication of confidence in the angle estimate, where for larger values we are more certain of the estimate.

Decision of probability clockwise: The probability of a clockwise tilt is estimated from the marginalized posterior:

P = 1 / (1 + exp(−αk − log(β + η)))    (3)

where α and β are defined as above, k is a free parameter and η a small constant. Free parameters are set to a single constant value for all flanker and center configurations. Weiss et al use a similar compressive nonlinearity, but without the term β. We also tried a decision function that integrates the posterior, but the resulting curves were far from the sigmoidal nature of the data.

Bias and sensitivity: For one target tilt, we generate a single probability and therefore a single point on the psychometric function relating tilt to the probability of choosing clockwise. We generate the full psychometric curve from all target tilts and fit to it a cumulative Gaussian distribution N(µ, σ) (figure 3E).
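To make the decision and fitting stages concrete, here is a sketch that applies equation 3 to a toy marginalized posterior and then extracts bias and sensitivity with the cumulative-Gaussian fit just described (the toy posterior and helper names are ours; k and η follow the text's free parameters):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def prob_clockwise(grid, posterior, k=2.5, eta=1e-3):
    """Equation 3: alpha is the posterior mean angle, beta its height."""
    posterior = posterior / np.trapz(posterior, grid)
    alpha = np.trapz(grid * posterior, grid)      # mean angle
    beta = np.interp(alpha, grid, posterior)      # height at the mean
    return 1.0 / (1.0 + np.exp(-alpha * k - np.log(beta + eta)))

# One point on the psychometric curve per target tilt.
grid = np.linspace(-30.0, 30.0, 601)
tilts = np.arange(-10.0, 11.0, 2.0)
p_cw = [prob_clockwise(grid, np.exp(-0.5 * ((grid - t) / 4.0) ** 2))
        for t in tilts]

# Fit a cumulative Gaussian N(mu, sigma): mu is the bias and 1/sigma
# the sensitivity, mimicking the psychophysical procedure.
(mu, sigma), _ = curve_fit(lambda x, m, s: norm.cdf(x, m, s),
                           tilts, p_cw, p0=(0.0, 3.0))
print(f"bias = {mu:.2f} deg, sensitivity = {1/sigma:.2f} 1/deg")
```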
Figure 4 (plots omitted): Kapadia et al data,14 versus Bayesian model. Solid lines are fits to a cumulative Gaussian distribution. (A) Flankers are tilted 5 degrees clockwise (black curve) or anti-clockwise (gray) of vertical, and positioned spatially in a colinear arrangement. The center bar appears tilted in the direction of the flankers (attraction), as can be seen by the attractive shift of the psychometric curve. The boxed stimuli cartoon illustrates a vertical target amidst the flankers. (B) Model for colinear bars also produces attraction. (C) Data and (D) model for lateral flankers result in repulsion. All data are collected in the fovea for bars.

The mean µ of the fit corresponds to the bias, and 1/σ to the sensitivity, or confidence in the bias. The fit to a cumulative Gaussian and extraction of these parameters exactly mimic psychophysical procedures.11

2 Results: data versus model

We first consider the geometry of the center and flanker configurations, modeling the full psychometric curve for colinear and parallel flanks (recall that figure 1A showed summary biases). Figure 4A,B demonstrates attraction in the data and model; that is, the psychometric curve is shifted towards the flanker, because of the nature of smooth completions for colinear flankers. Figure 4C,D shows repulsion in the data and model. In this case, the flankers are arranged laterally instead of colinearly. The smoothest solution in the model arises by shifting the target estimate away from the flankers. This shift is rather minor, because the configuration has a low probability of smoothness (similar to figure 2C), and thus the prior exerts only a weak effect.

The above results show examples of changes in the psychometric curve, but do not address both bias and, particularly, sensitivity, across a whole range of flanker configurations. Figure 5 depicts biases and sensitivities from Solomon et al, versus the Bayes model. The data are shown for a representative subject, but the qualitative behavior is consistent across all subjects tested. In figure 5A, bias is shown, for the condition that both flankers are tilted at the same angle. The data exhibit small attraction at near vertical flanker angles (this arrangement is close to colinear); large repulsion at intermediate flanker angles of 22.5 and 45 degrees from vertical; and minimal repulsion at large angles from vertical. This behavior is also exhibited in the Bayes model (Figure 5B). For intermediate flanker angles, the smoothest solution in the model is repulsive, and the effect of the prior is strong enough to induce a significant repulsion. For large angles, the prior exerts almost no effect.

Interestingly, sensitivity is far from flat in both data and model. In the data (Figure 5C), there is most loss in sensitivity at intermediate flanker angles of 22.5 and 45 degrees (ie, the subject is less certain); and sensitivity is higher for near vertical or near horizontal flankers. The model shows the same qualitative behavior (Figure 5D). In the model, there are two factors driving sensitivity: one is the probability of completing a smooth curvature for a given flanker configuration, as in Figure 2B; this determines the strength of the prior.
The other factor is certainty in a particular center estimation; this is determined by β, derived from the posterior distribution, and incorporated into the decision stage of the model (equation 3).

Figure 5 (plots omitted): Solomon et al data11 (subject FF), versus Bayesian model. (A) Data and (B) model biases with same-tilted flankers; (C) data and (D) model sensitivities with same-tilted flankers; (E;G) data and (F;H) model as above, but for opposite-tilted flankers (note that opposite-tilted data was collected for fewer flanker angles). Each point in the figure is derived by fitting a cumulative Gaussian distribution N(µ, σ) to the corresponding psychometric curve, and setting bias equal to µ and sensitivity to 1/σ. In all experiments, flanker and target gratings are presented in the visual periphery. Both data and model stimuli are averages of two configurations, on the left hand side (9 o'clock position) and right hand side (3 o'clock position). The configurations are similar to Figure 1(B), but slightly shifted according to an iso-eccentric circle, so that all stimuli are similarly visible in the periphery.

For flankers that are far from vertical, the prior has minimal effect because one cannot find a smooth solution (eg, the likelihood dominates), and thus sensitivity is higher. The low sensitivity at intermediate angles arises because the prior has considerable effect; and there is conflict between the prior (tilt, position), and likelihood (tilt, position). This leads to uncertainty in the target angle estimation. For flankers near vertical, the prior exerts a strong effect; but there is less conflict between the likelihood and prior estimates (tilt, position) for a vertical target. This leads to more confidence in the posterior estimate, and therefore, higher sensitivity. The only aspect that our model does not reproduce is the (more subtle) sensitivity difference between 0 and +/- 5 degree flankers.

Figure 5E-H depict data and model for opposite-tilted flankers. The bias is now close to zero in the data (Figure 5E) and model (Figure 5F), as would be expected (since the maximally smooth angle is now always roughly vertical). Perhaps more surprisingly, the sensitivities continue to be non-flat in the data (Figure 5G) and model (Figure 5H). This behavior arises in the model due to the strength of the prior, and positional uncertainty. As before, there is most loss in sensitivity at intermediate angles.

Note that to fit Kapadia et al, simulations used a constant parameter of k = 9 in equation 3, whereas for the Solomon et al simulations, k = 2.5. This indicates that, in our model, there was higher confidence in the foveal experiments than in the peripheral ones.
3 Discussion

We applied a Bayesian framework to the widely studied tilt illusion, and demonstrated the model on examples from two different data sets involving foveal and peripheral estimation. Our results support the appealing hypothesis that perceptual misjudgements are not a consequence of poor system design, but rather can be described as optimal inference.4–8 Our model accounts correctly for both attraction and repulsion, determined by the smoothness prior and the geometry of the scene.

We emphasized the issue of estimation confidence. The dataset showing how confidence is affected by the same issues that affect bias11 was exactly appropriate for a Bayesian formulation; other models in the literature typically do not incorporate confidence in a thoroughly probabilistic manner. In fact, our model fits the confidence (and bias) data more proficiently than an account based on lateral interactions among a population of orientation-tuned cells.11 Other Bayesian work, by Stocker et al,6 utilized the full slope of the psychometric curve in fitting a prior and likelihood to motion data, but did not examine the issue of confidence. Estimation confidence plays a central role in Bayesian formulations as a whole. Understanding how priors affect confidence should have direct bearing on many other Bayesian calculations such as multimodal integration.23

Our model is obviously over-simplified in a number of ways. First, we described it in terms of tilts and spatial positions; a more complete version should work in the pixel/filtering domain.18, 19 We have also only considered two flanking elements; the model is extendible to a full-field surround, whereby smoothness operates along a range of geometric directions, and some directions are more (smoothly) dominant than others. Second, the prior is constructed by summarizing the maximal smoothness information; a more probabilistically correct version should capture the full probability of smoothness in its prior. Third, our model does not incorporate a formal noise representation; however, sensitivities could be influenced both by stimulus-driven noise and confidence. Fourth, our model does not address attraction in the so-called indirect tilt illusion, thought to be mediated by a different mechanism. Finally, we have yet to account for neurophysiological data within this framework, and incorporate constraints at the neural implementation level. However, versions of our computations are often suggested for intra-areal and feedback cortical circuits; and smoothness principles form a key part of the association field connection scheme in Li's24 dynamical model of contour integration in V1.

Our model is connected to a wealth of literature in computer vision and perception. Notably, occlusion and contour completion might be seen as the extreme example in which there is no likelihood information at all for the center target; a host of papers have shown that under these circumstances, smoothness principles such as elastica and variants explain many aspects of perception. The model is also associated with many studies on contour integration motivated by Gestalt principles;25, 26 and exploration of natural scene statistics and Gestalt,27, 28 including the relation to contour grouping within a Bayesian framework.29, 30 Indeed, our model could be modified to include a prior from natural scenes. There are various directions for the experimental test and refinement of our model.
Most pressing is to determine bias and sensitivity for different center and flanker contrasts. As in the case of motion, our model predicts that when there is more uncertainty in the center element, prior information is more dominant. Another interesting test would be to design a task such that the center element is actually part of a different figure and unrelated to the flankers; our framework predicts that there would be minimal bias, because of segmentation. Our model should also be applied to other tilt-based illusions such as the Fraser spiral and the Zöllner illusion. Finally, our model can be applied to other perceptual domains;31 and given the apparent similarities between the tilt illusion and the tilt after-effect, we plan to extend the model to adaptation, by considering smoothness in time as well as space.

Acknowledgements
This work was funded by the HHMI (OS, TJS) and the Gatsby Charitable Foundation (PD). We are very grateful to Serge Belongie, Leanne Chukoskie, Philip Meier and Joshua Solomon for helpful discussions.

References
[1] J J Gibson. Adaptation, after-effect, and contrast in the perception of tilted lines. Journal of Experimental Psychology, 20:553–569, 1937.
[2] C Blakemore, R H S Carpenter, and M A Georgeson. Lateral inhibition between orientation detectors in the human visual system. Nature, 228:37–39, 1970.
[3] J A Stuart and H M Burian. A study of separation difficulty: Its relationship to visual acuity in normal and amblyopic eyes. American Journal of Ophthalmology, 53:471–477, 1962.
[4] A Yuille and H H Bulthoff. Perception as bayesian inference. In Knill and Whitman, editors, Bayesian decision theory and psychophysics, pages 123–161. Cambridge University Press, 1996.
[5] Y Weiss, E P Simoncelli, and E H Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5:598–604, 2002.
[6] A Stocker and E P Simoncelli. Constraining a bayesian model of human visual speed perception. Adv in Neural Info Processing Systems, 17, 2004.
[7] D Kersten, P Mamassian, and A Yuille. Object perception as bayesian inference. Annual Review of Psychology, 55:271–304, 2004.
[8] K Kording and D Wolpert. Bayesian integration in sensorimotor learning. Nature, 427:244–247, 2004.
[9] L Parkes, J Lund, A Angelucci, J Solomon, and M Morgan. Compulsory averaging of crowded orientation signals in human vision. Nature Neuroscience, 4:739–744, 2001.
[10] D G Pelli, M Palomares, and N J Majaj. Crowding is unlike ordinary masking: Distinguishing feature integration from detection. Journal of Vision, 4:1136–1169, 2002.
[11] J Solomon, F M Felisberti, and M Morgan. Crowding and the tilt illusion: Toward a unified account. Journal of Vision, 4:500–508, 2004.
[12] J A Bednar and R Miikkulainen. Tilt aftereffects in a self-organizing model of the primary visual cortex. Neural Computation, 12:1721–1740, 2000.
[13] C W Clifford, P Wenderoth, and B Spehar. A functional angle on some after-effects in cortical vision. Proc Biol Sci, 1454:1705–1710, 2000.
[14] M K Kapadia, G Westheimer, and C D Gilbert. Spatial distribution of contextual interactions in primary visual cortex and in visual perception. J Neurophysiology, 84:2048–2062, 2000.
[15] C C Chen and C W Tyler. Lateral modulation of contrast discrimination: Flanker orientation effects. Journal of Vision, 2:520–530, 2002.
[16] I Mareschal, M P Sceniak, and R M Shapley. Contextual influences on orientation discrimination: binding local and global cues. Vision Research, 41:1915–1930, 2001.
[17] D Mumford. Elastica and computer vision. In Chandrajit Bajaj, editor, Algebraic Geometry and its Applications. Springer Verlag, 1994.
[18] T K Leung and J Malik. Contour continuity in region based image segmentation. In Proc. ECCV, pages 544–559, 1998.
[19] E Sharon, A Brandt, and R Basri. Completion energies and scale. IEEE Pat. Anal. Mach. Intell., 22(10), 1997.
[20] S W Zucker, C David, A Dobbins, and L Iverson. The organization of curve detection: coarse tangent fields. Computer Graphics and Image Processing, 9(3):213–234, 1988.
[21] S Ullman. Filling in the gaps: the shape of subjective contours and a model for their generation. Biological Cybernetics, 25:1–6, 1976.
[22] G E Hinton and A D Brown. Spiking Boltzmann machines. Adv in Neural Info Processing Systems, 12, 1998.
[23] R A Jacobs. What determines visual cue reliability? Trends in Cognitive Sciences, 6:345–350, 2002.
[24] Z Li. A saliency map in primary visual cortex. Trends in Cognitive Sciences, 6:9–16, 2002.
[25] D J Field, A Hayes, and R F Hess. Contour integration by the human visual system: evidence for a local "association field". Vision Research, 33:173–193, 1993.
[26] J Beck, A Rosenfeld, and R Ivry. Line segregation. Spatial Vision, 4:75–101, 1989.
[27] M Sigman, G A Cecchi, C D Gilbert, and M O Magnasco. On a common circle: Natural scenes and gestalt rules. PNAS, 98(4):1935–1940, 2001.
[28] S Mahamud, L R Williams, K K Thornber, and K Xu. Segmentation of multiple salient closed contours from real images. IEEE Pat. Anal. Mach. Intell., 25(4):433–444, 1997.
[29] W S Geisler, J S Perry, B J Super, and D P Gallogly. Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 6:711–724, 2001.
[30] J H Elder and R M Goldberg. Ecological statistics of gestalt laws for the perceptual organization of contours. Journal of Vision, 4:324–353, 2002.
[31] S R Lehky and T J Sejnowski. Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, 10:2281–2299, 1990.

4 nips-2005-A Bayesian Spatial Scan Statistic

Author: Daniel B. Neill, Andrew W. Moore, Gregory F. Cooper

Abstract: We propose a new Bayesian method for spatial cluster detection, the “Bayesian spatial scan statistic,” and compare this method to the standard (frequentist) scan statistic approach. We demonstrate that the Bayesian statistic has several advantages over the frequentist approach, including increased power to detect clusters and (since randomization testing is unnecessary) much faster runtime. We evaluate the Bayesian and frequentist methods on the task of prospective disease surveillance: detecting spatial clusters of disease cases resulting from emerging disease outbreaks. We demonstrate that our Bayesian methods are successful in rapidly detecting outbreaks while keeping the number of false positives low.
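As a rough illustration of what a spatial scan does, here is a sketch of the generic frequentist scan with a Poisson likelihood-ratio score; this is not the paper's Bayesian statistic (which replaces the score with a marginal likelihood and prior), and the data and rectangular region shapes are toy assumptions:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
baseline = np.full((8, 8), 5.0)                    # expected counts per cell
counts = rng.poisson(baseline)
counts[2:4, 3:6] += rng.poisson(4.0, size=(2, 3))  # injected "outbreak"

def poisson_llr(c, b):
    # Log likelihood ratio for an elevated Poisson rate inside a region.
    return c * np.log(c / b) + b - c if c > b else 0.0

best, region = -np.inf, None
for x1, x2, y1, y2 in product(range(8), repeat=4):  # all rectangles
    if x2 < x1 or y2 < y1:
        continue
    c = counts[x1:x2 + 1, y1:y2 + 1].sum()
    b = baseline[x1:x2 + 1, y1:y2 + 1].sum()
    if (score := poisson_llr(c, b)) > best:
        best, region = score, (x1, x2, y1, y2)
print(region, best)  # frequentist scan still needs randomization testing
```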

5 nips-2005-A Computational Model of Eye Movements during Object Class Detection

Author: Wei Zhang, Hyejin Yang, Dimitris Samaras, Gregory J. Zelinsky

Abstract: We present a computational model of human eye movements in an object class detection task. The model combines state-of-the-art computer vision object class detection methods (SIFT features trained using AdaBoost) with a biologically plausible model of human eye movement to produce a sequence of simulated fixations, culminating with the acquisition of a target. We validated the model by comparing its behavior to the behavior of human observers performing the identical object class detection task (looking for a teddy bear among visually complex nontarget objects). We found considerable agreement between the model and human data in multiple eye movement measures, including number of fixations, cumulative probability of fixating the target, and scanpath distance.

6 nips-2005-A Connectionist Model for Constructive Modal Reasoning

Author: Artur Garcez, Luis C. Lamb, Dov M. Gabbay

Abstract: We present a new connectionist model for constructive, intuitionistic modal reasoning. We use ensembles of neural networks to represent intuitionistic modal theories, and show that for each intuitionistic modal program there exists a corresponding neural network ensemble that computes the program. This provides a massively parallel model for intuitionistic modal reasoning, and sets the scene for integrated reasoning, knowledge representation, and learning of intuitionistic theories in neural networks, since the networks in the ensemble can be trained by examples using standard neural learning algorithms.

7 nips-2005-A Cortically-Plausible Inverse Problem Solving Method Applied to Recognizing Static and Kinematic 3D Objects

Author: David Arathorn

Abstract: Recent neurophysiological evidence suggests the ability to interpret biological motion is facilitated by a neuronal

8 nips-2005-A Criterion for the Convergence of Learning with Spike Timing Dependent Plasticity

Author: Robert A. Legenstein, Wolfgang Maass

Abstract: We investigate under what conditions a neuron can learn by experimentally supported rules for spike timing dependent plasticity (STDP) to predict the arrival times of strong “teacher inputs” to the same neuron. It turns out that in contrast to the famous Perceptron Convergence Theorem, which predicts convergence of the perceptron learning rule for a simplified neuron model whenever a stable solution exists, no equally strong convergence guarantee can be given for spiking neurons with STDP. But we derive a criterion on the statistical dependency structure of input spike trains which characterizes exactly when learning with STDP will converge on average for a simple model of a spiking neuron. This criterion is reminiscent of the linear separability criterion of the Perceptron Convergence Theorem, but it applies here to the rows of a correlation matrix related to the spike inputs. In addition we show through computer simulations for more realistic neuron models that the resulting analytically predicted positive learning results not only hold for the common interpretation of STDP where STDP changes the weights of synapses, but also for a more realistic interpretation suggested by experimental data where STDP modulates the initial release probability of dynamic synapses.

9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning

Author: Zhenyue Zhang, Hongyuan Zha

Abstract: We propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods.

10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm

Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer

Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps automatic model selection, improves the interpretability of the learning result, and scales to hundreds of thousands of examples or hundreds of kernels to be combined.
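A small sketch of the object being optimized, a conic (nonnegative) combination of kernel matrices; the kernels, weights, and data here are illustrative, and the paper's semi-infinite linear program for learning the weights is not reproduced:

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
kernels = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]
beta = np.array([0.2, 0.5, 0.3])        # nonnegative mixing weights

K = sum(b * Kj for b, Kj in zip(beta, kernels))

# The conic combination stays positive semidefinite, so any standard
# SVM solver can consume K as a precomputed kernel.
assert np.linalg.eigvalsh(K).min() > -1e-8
print(K.shape)
```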

11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection

Author: Long Zhu, Alan L. Yuille

Abstract: We describe a hierarchical compositional system for detecting deformable objects in images. Objects are represented by graphical models. The algorithm uses a hierarchical tree where the root of the tree corresponds to the full object and lower-level elements of the tree correspond to simpler features. The algorithm proceeds by passing simple messages up and down the tree. The method works rapidly, in under a second, on 320 × 240 images. We demonstrate the approach on detecting cats, horses, and hands. The method works in the presence of background clutter and occlusions. Our approach is contrasted with more traditional methods such as dynamic programming and belief propagation.

12 nips-2005-A PAC-Bayes approach to the Set Covering Machine

Author: François Laviolette, Mario Marchand, Mohak Shah

Abstract: We design a new learning algorithm for the Set Covering Machine from a PAC-Bayes perspective and propose a PAC-Bayes risk bound which is minimized for classifiers achieving a non-trivial margin-sparsity trade-off.

13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering

Author: Rong Jin, Feng Kang, Chris H. Ding

Abstract: Spectral clustering enjoys success in both data clustering and semi-supervised learning. However, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped in local optima. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm.

14 nips-2005-A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification

Author: Yves Grandvalet, Johnny Mariethoz, Samy Bengio

Abstract: In this paper, we show that the hinge loss can be interpreted as the neg-log-likelihood of a semi-parametric model of posterior probabilities. From this point of view, SVMs represent the parametric component of a semi-parametric model fitted by a maximum a posteriori estimation procedure. This connection enables us to derive a mapping from SVM scores to estimated posterior probabilities. Unlike previous proposals, the suggested mapping is interval-valued, providing a set of posterior probabilities compatible with each SVM score. This framework offers a new way to adapt the SVM optimization problem to unbalanced classification, when decisions result in unequal (asymmetric) losses. Experiments show improvements over state-of-the-art procedures.

15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels

Author: Eizaburo Doi, Doru C. Balcan, Michael S. Lewicki

Abstract: Biological sensory systems are faced with the problem of encoding a high-fidelity sensory signal with a population of noisy, low-fidelity neurons. This problem can be expressed in information theoretic terms as coding and transmitting a multi-dimensional, analog signal over a set of noisy channels. Previously, we have shown that robust, overcomplete codes can be learned by minimizing the reconstruction error with a constraint on the channel capacity. Here, we present a theoretical analysis that characterizes the optimal linear coder and decoder for one- and two-dimensional data. The analysis allows for an arbitrary number of coding units, thus including both under- and over-complete representations, and provides a number of important insights into optimal coding strategies. In particular, we show how the form of the code adapts to the number of coding units and to different data and noise conditions to achieve robustness. We also report numerical solutions for robust coding of high-dimensional image data and show that these codes are substantially more robust than other image codes such as ICA and wavelets.

16 nips-2005-A matching pursuit approach to sparse Gaussian process regression

Author: Sathiya Keerthi, Wei Chu

Abstract: In this paper we propose a new basis selection criterion for building sparse GP regression models that provides promising gains in accuracy as well as efficiency over previous methods. Our algorithm is much faster than that of Smola and Bartlett, while in generalization it greatly outperforms the information gain approach proposed by Seeger et al., especially on the quality of predictive distributions.

17 nips-2005-Active Bidirectional Coupling in a Cochlear Chip

Author: Bo Wen, Kwabena A. Boahen

Abstract: We present a novel cochlear model implemented in analog very large scale integration (VLSI) technology that emulates nonlinear active cochlear behavior. This silicon cochlea includes outer hair cell (OHC) electromotility through active bidirectional coupling (ABC), a mechanism we proposed in which OHC motile forces, through the microanatomical organization of the organ of Corti, realize the cochlear amplifier. Our chip measurements demonstrate that frequency responses become larger and more sharply tuned when ABC is turned on; the degree of the enhancement decreases with input intensity as ABC includes saturation of OHC forces.

1 Silicon Cochleae

Cochlear models, mathematical and physical, with the shared goal of emulating nonlinear active cochlear behavior, shed light on how the cochlea works if based on cochlear micromechanics. Among the modeling efforts, silicon cochleae have promise in meeting the need for real-time performance and low power consumption. Lyon and Mead developed the first analog electronic cochlea [1], which employed a cascade of second-order filters with exponentially decreasing resonant frequencies. However, the cascade structure suffers from delay and noise accumulation and lacks fault-tolerance. Modeling the cochlea more faithfully, Watts built a two-dimensional (2D) passive cochlea that addressed these shortcomings by incorporating the cochlear fluid using a resistive network [2]. This parallel structure, however, has its own problem: response gain is diminished by interference among the second-order sections' outputs due to the large phase change at resonance [3].

Listening more to biology, our silicon cochlea aims to overcome the shortcomings of existing architectures by mimicking the cochlear micromechanics while including outer hair cell (OHC) electromotility. Although how exactly OHC motile forces boost the basilar membrane's (BM) vibration remains a mystery, cochlear microanatomy provides clues. Based on these clues, we previously proposed a novel mechanism, active bidirectional coupling (ABC), for the cochlear amplifier [4]. Here, we report an analog VLSI chip that implements this mechanism. In essence, our implementation is the first silicon cochlea that employs stimulus enhancement (i.e., active behavior) instead of undamping (i.e., high filter Q [5]).

The paper is organized as follows. In Section 2, we present the hypothesized mechanism (ABC), first described in [4]. In Section 3, we provide a mathematical formulation of the model as the basis of cochlear circuit design. Then we proceed in Section 4 to synthesize the circuit for the cochlear chip. Last, we present chip measurements in Section 5 that demonstrate nonlinear active cochlear behavior.

Figure 1 (diagrams omitted): The inner ear. A Cutaway showing cochlear ducts (adapted from [6]). B Longitudinal view of cochlear partition (CP) (modified from [7]-[8]). Each outer hair cell (OHC) tilts toward the base while the Deiter's cell (DC) on which it sits extends a phalangeal process (PhP) toward the apex. The OHCs' stereocilia and the PhPs' apical ends form the reticular lamina (RL). d is the tilt distance, and the segment size. IHC: inner hair cell.

2 Active Bidirectional Coupling

The cochlea actively amplifies acoustic signals as it performs spectral analysis.
The movement of the stapes sets the cochlear fluid into motion, which passes the stimulus energy onto a certain region of the BM, the main vibrating organ in the cochlea (Figure 1A). From the base to the apex, BM fibers increase in width and decrease in thickness, resulting in an exponential decrease in stiffness which, in turn, gives rise to the passive frequency tuning of the cochlea.

The OHCs' electromotility is widely thought to account for the cochlea's exquisite sensitivity and discriminability. The exact way that OHC motile forces enhance the BM's motion, however, remains unresolved. We propose that the triangular mechanical unit formed by an OHC, a phalangeal process (PhP) extended from the Deiter's cell (DC) on which the OHC sits, and a portion of the reticular lamina (RL), between the OHC's stereocilia end and the PhP's apical tip, plays an active role in enhancing the BM's responses (Figure 1B). The cochlear partition (CP) is divided into a number of segments longitudinally. Each segment includes one DC, one PhP's apical tip and one OHC's stereocilia end, both attached to the RL. Approximating the anatomy, we assume that when an OHC's stereocilia end lies in segment i − 1, its basolateral end lies in the immediately apical segment i. Furthermore, the DC in segment i extends a PhP that angles toward the apex of the cochlea, with its apical end inserted just behind the stereocilia end of the OHC in segment i + 1.

Our hypothesis (ABC) includes both feedforward and feedbackward interactions. On one hand, the feedforward mechanism, proposed in [9], hypothesized that the force resulting from OHC contraction or elongation is exerted onto an adjacent downstream BM segment due to the OHC's basal tilt. On the other hand, the novel insight of the feedbackward mechanism is that the OHC force is delivered onto an adjacent upstream BM segment due to the apical tilt of the PhP extending from the DC's main trunk. In a nutshell, the OHC motile forces, through the microanatomy of the CP, feed forward and backward, in harmony with each other, resulting in bidirectional coupling between BM segments in the longitudinal direction. Specifically, due to the opposite action of OHC forces on the BM and the RL, the motion of BM segment i − 1 reinforces that of segment i while the motion of segment i + 1 opposes that of segment i, as described in detail in [4].

Figure 2 (plots omitted): Wave propagation (WP) and basilar membrane (BM) impedance in the active cochlear model with a 2kHz pure tone (α = 0.15, γ = 0.3). A WP in fluid and BM. B BM impedance Zm (i.e., pressure divided by velocity), normalized by S(x)M(x). Only the resistive component is shown; dot marks peak location.

3 The 2D Nonlinear Active Model

To provide a blueprint for the cochlear circuit design, we formulate a 2D model of the cochlea that includes ABC. Both the cochlea's length (BM) and height (cochlear ducts) are discretized into a number of segments, with the original aspect ratio of the cochlea maintained. In the following expressions, x represents the distance from the stapes along the CP, with x = 0 at the base (or the stapes) and x = L (uncoiled cochlear duct length) at the apex; y represents the vertical distance from the BM, with y = 0 at the BM and y = ±h (cochlear duct radius) at the bottom/top wall.
Provided that the assumption of fluid incompressibility holds, the velocity potential φ of the fluids is required to satisfy ∇²φ(x, y, t) = 0, where ∇² denotes the Laplacian operator. By definition, this potential is related to fluid velocities in the x and y directions: Vx = −∂φ/∂x and Vy = −∂φ/∂y.

The BM is driven by the fluid pressure difference across it. Hence, the BM's vertical motion (with downward displacement being positive) can be described as follows:

Pd(x) + FOHC(x) = S(x)δ(x) + β(x)δ̇(x) + M(x)δ̈(x),    (1)

where S(x) is the stiffness, β(x) is the damping, and M(x) is the mass, per unit area, of the BM; δ is the BM's downward displacement. Pd = ρ ∂(φSV(x, y, t) − φST(x, y, t))/∂t is the pressure difference between the two fluid ducts (the scala vestibuli (SV) and the scala tympani (ST)), evaluated at the BM (y = 0); ρ is the fluid density. The FOHC(x) term combines feedforward and feedbackward OHC forces, described by

FOHC(x) = s0 (tanh(αγS(x)δ(x − d)/s0) − tanh(αS(x)δ(x + d)/s0)),    (2)

where α denotes the OHC motility, expressed as a fraction of the BM stiffness, and γ is the ratio of feedforward to feedbackward coupling, representing relative strengths of the OHC forces exerted on the BM segment through the DC, directly and via the tilted PhP. d denotes the tilt distance, which is the horizontal displacement between the source and the recipient of the OHC force, assumed to be equal for the forward and backward cases. We use the hyperbolic tangent function to model saturation of the OHC forces, the nonlinearity that is evident in physiological measurements [8]; s0 determines the saturation level.

We observed wave propagation in the model and computed the BM's impedance (i.e., the ratio of driving pressure to velocity). Following the semi-analytical approach in [2], we simulated a linear version of the model (without saturation). The traveling wave transitions from long-wave to short-wave before the BM vibration peaks; the wavelength around the characteristic place is comparable to the tilt distance (Figure 2A). The BM impedance's real part (i.e., the resistive component) becomes negative before the peak (Figure 2B). On the whole, inclusion of OHC motility through ABC boosts the traveling wave by pumping energy onto the BM when the wavelength matches the tilt of the OHC and PhP.
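Before moving to circuits, the discretized OHC force of equation 2 is easy to state in code. This is a sketch under the figure's parameter values (α = 0.15, γ = 0.3); the discretization and the example stiffness profile are our assumptions:

```python
import numpy as np

def f_ohc(delta, S, d, alpha=0.15, gamma=0.3, s0=1.0):
    """Equation 2 on a discretized BM: feedforward couples segment i-d
    into i, feedbackward couples i+d into i; tanh saturates the force."""
    fwd = np.zeros_like(delta)
    bwd = np.zeros_like(delta)
    fwd[d:] = np.tanh(alpha * gamma * S[d:] * delta[:-d] / s0)
    bwd[:-d] = np.tanh(alpha * S[:-d] * delta[d:] / s0)
    return s0 * (fwd - bwd)

n = 100
x = np.linspace(0.0, 1.0, n)
S = np.exp(-4.0 * x)                      # exponentially decreasing stiffness
delta = 1e-2 * np.sin(2 * np.pi * 5 * x)  # toy displacement pattern
print(f_ohc(delta, S, d=3)[:5])
```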
4 Analog VLSI Design and Implementation

Based on our mathematical model, which produces realistic responses, we implemented a 2D nonlinear active cochlear circuit in analog VLSI, taking advantage of the 2D nature of silicon chips. We first synthesize a circuit analog of the mathematical model, and then we implement the circuit in the log-domain. We start by synthesizing a passive model, and then extend it to a nonlinear active one by including ABC with saturation.

4.1 Synthesizing the BM Circuit

The model consists of two fundamental parts: the cochlear fluid and the BM. First, we design the fluid element and thus the fluid network. In discrete form, the fluids can be viewed as a grid of elements with a specific resistance that corresponds to the fluid density or mass. Since charge is conserved for a small sheet of resistance and so are particles for a small volume of fluid, we use current to simulate fluid velocity. At the transistor level, the current flowing through the channel of a MOS transistor, operating subthreshold as a diffusive element, can be used for this purpose. Therefore, following the approach in [10], we implement the cochlear fluid network using a diffusor network formed by a 2D grid of nMOS transistors.

Second, we design the BM element and thus the BM. As current represents velocity, we rewrite the BM boundary condition (Equation 1, without the FOHC term):

Iin = S(x) ∫Imem dt + β(x)Imem + M(x)İmem,    (3)

where Iin, obtained by applying the voltage from the diffusor network to the gate of a pMOS transistor, represents the velocity potential scaled by the fluid density. In turn, Imem drives the diffusor network to match the fluid velocity with the BM velocity, δ̇. The FOHC term is dealt with in Section 4.2. Implementing this second-order system requires two state-space variables, which we name Is and Io. And with s = jω, our synthesized BM design (passive) is

τ1 Is s + Is = −Iin + Io,    (4)
τ2 Io s + Io = Iin − bIs,    (5)
Imem = Iin + Is − Io,    (6)

where the two first-order systems are both low-pass filters (LPFs), with time constants τ1 and τ2, respectively; b is a gain factor. Thus, Iin can be expressed in terms of Imem as:

Iin = ((b + 1)/τ1τ2 + ((τ1 + τ2)/τ1τ2)s + s²) Imem / s².

Comparing this expression with the design target (Equation 3) yields the circuit analogs: S(x) = (b + 1)/τ1τ2, β(x) = (τ1 + τ2)/τ1τ2, and M(x) = 1. Note that the mass M(x) is a constant (i.e., 1), which was also the case in our mathematical model simulation. These analogies require that τ1 and τ2 increase exponentially to simulate the exponentially decreasing BM stiffness (and damping); b allows us to achieve a reasonable stiffness for a practical choice of τ1 and τ2 (capacitor size is limited by silicon area).

Figure 3 (schematics omitted): Low-pass filter (LPF) and second-order section circuit design. A Half-LPF circuit. B Complete LPF circuit formed by two half-LPF circuits. C Basilar membrane (BM) circuit. It consists of two LPFs and connects to its neighbors through Is and IT.

4.2 Adding Active Bidirectional Coupling

To include ABC in the BM boundary condition, we replace δ in Equation 2 with ∫Imem dt to obtain

FOHC = rff S(x) T(∫Imem(x − d) dt) − rfb S(x) T(∫Imem(x + d) dt),

where rff = αγ and rfb = α denote the feedforward and feedbackward OHC motility factors, and T denotes saturation. The saturation is applied to the displacement, instead of the force, as this simplifies the implementation. We obtain the integrals by observing that, in the passive design, the state variable Is = −Imem/sτ1. Thus, ∫Imem(x − d) dt = −τ1f Isf and ∫Imem(x + d) dt = −τ1b Isb. Here, Isf and Isb represent the outputs of the first LPF in the upstream and downstream BM segments, respectively; τ1f and τ1b represent their respective time constants. To reduce complexity in implementation, we use τ1 to approximate both τ1f and τ1b as the longitudinal span is small. We obtain the active BM design by replacing Equation 5 with the synthesis result:

τ2 Io s + Io = Iin − bIs + rfb(b + 1)T(−Isb) − rff(b + 1)T(−Isf).

Note that, to implement ABC, we only need to add two currents to the second LPF in the passive system. These currents, Isf and Isb, come from the upstream and downstream neighbors of each segment.

Figure 4 (schematics omitted): Cochlear chip. A Architecture: Two diffusive grids with embedded BM circuits model the cochlea. B Detail: BM circuits exchange currents with their neighbors.
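A quick numerical check of the synthesized passive section (a sketch; the parameter values are arbitrary): eliminating Is and Io from equations 4-6 should reproduce the stated transfer relation and circuit analogs S = (b+1)/τ1τ2 and β = (τ1+τ2)/τ1τ2 with M = 1.

```python
import numpy as np

tau1, tau2, b = 1e-3, 2e-3, 4.0
S = (b + 1) / (tau1 * tau2)
beta = (tau1 + tau2) / (tau1 * tau2)

for w in (1e2, 1e3, 1e4):          # test frequencies, s = jw
    s = 1j * w
    # Equations 4 and 5 as a linear system in (Is, Io), with Iin = 1:
    #   (tau1*s + 1) Is -             Io = -1
    #    b           Is + (tau2*s + 1) Io =  1
    A = np.array([[tau1 * s + 1, -1.0], [b, tau2 * s + 1]])
    Is, Io = np.linalg.solve(A, np.array([-1.0, 1.0]))
    Imem = 1.0 + Is - Io           # equation 6
    # Design target: Iin = (S + beta*s + s^2) Imem / s^2.
    assert np.isclose(1.0 / Imem, (S + beta * s + s**2) / s**2)
print("second-order section matches the design target")
```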
4.3 Class AB Log-domain Implementation

We employ the log-domain filtering technique [11] to realize current-mode operation. In addition, following the approach proposed in [12], we implement the circuit in Class AB to increase dynamic range, reduce the effect of mismatch and lower power consumption. This differential signaling is inspired by the way the biological cochlea works—the vibration of the BM is driven by the pressure difference across it.

Taking a bottom-up strategy, we start by designing a Class AB LPF, a building block for the BM circuit. It is described by

τ(Iout+ − Iout−)s + (Iout+ − Iout−) = Iin+ − Iin−  and  τ(Iout+ Iout−)s + Iout+ Iout− = Iq²,

where Iq sets the geometric mean of the positive and negative components of the output current, and τ sets the time constant. Combining the common-mode constraint with the differential design equation yields the nodal equation for the positive path (the negative path has superscripts + and − swapped):

C V̇out+ = Iτ ((Iin+ − Iin−) + (Iq²/Iout+ − Iout−)) / (Iout+ + Iout−).

This nodal equation suggests the half-LPF circuit shown in Figure 3A. Vout+, the voltage on the positive capacitor (C+), gates a pMOS transistor to produce the corresponding current signal, Iout+ (Vout− and Iout− are similarly related). The bias Vq sets the quiescent current Iq while Vτ determines the current Iτ, which is related to the time constant by τ = CuT/κIτ (κ is the subthreshold slope coefficient and uT is the thermal voltage). Two of these subcircuits, connected in push–pull, form a complete LPF (Figure 3B). The BM circuit is implemented using two LPFs interacting in accordance with the synthesized design equations (Figure 3C). Imem is the combination of three currents, Iin, Is, and Io. Each BM sends out Is and receives IT, a saturated version of its neighbor's Is. The saturation is accomplished by a current-limiting transistor (see Figure 4B), which yields IT = T(Is) = Is Isat/(Is + Isat), where Isat is set by a bias voltage Vsat.

4.4 Chip Architecture

We fabricated a version of our cochlear chip architecture (Figure 4) with 360 BM circuits and two 4680-element fluid grids (360 × 13). This chip occupies 10.9mm² of silicon area in 0.25µm CMOS technology. Differential input signals are applied at the base while the two fluid grids are connected at the apex through a fluid element that represents the helicotrema.

5 Chip Measurements

We carried out two measurements that demonstrate the desired amplification by ABC, and the compressive growth of BM responses due to saturation. To obtain sinusoidal current as the input to the BM subcircuits, we set the voltages applied at the base to be the logarithm of a half-wave rectified sinusoid. We first investigated BM-velocity frequency responses at six linearly spaced cochlear positions (Figure 5). The frequency that maximally excites the first position (Stage 30), defined as its characteristic frequency (CF), is 12.1kHz. The remaining five CFs, from early to later stages, are 8.2k, 1.7k, 905, 366, and 218Hz, respectively. Phase accumulation at the CFs ranges from 0.56 to 2.67π radians, comparable to 1.67π radians in the mammalian cochlea [13]. The Q10 factor (the ratio of the CF to the bandwidth 10dB below the peak) ranges from 1.25 to 2.73, comparable to 2.55 at mid-sound intensity in biology (computed from [13]).
The cutoff slope ranges from −20 to −54 dB/octave, as compared to −85 dB/octave in biology (computed from [13]).

Figure 5: Measured BM-velocity frequency responses at six locations. A Amplitude. B Phase. Dashed lines: biological data (adapted from [13]); dots mark peaks.

We then explored the longitudinal pattern of BM-velocity responses and the effect of ABC. Stimulating the chip with four different pure tones, we obtained responses in which a 4 kHz input elicits a peak around Stage 85, while a 500 Hz tone travels all the way to Stage 178 and peaks there (Figure 6A). We also varied the input voltage level and obtained frequency responses at Stage 100 (Figure 6B). The input voltage level increases linearly, so the current increases exponentially; the input current level (in dB) was estimated from the measured κ for this chip. As expected, we observed linearly increasing responses at low frequencies in the logarithmic plot. In contrast, the responses around the CF increase less and become broader with increasing input level as saturation takes effect in that region (resembling a passive cochlea). We observed 24 dB of compression, as compared to 27 to 47 dB in biology [13]. At the highest intensities, compression also occurs at low frequencies.

Figure 6: Measured BM-velocity responses (cont'd). A Longitudinal responses (20-stage moving average); the peak shifts to earlier (basal) stages as the input frequency increases from 500 Hz to 4 kHz. B Effects of increasing input intensity; responses become broader and show compressive growth.

These chip measurements demonstrate that the inclusion of ABC, simply through coupling of neighboring BM elements, transforms a passive cochlea into an active one. This active cochlear model's nonlinear responses are qualitatively comparable to physiological data.

6 Conclusions

We presented an analog VLSI implementation of a 2D nonlinear cochlear model that utilizes a novel active mechanism, ABC, which we proposed to account for the cochlear amplifier. ABC was shown to pump energy into the traveling wave. Rather than detecting the wave's amplitude and implementing an automatic-gain-control loop, our biomorphic model accomplishes this simply through nonlinear interactions between adjacent neighbors. Implemented in the log-domain, with Class AB operation, our silicon cochlea shows enhanced frequency responses, with compressive behavior around the CF, when ABC is turned on. These features are desirable in prosthetic applications and automatic speech recognition systems, as they capture the properties of the biological cochlea.

References

[1] Lyon, R.F. & Mead, C.A. (1988) An analog electronic cochlea. IEEE Trans. Acoust., Speech, and Signal Proc., 36: 1119-1134.
[2] Watts, L. (1993) Cochlear Mechanics: Analysis and Analog VLSI. Ph.D. thesis, Pasadena, CA: California Institute of Technology.
[3] Fragnière, E. (2005) A 100-channel analog CMOS auditory filter bank for speech recognition. IEEE International Solid-State Circuits Conference (ISSCC 2005), pp. 140-141.
[4] Wen, B. & Boahen, K. (2003) A linear cochlear model with active bi-directional coupling.
The 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2003), pp. 2013-2016.
[5] Sarpeshkar, R., Lyon, R.F., & Mead, C.A. (1996) An analog VLSI cochlear model with new transconductance amplifier and nonlinear gain control. Proceedings of the IEEE Symposium on Circuits and Systems (ISCAS 1996), 3: 292-295.
[6] Mead, C.A. (1989) Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley.
[7] Russell, I.J. & Nilsen, K.E. (1997) The location of the cochlear amplifier: Spatial representation of a single tone on the guinea pig basilar membrane. Proc. Natl. Acad. Sci. USA, 94: 2660-2664.
[8] Geisler, C.D. (1998) From Sound to Synapse: Physiology of the Mammalian Ear. Oxford University Press.
[9] Geisler, C.D. & Sang, C. (1995) A cochlear model using feed-forward outer-hair-cell forces. Hearing Research, 86: 132-146.
[10] Boahen, K.A. & Andreou, A.G. (1992) A contrast sensitive silicon retina with reciprocal synapses. In Moody, J.E. and Lippmann, R.P. (eds.), Advances in Neural Information Processing Systems 4 (NIPS 1992), pp. 764-772, Morgan Kaufmann, San Mateo, CA.
[11] Frey, D.R. (1993) Log-domain filtering: an approach to current-mode filtering. IEE Proc. G, Circuits, Devices and Systems, 140(6): 406-416.
[12] Zaghloul, K. & Boahen, K.A. (2005) An On-Off log-domain circuit that recreates adaptive filtering in the retina. IEEE Transactions on Circuits and Systems I: Regular Papers, 52(1): 99-107.
[13] Ruggero, M.A., Rich, N.C., Narayan, S.S., & Robles, L. (1997) Basilar membrane responses to tones at the base of the chinchilla cochlea. J. Acoust. Soc. Am., 101(4): 2151-2163.

18 nips-2005-Active Learning For Identifying Function Threshold Boundaries

Author: Brent Bryan, Robert C. Nichol, Christopher R. Genovese, Jeff Schneider, Christopher J. Miller, Larry Wasserman

Abstract: We present an efficient algorithm to actively select queries for learning the boundaries separating a function domain into regions where the function is above and below a given threshold. We develop experiment selection methods based on entropy, misclassification rate, variance, and their combinations, and show how they perform on a number of data sets. We then show how these algorithms are used to determine simultaneously valid 1 − α confidence intervals for seven cosmological parameters. Experimentation shows that the algorithm reduces the computation necessary for the parameter estimation problem by an order of magnitude.
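As a rough illustration of the kind of selection rule this abstract describes, the Python sketch below picks the next query as the candidate whose predicted value is most ambiguous with respect to the threshold, which is closely related to the entropy and misclassification criteria listed above. The Gaussian predictive model, the candidate pool, and the threshold t are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np
from scipy.stats import norm

def next_query(candidates, mu, sigma, t):
    """Pick the candidate whose predictive distribution N(mu, sigma^2)
    straddles the threshold t most ambiguously, i.e. where the probability
    of being above t is closest to 1/2 (maximum misclassification risk)."""
    p_above = norm.sf(t, loc=mu, scale=sigma)   # P(f(x) > t) per candidate
    ambiguity = -np.abs(p_above - 0.5)          # 0.5 is most uncertain
    return candidates[int(np.argmax(ambiguity))]
```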

19 nips-2005-Active Learning for Misspecified Models

Author: Masashi Sugiyama

Abstract: Active learning is the problem, in supervised learning, of designing the locations of training input points so that the generalization error is minimized. Existing active learning methods often assume that the model used for learning is correctly specified, i.e., that the learning target function can be expressed by the model at hand. In many practical situations, however, this assumption may not be fulfilled. In this paper, we first show that the existing active learning method can be theoretically justified under a slightly weaker condition: the model does not have to be correctly specified; slightly misspecified models are also allowed. However, it turns out that the weakened condition is still restrictive in practice. To cope with this problem, we propose an alternative active learning method which can be theoretically justified for a wider class of misspecified models. Thus, the proposed method has a broader range of applications than the existing method. Numerical studies show that the proposed active learning method is robust against the misspecification of models and is thus reliable.

1 Introduction and Problem Formulation

Let us discuss the regression problem of learning a real-valued function f(x) from training examples {(x_i, y_i) | y_i = f(x_i) + ε_i}_{i=1}^n, where {ε_i}_{i=1}^n are i.i.d. noise with mean zero and unknown variance σ². We use the following linear regression model for learning:

f̂(x) = Σ_{i=1}^p α_i φ_i(x),   (1)

where {φ_i(x)}_{i=1}^p are fixed, linearly independent functions and α = (α_1, α_2, …, α_p)^T are parameters to be learned.

We evaluate the goodness of the learned function f̂(x) by the expected squared test error over test input points and noise (i.e., the generalization error). When the test input points are drawn independently from a distribution with density p_t(x), the generalization error is expressed as

G = E_ε ∫ ( f̂(x) − f(x) )² p_t(x) dx,   (2)

where E_ε denotes the expectation over the noise {ε_i}_{i=1}^n. In the following, we suppose that p_t(x) is known¹.

¹ In some application domains such as web page analysis or bioinformatics, a large number of unlabeled samples—input points without output values, independently drawn from the distribution with density p_t(x)—are easily gathered. In such cases, a reasonably good estimate of p_t(x) may be obtained by some standard density estimation method. Therefore, the assumption that p_t(x) is known may not be so restrictive.

In a standard setting of regression, the training input points are provided from the environment, i.e., {x_i}_{i=1}^n independently follow the distribution with density p_t(x). On the other hand, in some cases the training input points can be designed by users. In such cases, it is expected that the accuracy of the learning result can be improved if the training input points are chosen appropriately, e.g., by densely locating training input points in the regions of high uncertainty.

Active learning—also referred to as experimental design—is the problem of optimizing the location of training input points so that the generalization error is minimized. In active learning research, it is often assumed that the regression model is correctly specified [2, 1, 3], i.e., that the learning target function f(x) can be expressed by the model. In practice, however, this assumption is often violated.

In this paper, we first show that the existing active learning method can still be theoretically justified when the model is approximately correct in a strong sense. Then we propose an alternative active learning method which can also be theoretically justified for approximately correct models, but where the condition on the approximate correctness of the models is weaker than that for the existing method. Thus, the proposed method has a wider range of applications. In the following, we suppose that the training input points {x_i}_{i=1}^n are independently drawn from a user-defined distribution with density p_x(x), and discuss the problem of finding the optimal density function.
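To fix ideas, here is a small self-contained Python sketch of this formulation. The cubic target, the polynomial basis, the noise level, and the densities are illustrative assumptions (chosen in the spirit of the toy example in Section 5), not part of the formulation itself; the sketch fits the linear model by ordinary least squares and estimates G for one training draw by Monte Carlo over p_t(x).

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: 1.0 - x + x**2 + 0.05 * x**3                    # target function
phi = lambda x: np.stack([np.ones_like(x), x, x**2], axis=1)  # model basis (p=3)

# training data: inputs from a user-chosen density p_x, noisy outputs
x_tr = rng.normal(0.2, 0.6, size=100)
y_tr = f(x_tr) + rng.normal(0.0, 0.3, size=x_tr.shape)

alpha = np.linalg.lstsq(phi(x_tr), y_tr, rcond=None)[0]       # OLS fit

# Monte Carlo estimate of the squared test error under p_t = N(0.2, 0.4^2)
# (for this one noise realization; averaging over draws approximates G)
x_te = rng.normal(0.2, 0.4, size=100_000)
gen_err = np.mean((phi(x_te) @ alpha - f(x_te))**2)
print(f"estimated generalization error: {gen_err:.4f}")
```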
2 Existing Active Learning Method

The generalization error defined by Eq.(2) can be decomposed as G = B + V, where B is the (squared) bias term and V is the variance term, given by

B = ∫ ( E_ε f̂(x) − f(x) )² p_t(x) dx   and   V = E_ε ∫ ( f̂(x) − E_ε f̂(x) )² p_t(x) dx.

A standard way to learn the parameters in the regression model (1) is ordinary least-squares (OLS) learning, i.e., the parameter vector α is determined as

α_OLS = argmin_α [ (1/n) Σ_{i=1}^n ( f̂(x_i) − y_i )² ].

It is known that α_OLS is given by α_OLS = L_OLS y, where L_OLS = (X^T X)^{−1} X^T, the design matrix X has entries X_{ij} = φ_j(x_i), and y = (y_1, y_2, …, y_n)^T. Let G_OLS, B_OLS and V_OLS be G, B and V for the learned function obtained by ordinary least-squares learning, respectively. Then the following proposition holds.

Proposition 1 ([2, 1, 3]) Suppose that the model is correctly specified, i.e., the learning target function f(x) is expressed as

f(x) = Σ_{i=1}^p α*_i φ_i(x).

Then B_OLS and V_OLS are expressed as B_OLS = 0 and V_OLS = σ² J_OLS, where

J_OLS = tr( U L_OLS L_OLS^T )   and   U = ∫ φ(x) φ(x)^T p_t(x) dx.

Therefore, for the correctly specified model (1), the generalization error is expressed as G_OLS = σ² J_OLS. Based on this expression, the existing active learning method determines the location of training input points {x_i}_{i=1}^n (or the training input density p_x(x)) so that J_OLS is minimized [2, 1, 3].

3 Analysis of Existing Method under Misspecification of Models

In this section, we investigate the validity of the existing active learning method for misspecified models. Suppose the model does not exactly include the learning target function f(x), but it approximately includes it, i.e., for a scalar δ such that δ is small, f(x) is expressed as

f(x) = g(x) + δ r(x),   (3)

where g(x) = Σ_{i=1}^p α*_i φ_i(x) is the orthogonal projection of f(x) onto the span of {φ_i(x)}_{i=1}^p and the residual r(x) is orthogonal to {φ_i(x)}_{i=1}^p:

∫ r(x) φ_i(x) p_t(x) dx = 0   for i = 1, 2, …, p.

In this case, the bias term is expressed as

B = ∫ ( E_ε f̂(x) − g(x) )² p_t(x) dx + C,   where C = ∫ ( g(x) − f(x) )² p_t(x) dx.

Since C is a constant which does not depend on the training input density p_x(x), we subtract C in the following discussion. Then we have the following lemma².

Lemma 2 For the approximately correct model (3), we have

B_OLS − C = δ² ⟨ U L_OLS z_r, L_OLS z_r ⟩ = O_p(δ²),
V_OLS = σ² J_OLS + o_p(n^{−1}),

where z_r = ( r(x_1), r(x_2), …, r(x_n) )^T.

² Proofs of lemmas are provided in an extended version [6].

Note that the asymptotic order in the lemma is in probability, since V_OLS is a random variable that includes {x_i}_{i=1}^n. The above lemma implies that

G_OLS − C = σ² J_OLS + o_p(n^{−1})   if δ = o_p(n^{−1/2}).

Therefore, the existing active learning method of minimizing J_OLS is still justified if δ = o_p(n^{−1/2}). However, when δ is not o_p(n^{−1/2}), the existing method may not work well, because the bias term B_OLS − C is then not smaller than the variance term V_OLS, so it cannot be neglected.
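In practice the criterion J_OLS is easy to evaluate for a candidate design. The sketch below (illustrative Python, reusing a basis-expansion function phi like the one in the earlier sketch) approximates the matrix U from unlabeled samples drawn from p_t(x), which footnote 1 suggests are often available.

```python
import numpy as np

def j_ols(x_train, x_unlabeled, phi):
    """Variance criterion J_OLS = tr(U L L^T) for OLS, where
    L = (X^T X)^{-1} X^T and U = E_{p_t}[phi(x) phi(x)^T] is approximated
    by an average over unlabeled samples drawn from p_t(x)."""
    X = phi(x_train)                           # n x p design matrix
    Phi_u = phi(x_unlabeled)                   # m x p basis on unlabeled pool
    U = Phi_u.T @ Phi_u / len(x_unlabeled)     # Monte Carlo estimate of U
    L = np.linalg.solve(X.T @ X, X.T)          # L_OLS
    return np.trace(U @ L @ L.T)
```

Minimizing this score over a family of candidate training densities (e.g., over the scale factor c used in the experiments below) implements the existing method.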
4 New Active Learning Method

In this section, we propose a new active learning method based on weighted least-squares learning.

4.1 Weighted Least-Squares Learning

When the model is correctly specified, α_OLS is an unbiased estimator of α*. However, for misspecified models, α_OLS is generally biased even asymptotically if δ = O_p(1).

The bias of α_OLS is actually caused by the covariate shift [5]—the training input density p_x(x) is different from the test input density p_t(x). For correctly specified models, the influence of the covariate shift can be ignored, as the existing active learning method does. However, for misspecified models, we should explicitly cope with the covariate shift.

Under the covariate shift, it is known that the following weighted least-squares (WLS) learning is asymptotically unbiased even if δ = O_p(1) [5]:

α_WLS = argmin_α [ (1/n) Σ_{i=1}^n ( p_t(x_i)/p_x(x_i) ) ( f̂(x_i) − y_i )² ].

The asymptotic unbiasedness of α_WLS can be intuitively understood from the following identity, which is similar in spirit to importance sampling:

∫ ( f̂(x) − f(x) )² p_t(x) dx = ∫ ( f̂(x) − f(x) )² ( p_t(x)/p_x(x) ) p_x(x) dx.

In the following, we assume that p_x(x) is strictly positive for all x. Let D be the diagonal matrix with the i-th diagonal element D_ii = p_t(x_i)/p_x(x_i). Then it can be confirmed that α_WLS is given by α_WLS = L_WLS y, where L_WLS = (X^T D X)^{−1} X^T D.

4.2 Active Learning Based on Weighted Least-Squares Learning

Let G_WLS, B_WLS and V_WLS be G, B and V for the learned function obtained by the above weighted least-squares learning, respectively. Then we have the following lemma.

Lemma 3 For the approximately correct model (3), we have

B_WLS − C = δ² ⟨ U L_WLS z_r, L_WLS z_r ⟩ = O_p(δ² n^{−1}),
V_WLS = σ² J_WLS + o_p(n^{−1}),   where J_WLS = tr( U L_WLS L_WLS^T ).

This lemma implies that

G_WLS − C = σ² J_WLS + o_p(n^{−1})   if δ = o_p(1).

Based on this expression, we propose determining the training input density p_x(x) so that J_WLS is minimized. The use of the proposed criterion J_WLS can be theoretically justified when δ = o_p(1), while the existing criterion J_OLS requires δ = o_p(n^{−1/2}). Therefore, the proposed method has a wider range of applications. The effect of this extension is experimentally investigated in the next section.
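A minimal sketch of the importance-weighted fit follows (illustrative Python; the densities and basis are placeholders). The weights p_t(x_i)/p_x(x_i) correct for the covariate shift introduced by the designed training density.

```python
import numpy as np
from scipy.stats import norm

def wls_fit(x_tr, y_tr, phi, p_t, p_x):
    """Importance-weighted least squares: minimize
    (1/n) * sum_i [p_t(x_i)/p_x(x_i)] * (f_hat(x_i) - y_i)^2."""
    w = p_t(x_tr) / p_x(x_tr)          # importance weights (diagonal of D)
    X = phi(x_tr)
    XtD = X.T * w                      # X^T D, applied column-wise
    # solve the weighted normal equations (X^T D X) alpha = X^T D y
    return np.linalg.solve(XtD @ X, XtD @ y_tr)

# example densities: test inputs ~ N(0.2, 0.4^2), training ~ N(0.2, 0.6^2)
p_t = lambda x: norm.pdf(x, 0.2, 0.4)
p_x = lambda x: norm.pdf(x, 0.2, 0.6)
```

With p_x = p_t the weights are all one and the fit reduces to OLS, which matches the observation above that covariate shift only matters under misspecification or a designed training density.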
5 Numerical Examples

We evaluate the usefulness of the proposed active learning method through experiments.

Toy Data Set: We first illustrate how the proposed method works under a controlled setting. Let the input dimension be 1 and the learning target function be f(x) = 1 − x + x² + δx³, with δ ∈ {0, 0.04, 0.5}; the three cases correspond to "correctly specified", "approximately correct", and "misspecified", respectively (see Figure 1). Let n = 100, and let {ε_i}_{i=1}^n be i.i.d. Gaussian noise with mean zero. Let the test input density p_t(x) be a Gaussian density, which is assumed to be known here, and let the basis functions be φ_i(x) = x^{i−1} for i = 1, 2, 3. We choose the training input density p_x(x) from among Gaussian densities with the same mean as p_t(x) and with the standard deviation scaled by a factor c.

Figure 1: Learning target function (for δ = 0, 0.04, 0.5) and input density functions p_t(x) and p_x(x).

We compare the accuracy of the following three methods:

(A) Proposed active learning criterion + WLS learning: The training input density is determined so that J_WLS is minimized. Following the determined input density, training input points {x_i}_{i=1}^{100} are created and the corresponding output values {y_i}_{i=1}^{100} are observed. Then WLS learning is used for estimating the parameters.

(B) Existing active learning criterion + OLS learning [2, 1, 3]: The training input density is determined so that J_OLS is minimized. OLS learning is used for estimating the parameters.

(C) Passive learning + OLS learning: The test input density p_t(x) is used as the training input density. OLS learning is used for estimating the parameters.

First, we evaluate the accuracy of J_WLS and J_OLS as approximations of G_WLS and G_OLS. The means and standard deviations of G_WLS, J_WLS, G_OLS, and J_OLS over 100 runs are depicted as functions of c in Figure 2. These graphs show that when δ = 0 ("correctly specified"), both J_WLS and J_OLS give accurate estimates of G_WLS and G_OLS. When δ = 0.04 ("approximately correct"), J_WLS again works well, while J_OLS tends to be negatively biased for large c. This result is surprising since, as illustrated in Figure 1, the learning target functions with δ = 0 and δ = 0.04 are visually quite similar; it intuitively seems that the result for δ = 0.04 should not differ much from that for δ = 0. The simulation result shows, however, that this slight difference makes J_OLS unreliable. When δ = 0.5 ("misspecified"), J_WLS is still reasonably accurate, while J_OLS is heavily biased. These results show that, as an approximation of the generalization error, J_WLS is more robust against the misspecification of models than J_OLS, which is in good agreement with the theoretical analyses given in Section 3 and Section 4.

Figure 2: The means and error bars of G_WLS, J_WLS, G_OLS, and J_OLS over 100 runs as functions of c.

Table 1: The means and standard deviations of the generalization error for the toy data set. The best method and the ones comparable to it by the t-test are described with boldface. The value of method (B) for δ = 0.5 is extremely large, but it is not a typo.

In Table 1, the mean and standard deviation of the generalization error obtained by each method are described. When δ = 0, the existing method (B) works better than the proposed method (A). Actually, in this case, training input densities that approximately minimize G_WLS and G_OLS were found by J_WLS and J_OLS; the difference in the errors is therefore caused by the difference between WLS and OLS. WLS generally has larger variance than OLS, and since the bias is zero for both WLS and OLS when δ = 0, OLS is more accurate than WLS. Although the proposed method (A) is outperformed by the existing method (B) in this case, it still works better than the passive learning scheme (C). When δ = 0.04 and δ = 0.5, the proposed method (A) gives significantly smaller errors than the other methods.

Overall, we found that for all three cases the proposed method (A) works reasonably well and outperforms the passive learning scheme (C). On the other hand, the existing method (B) works excellently in the correctly specified case, although it tends to perform poorly once the correctness of the model is violated. Therefore, the proposed method (A) is found to be robust against the misspecification of models and is thus reliable.
Table 2: The means and standard deviations of the test error for the DELVE data sets (Bank-8fm, Bank-8fh, Bank-8nm, Bank-8nh, Kin-8fm, Kin-8fh, Kin-8nm, Kin-8nh). All values in the table are multiplied by 10³.

Figure 3: Mean relative performance of (A) and (B) compared with (C) on the eight DELVE data sets. For each run, the test errors of (A) and (B) are normalized by the test error of (C), and the values are then averaged over the 100 runs. The error bars were reasonably small, so they are omitted.

Realistic Data Set: Here we use eight practical data sets provided by DELVE [4]: Bank-8fm, Bank-8fh, Bank-8nm, Bank-8nh, Kin-8fm, Kin-8fh, Kin-8nm, and Kin-8nh. Each data set includes 8192 samples, consisting of 8-dimensional input values and 1-dimensional output values. For convenience, every attribute is normalized into [0, 1].

Suppose we are given all input points (i.e., unlabeled samples), while the output values are unknown. From the pool of unlabeled samples, we choose n = 1000 input points {x_i}_{i=1}^{1000} for training and observe the corresponding output values {y_i}_{i=1}^{1000}. The task is to predict the output values of all the unlabeled samples.

In this experiment, the test input density p_t(x) is unknown, so we estimate it using the independent Gaussian density

p_t(x) ≈ (2π γ²_ML)^{−d/2} exp( −‖x − μ_ML‖² / (2 γ²_ML) ),

where μ_ML and γ_ML are the maximum likelihood estimates of the mean and standard deviation obtained from all unlabeled samples. As the basis functions we use Gaussian kernels

φ_k(x) = exp( −‖x − t_k‖² / 2 )   for k = 1, 2, …, 50,

where the template points {t_k}_{k=1}^{50} are randomly chosen from the pool of unlabeled samples.

We select the training input density p_x(x) from among independent Gaussian densities with mean μ_ML and standard deviation c γ_ML, where c is varied. In this simulation, we cannot create training input points at arbitrary locations because we only have 8192 samples. Therefore, we first create temporary input points following the determined training input density, and then choose from the pool of unlabeled samples the input points closest to the temporary input points. For each data set, we repeat this simulation 100 times, changing the template points {t_k} in each run.

The means and standard deviations of the test error over the 100 runs are described in Table 2. The proposed method (A) outperforms the existing method (B) for five data sets, while it is outperformed by (B) for the other three data sets. We conjecture that the model used for learning is almost correct in those three data sets. This result implies that the proposed method (A) is slightly better than the existing method (B).

Figure 3 depicts the relative performance of the proposed method (A) and the existing method (B) compared with the passive learning scheme (C). It shows that (A) outperforms (C) for all eight data sets, while (B) is comparable to or outperformed by (C) for five data sets. Therefore, the proposed method (A) is overall shown to work better than the other schemes.
However, for misspecified models, we have to explicitly cope with the covariate shift. In this paper, we proposed a new active learning method based on weighted least-squares learning. The numerical study showed that the existing method works better than the proposed method if the model is correctly specified; however, the existing method tends to perform poorly once the correctness of the model is violated. On the other hand, the proposed method overall worked reasonably well, and it consistently outperformed the passive learning scheme. Therefore, the proposed method is robust against the misspecification of models and is thus reliable.

The proposed method can be theoretically justified if the model is approximately correct in a weak sense. However, it is no longer valid for totally misspecified models. A natural future direction is therefore to devise an active learning method which has a theoretical guarantee for totally misspecified models. It is also important to notice that when the model is totally misspecified, even learning with optimal training input points would not be successful anyway; in such cases, it is of course important to carry out model selection. In active learning research, including the present paper, the locations of the training input points are designed for a single model at hand; that is, the model must be chosen before performing active learning. Devising a method for simultaneously optimizing the model and the locations of the training input points is a more important and promising future direction.

Acknowledgments: The author would like to thank MEXT (Grant-in-Aid for Young Scientists 17700142) for partial financial support.

References

[1] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129-145, 1996.
[2] V. V. Fedorov. Theory of Optimal Experiments. Academic Press, New York, 1972.
[3] K. Fukumizu. Statistical active learning in multilayer perceptrons. IEEE Transactions on Neural Networks, 11(1):17-26, 2000.
[4] C. E. Rasmussen, R. M. Neal, G. E. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra, and R. Tibshirani. The DELVE manual, 1996.
[5] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227-244, 2000.
[6] M. Sugiyama. Active learning for misspecified models. Technical report, Department of Computer Science, Tokyo Institute of Technology, 2005.

20 nips-2005-Affine Structure From Sound

Author: Sebastian Thrun

Abstract: We consider the problem of localizing a set of microphones together with a set of external acoustic events (e.g., hand claps), emitted at unknown times and unknown locations. We propose a solution that approximates this problem under a far field approximation defined in the calculus of affine geometry, and that relies on singular value decomposition (SVD) to recover the affine structure of the problem. We then define low-dimensional optimization techniques for embedding the solution into Euclidean geometry, and further techniques for recovering the locations and emission times of the acoustic events. The approach is useful for the calibration of ad-hoc microphone arrays and sensor networks. 1
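The core linear-algebra step this abstract alludes to, recovering an affine (low-rank) structure by SVD, can be illustrated generically. The Python sketch below is a plain best-rank-r factorization of a centered measurement matrix; it stands in for, rather than reproduces, the paper's specific measurement model and factor interpretation.

```python
import numpy as np

def affine_rank_factorization(M, r=2):
    """Generic affine-structure recovery step: center a measurement matrix
    and split its best rank-r approximation into two factors via SVD.
    (Illustrative only; the paper's actual construction differs in detail.)"""
    M0 = M - M.mean(axis=1, keepdims=True)   # remove per-row offsets (affine part)
    U, s, Vt = np.linalg.svd(M0, full_matrices=False)
    A = U[:, :r] * s[:r]                     # e.g., a receiver-related factor
    B = Vt[:r, :]                            # e.g., an event-related factor
    return A, B
```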

21 nips-2005-An Alternative Infinite Mixture Of Gaussian Process Experts

22 nips-2005-An Analog Visual Pre-Processing Processor Employing Cyclic Line Access in Only-Nearest-Neighbor-Interconnects Architecture

23 nips-2005-An Application of Markov Random Fields to Range Sensing

24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error

25 nips-2005-An aVLSI Cricket Ear Model

26 nips-2005-An exploration-exploitation model based on norepinepherine and dopamine activity

27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning

28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions

29 nips-2005-Analyzing Coupled Brain Sources: Distinguishing True from Spurious Interaction

30 nips-2005-Assessing Approximations for Gaussian Process Classification

31 nips-2005-Asymptotics of Gaussian Regularized Least Squares

32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation

33 nips-2005-Bayesian Sets

34 nips-2005-Bayesian Surprise Attracts Human Attention

35 nips-2005-Bayesian model learning in human visual perception

36 nips-2005-Bayesian models of human action understanding

37 nips-2005-Benchmarking Non-Parametric Statistical Tests

38 nips-2005-Beyond Gaussian Processes: On the Distributions of Infinite Networks

39 nips-2005-Beyond Pair-Based STDP: a Phenomenological Rule for Spike Triplet and Frequency Effects

40 nips-2005-CMOL CrossNets: Possible Neuromorphic Nanoelectronic Circuits

41 nips-2005-Coarse sample complexity bounds for active learning

42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning

43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression

45 nips-2005-Conditional Visual Tracking in Kernel Space

46 nips-2005-Consensus Propagation

47 nips-2005-Consistency of one-class SVM and related algorithms

48 nips-2005-Context as Filtering

49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations

50 nips-2005-Convex Neural Networks

51 nips-2005-Correcting sample selection bias in maximum entropy density estimation

52 nips-2005-Correlated Topic Models

53 nips-2005-Cyclic Equilibria in Markov Games

54 nips-2005-Data-Driven Online to Batch Conversions

55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes

56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators

57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification

58 nips-2005-Divergences, surrogate loss functions and experimental design

59 nips-2005-Dual-Tree Fast Gauss Transforms

60 nips-2005-Dynamic Social Network Analysis using Latent Space Models

61 nips-2005-Dynamical Synapses Give Rise to a Power-Law Distribution of Neuronal Avalanches

62 nips-2005-Efficient Estimation of OOMs

63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories

64 nips-2005-Efficient estimation of hidden state dynamics from spike trains

65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting

66 nips-2005-Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization

67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity

68 nips-2005-Factorial Switching Kalman Filters for Condition Monitoring in Neonatal Intensive Care

69 nips-2005-Fast Gaussian Process Regression using KD-Trees

70 nips-2005-Fast Information Value for Graphical Models

71 nips-2005-Fast Krylov Methods for N-Body Learning

72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

73 nips-2005-Fast biped walking with a reflexive controller and real-time policy searching

74 nips-2005-Faster Rates in Regression via Active Learning

75 nips-2005-Fixing two weaknesses of the Spectral Method

76 nips-2005-From Batch to Transductive Online Learning

77 nips-2005-From Lasso regression to Feature vector machine

78 nips-2005-From Weighted Classification to Policy Search

79 nips-2005-Fusion of Similarity Data in Clustering

80 nips-2005-Gaussian Process Dynamical Models

81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers

82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging

83 nips-2005-Generalization error bounds for classifiers trained with interdependent data

84 nips-2005-Generalization in Clustering with Unobserved Features

85 nips-2005-Generalization to Unseen Cases

86 nips-2005-Generalized Nonnegative Matrix Approximations with Bregman Divergences

87 nips-2005-Goal-Based Imitation as Probabilistic Inference over Graphical Models

88 nips-2005-Gradient Flow Independent Component Analysis in Micropower VLSI

89 nips-2005-Group and Topic Discovery from Relations and Their Attributes

90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

91 nips-2005-How fast to work: Response vigor, motivation and tonic dopamine

92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification

93 nips-2005-Ideal Observers for Detecting Motion: Correspondence Noise

94 nips-2005-Identifying Distributed Object Representations in Human Extrastriate Visual Cortex

95 nips-2005-Improved risk tail bounds for on-line algorithms

96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

97 nips-2005-Inferring Motor Programs from Images of Handwritten Digits

98 nips-2005-Infinite latent feature models and the Indian buffet process

99 nips-2005-Integrate-and-Fire models with adaptation are good enough

100 nips-2005-Interpolating between types and tokens by estimating power-law generators

101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?

102 nips-2005-Kernelized Infomax Clustering

103 nips-2005-Kernels for gene regulatory regions

104 nips-2005-Laplacian Score for Feature Selection

105 nips-2005-Large-Scale Multiclass Transduction

106 nips-2005-Large-scale biophysical parameter estimation in single neurons via constrained linear regression

107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition

108 nips-2005-Layered Dynamic Textures

109 nips-2005-Learning Cue-Invariant Visual Responses

110 nips-2005-Learning Depth from Single Monocular Images

111 nips-2005-Learning Influence among Interacting Markov Chains

112 nips-2005-Learning Minimum Volume Sets

113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

114 nips-2005-Learning Rankings via Convex Hull Separation

115 nips-2005-Learning Shared Latent Structure for Image Synthesis and Robotic Imitation

116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm

117 nips-2005-Learning from Data of Variable Quality

118 nips-2005-Learning in Silicon: Timing is Everything

119 nips-2005-Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

120 nips-2005-Learning vehicular dynamics, with application to modeling helicopters

121 nips-2005-Location-based activity recognition

122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours

123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables

124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks

125 nips-2005-Message passing for task redistribution on sparse graphs

126 nips-2005-Metric Learning by Collapsing Classes

127 nips-2005-Mixture Modeling by Affinity Propagation

128 nips-2005-Modeling Memory Transfer and Saving in Cerebellar Motor Learning

129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions

130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks

131 nips-2005-Multiple Instance Boosting for Object Detection

132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

133 nips-2005-Nested sampling for Potts models

134 nips-2005-Neural mechanisms of contrast dependent receptive field size in V1

135 nips-2005-Neuronal Fiber Delineation in Area of Edema from Diffusion Weighted MRI

136 nips-2005-Noise and the two-thirds power Law

137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction

138 nips-2005-Non-Local Manifold Parzen Windows

139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes

140 nips-2005-Nonparametric inference of prior probabilities from Bayes-optimal behavior

141 nips-2005-Norepinephrine and Neural Interrupts

142 nips-2005-Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games

143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning

144 nips-2005-Off-policy Learning with Options and Recognizers

145 nips-2005-On Local Rewards and Scaling Distributed Reinforcement Learning

146 nips-2005-On the Accuracy of Bounded Rationality: How Far from Optimal Is Fast and Frugal?

147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis

148 nips-2005-Online Discovery and Learning of Predictive State Representations

149 nips-2005-Optimal cue selection strategy

150 nips-2005-Optimizing spatio-temporal filters for improving Brain-Computer Interfacing

151 nips-2005-Pattern Recognition from One Example by Chopping

152 nips-2005-Phase Synchrony Rate for the Recognition of Motor Imagery in Brain-Computer Interface

153 nips-2005-Policy-Gradient Methods for Planning

154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

156 nips-2005-Prediction and Change Detection

157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models

158 nips-2005-Products of ``Edge-perts''

159 nips-2005-Q-Clustering

160 nips-2005-Query by Committee Made Real

161 nips-2005-Radial Basis Function Network for Multi-task Learning

162 nips-2005-Rate Distortion Codes in Sensor Networks: A System-level Analysis

163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections

164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks

165 nips-2005-Response Analysis of Neuronal Population with Synaptic Depression

166 nips-2005-Robust Fisher Discriminant Analysis

167 nips-2005-Robust design of biological experiments

168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions

169 nips-2005-Saliency Based on Information Maximization

170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape

171 nips-2005-Searching for Character Models

172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning

173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception

174 nips-2005-Separation of Music Signals by Harmonic Structure Modeling

175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

176 nips-2005-Silicon growth cones map silicon retina

177 nips-2005-Size Regularized Cut for Data Clustering

178 nips-2005-Soft Clustering on Graphs

179 nips-2005-Sparse Gaussian Processes using Pseudo-inputs

180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms

181 nips-2005-Spiking Inputs to a Winner-take-all Network

182 nips-2005-Statistical Convergence of Kernel CCA

183 nips-2005-Stimulus Evoked Independent Factor Analysis of MEG Data with Large Background Activity

184 nips-2005-Structured Prediction via the Extragradient Method

185 nips-2005-Subsequence Kernels for Relation Extraction

186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration

187 nips-2005-Temporal Abstraction in Temporal-difference Networks

188 nips-2005-Temporally changing synaptic plasticity

189 nips-2005-Tensor Subspace Analysis

190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines

191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget

192 nips-2005-The Information-Form Data Association Filter

193 nips-2005-The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search

194 nips-2005-Top-Down Control of Visual Attention: A Rational Account

195 nips-2005-Transfer learning for text classification

196 nips-2005-Two view learning: SVM-2K, Theory and Practice

197 nips-2005-Unbiased Estimator of Shape Parameter for Spiking Irregularities under Changing Environments

198 nips-2005-Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails

199 nips-2005-Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions

200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models

203 nips-2005-Visual Encoding with Jittering Eyes

204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

205 nips-2005-Worst-Case Bounds for Gaussian Process Models