nips nips2005 nips2005-155 knowledge-graph by maker-knowledge-mining

155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Source: pdf

Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato

Abstract: An increasing number of projects in neuroscience requires the statistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing or in operating artificial devices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear approaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, regularizing against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, LASSO regression and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. [sent-5, score-0.349]

2 In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear approaches from neural activity in primary motor cortex (M1). [sent-6, score-0.182]

3 To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, regularizing against overﬁtting. [sent-7, score-0.37]

4 As neuroscientiﬁc results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines. [sent-9, score-0.04]

5 1 Introduction In recent years, there has been growing interest in large scale analyses of brain activity with respect to associated behavioral variables. [sent-10, score-0.113]

6 In these projects, the brain signals to be processed are typically high dimensional, on the order of hundreds or thousands of inputs, with large numbers of redundant and irrelevant signals. [sent-12, score-0.038]

7 Linear modeling techniques like linear regression are among the primary analysis tools [6, 7] for such data. [sent-13, score-0.342]

8 , for continual online interpretation of brain activity to control prosthetic devices or for longitudinal scientiﬁc studies of information processing in the brain. [sent-17, score-0.107]

9 Surprisingly, robust linear modeling of high dimensional data is non-trivial as the danger of ﬁtting noise and encountering numerical problems is high. [sent-18, score-0.094]

10 Classical techniques like ridge regression, stepwise regression or partial least squares regression are known to be prone to overﬁtting and require careful human supervision to ensure useful results. [sent-19, score-0.889]

11 For this purpose, we investigate a full Bayesian treatment of linear regression with automatic relevance detection [8]. [sent-21, score-0.401]

12 Such an algorithm, called Variational Bayesian Least Squares (VBLS), can be formulated in closed form with the help of a variational Bayesian approximation and turns out to be computationally highly eﬃcient. [sent-22, score-0.068]

13 We apply VBLS to the reconstruction of EMG data from motor cortical ﬁring, using data sets collected by [9] and [10, 11]. [sent-23, score-0.077]

14 This data analysis addresses important neuroscientific questions in terms of whether M1 neurons can directly predict EMG traces [12], whether M1 has a muscle-based topological organization and whether information in M1 should be used to predict behavior in future brain-machine interfaces. [sent-24, score-0.113]

15 Our main focus in this paper, however, will be on the robust statistical analysis of these kinds of data. [sent-25, score-0.053]

16 Comparisons with classical linear analysis techniques and a brute force combinatorial model search on a cluster computer demonstrate that our VBLS algorithm achieves the “black box” quality of a robust statistical analysis technique without any tunable parameters. [sent-26, score-0.241]

17 2 High Dimensional Regression Before developing our VBLS algorithm, let us brieﬂy revisit classical linear regression techniques. [sent-28, score-0.349]

18 The standard model for linear regression is: $y = \sum_{m=1}^{d} b_m x_m + \epsilon$ (1), where b is the regression vector composed of $b_m$ components, d is the number of input dimensions, $\epsilon$ is additive mean-zero noise, x are the inputs and y are the outputs. [sent-29, score-1.317]

19 The Ordinary Least Squares (OLS) estimate of the regression vector is $b = (X^T X)^{-1} X^T y$. [sent-30, score-0.281]

20 The main problem with OLS regression in high dimensional input spaces is that the full rank assumption of $(X^T X)^{-1}$ is often violated due to underconstrained data sets. [sent-31, score-0.31]

21 Ridge regression can “ﬁx” such problems numerically, but introduces uncontrolled bias. [sent-32, score-0.281]
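The rank problem and the ridge "fix" described above can be illustrated with a small NumPy sketch. The synthetic data, the duplicated column and the ridge value `lam=1e-3` are assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 4
X = rng.normal(size=(N, d))
X[:, 3] = X[:, 2]                      # duplicated input -> X^T X loses full rank
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=N)

gram = X.T @ X
rank = np.linalg.matrix_rank(gram)     # smaller than d, so (X^T X)^{-1} does not exist

def ridge(X, y, lam):
    """Solve (X^T X + lam * I) b = X^T y without forming an explicit inverse."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Hypothetical ridge parameter; it makes the system solvable but biases b.
b_ridge = ridge(X, y, lam=1e-3)
print("rank of X^T X:", rank, "of", d)
print("ridge estimate:", b_ridge)
```

In this sketch the ridge term splits the weight evenly between the two identical columns, which is one simple way to see that the regularizer, and not the data, decides the solution along the degenerate direction.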

22 [Figure 1, panel (a): Linear regression] [sent-37, score-0.558]

23 [Figure 1, panel (b): Probabilistic backfitting] [sent-39, score-0.099]

24 Another class of linear regression methods are projection regression techniques, most notably Partial Least Squares Regression (PLS) [18]. [sent-47, score-0.628]

25 PLS performs computationally inexpensive O(d) univariate regressions along projection directions, chosen according to the correlation between inputs and outputs. [sent-48, score-0.085]
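As a rough illustration of the sequence of univariate regressions along projection directions, here is a minimal PLS1 sketch in NumPy. It uses the textbook deflation scheme, which is an assumption here and may differ from the PLS variant used in the paper; the function name `pls1_fit` is hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by deflation; returns fitted values on the training inputs."""
    X_res = X.astype(float).copy()
    y_res = y.astype(float).copy()
    fitted = np.zeros_like(y_res)
    for _ in range(n_components):
        u = X_res.T @ y_res            # projection direction from input-output covariance
        s = X_res @ u                  # data projected onto that direction
        ss = s @ s
        beta = (s @ y_res) / ss        # univariate regression of the residual on s
        fitted += beta * s
        y_res -= beta * s              # remove the explained part of the output
        p = (X_res.T @ s) / ss
        X_res -= np.outer(s, p)        # deflate the inputs
    return fitted
```

With as many components as input dimensions (and full-rank inputs) the fitted values coincide with the OLS fit on the training data; using fewer components is what provides the regularization.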

26 While slightly heuristic in nature, PLS is a surprisingly successful algorithm for ill-conditioned and high-dimensional regression problems, although it also has a tendency towards overﬁtting [16]. [sent-49, score-0.281]

27 LASSO (Least Absolute Shrinkage and Selection Operator) regression [19] shrinks certain regression coeﬃcients to 0, giving interpretable models that are sparse. [sent-50, score-0.562]
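The shrink-to-zero behaviour can be seen in a short coordinate-descent sketch. The objective scaling (1/2N), the function names and the penalty value `lam` are assumptions for illustration; the source does not say which LASSO solver or parameterization was used.

```python
import numpy as np

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exactly zero when |value| <= lam."""
    return np.sign(value) * max(abs(value) - lam, 0.0)

def lasso_cd(X, y, lam, n_iters=200):
    """Minimize (1/2N)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    N, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / N
    for _ in range(n_iters):
        for j in range(d):
            # Residual with the contribution of coordinate j added back in.
            partial_resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_resid / N
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b
```

Coefficients whose correlation with the partial residual stays below `lam` are set exactly to zero, which is what yields the sparse, interpretable models mentioned above.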

28 Finally, there are also more efficient methods for matrix inversion [20, 21], which, however, assume a well-conditioned regression problem a priori and degrade in the presence of collinearities in inputs. [sent-52, score-0.31]

29 In the following section, we develop a linear regression algorithm in a Bayesian framework that automatically regularizes against problems of overﬁtting. [sent-53, score-0.348]

30 Moreover, the iterative nature of the algorithm, due to its formulation as an Expectation-Maximization problem [22], avoids the computational cost and numerical problems of matrix inversions. [sent-54, score-0.05]

31 Conceptually, the algorithm can be interpreted as a Bayesian version of either backﬁtting or partial least squares regression. [sent-56, score-0.147]

32 3 Variational Bayesian Least Squares Figure 1 illustrates the progression of graphical models that we need in order to develop a robust Bayesian version of linear regression. [sent-57, score-0.087]

33 In the spirit of PLS, if we knew an optimal projection direction of the input data, then the entire regression problem could be solved by a univariate regression between the projected data and the outputs. [sent-59, score-0.615]

34 This optimal projection direction is simply the true gradient between inputs and outputs. [sent-60, score-0.064]

35 Then, the zim are summed up to form a predicted output yi . [sent-67, score-0.346]

36 (1) is modified to become: $z_{im} = b_m x_{im}$, $y_i = \sum_{m=1}^{d} z_{im} + \epsilon$. For a probabilistic treatment with EM, we make a standard normal assumption of all distributions in the form of: $y_i | z_i \sim \mathrm{Normal}(y_i; \mathbf{1}^T z_i, \psi_y)$, $z_{im} | x_i \sim \mathrm{Normal}(z_{im}; b_m x_{im}, \psi_{zm})$, where $\mathbf{1}$ = [1, 1, . [sent-69, score-2.203]

37 While this model is still identical to OLS, notice that in the graphical model, the regression coefficients $b_m$ are behind the fan-in to the outputs $y_i$. [sent-72, score-0.686]

38 It can be proved that this EM version of least squares regression is guaranteed to converge to the same solution as OLS [23]. [sent-77, score-0.401]
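A minimal sketch of such an EM iteration is given below. To keep it short, the noise variances are held fixed at hypothetical values (`psi_z`, `psi_y`) instead of being re-estimated, so this is a simplified variant derived from the Gaussian model above and not the paper's full update equations. No matrix is inverted, and the iteration still settles on the OLS solution.

```python
import numpy as np

def em_backfitting(X, y, n_iters=500, psi_z=1.0, psi_y=1.0):
    """EM for y_i ~ N(sum_m z_im, psi_y), z_im ~ N(b_m x_im, psi_z), variances fixed."""
    N, d = X.shape
    b = np.zeros(d)
    s = psi_y + d * psi_z                 # variance of y_i given x_i under the model
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iters):
        resid = y - X @ b
        # E-step: posterior mean of each hidden z_im given y_i and x_i.
        z_mean = X * b + (psi_z / s) * resid[:, None]
        # M-step: d independent univariate regressions of z_m on x_m.
        b = (z_mean * X).sum(axis=0) / col_sq
    return b
```

Each pass distributes the current residual across the hidden variables and then refits every coefficient separately, so the cost per iteration is linear in the number of inputs.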

39 This new EM algorithm appears to only replace the matrix inversion in OLS by an iterative method, as others have done with alternative algorithms [20, 21], although the convergence guarantees of EM are an improvement over previous approaches. [sent-78, score-0.055]

40 1 Automatic Relevance Determination From a Bayesian point of view, the parameters bm should be treated probabilistically so that we can integrate them out to safeguard against overﬁtting. [sent-81, score-0.333]

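41 For this purpose, as shown in Figure 1c, we introduce precision variables $\alpha_m$ over each regression parameter $b_m$: $p(b|\alpha) = \prod_{m=1}^{d} \left(\frac{\alpha_m}{2\pi}\right)^{1/2} \exp\left\{-\frac{\alpha_m}{2} b_m^2\right\}$, $p(\alpha) = \prod_{m=1}^{d} \frac{b_\alpha^{a_\alpha}}{\mathrm{Gamma}(a_\alpha)} \alpha_m^{(a_\alpha - 1)} \exp\{-b_\alpha \alpha_m\}$ (3), where $\alpha$ is the vector of all $\alpha_m$. [sent-82, score-0.614]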

42 In order to obtain a tractable posterior distribution over all hidden variables b, zim and α, we use a factorial variational approximation of the true posterior Q(α, b, Z) = Q(α, b)Q(Z). [sent-83, score-0.386]

43 Note that the connection from the αm to the corresponding zim in Figure 1c is an intentional design. [sent-84, score-0.296]

44 Under this graphical model, the marginal distribution of bm becomes a Student t-distribution that allows traditional hypothesis testing [24]. [sent-85, score-0.355]
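For the prior of Eq. (3) taken on its own, the Student t form can be checked by integrating out the precision. The following is a sketch for a single coefficient under that prior only; the posterior quantities used in the paper carry updated parameters, which are not reproduced here.

```latex
\begin{align*}
p(b_m) &= \int_0^\infty \left(\frac{\alpha_m}{2\pi}\right)^{1/2}
          \exp\left\{-\frac{\alpha_m}{2} b_m^2\right\}
          \frac{b_\alpha^{a_\alpha}}{\Gamma(a_\alpha)}\,
          \alpha_m^{a_\alpha - 1} \exp\{-b_\alpha \alpha_m\}\, d\alpha_m \\
       &= \frac{b_\alpha^{a_\alpha}}{\sqrt{2\pi}\,\Gamma(a_\alpha)}
          \int_0^\infty \alpha_m^{a_\alpha - 1/2}
          \exp\left\{-\left(b_\alpha + \tfrac{1}{2} b_m^2\right)\alpha_m\right\} d\alpha_m \\
       &= \frac{\Gamma(a_\alpha + \tfrac{1}{2})}{\sqrt{2\pi}\,\Gamma(a_\alpha)}\,
          b_\alpha^{a_\alpha}
          \left(b_\alpha + \tfrac{1}{2} b_m^2\right)^{-(a_\alpha + 1/2)} \\
       &\propto \left(1 + \frac{b_m^2}{2 b_\alpha}\right)^{-(2 a_\alpha + 1)/2}
\end{align*}
```

The last line is a Student t density in $b_m$ with $2 a_\alpha$ degrees of freedom and squared scale $b_\alpha / a_\alpha$, which is the form that makes standard hypothesis testing applicable.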

45 Since bm is zero mean, a very large αm (equivalent to a very small variance of bm ) suggests that bm is very close to 0 and has no contribution to the output. [sent-88, score-0.999]

46 (5) demonstrates that in the absence of a correlation between the current input dimension and the residual error, the ﬁrst term causes the current regression coeﬃcient to decay. [sent-91, score-0.281]

47 The resulting regression solution regularizes over the number of retained inputs in the ﬁnal regression vector, performing a functionality similar to Automatic Relevance Determination (ARD) [8]. [sent-92, score-0.651]

48 One can further show that the marginal distribution of all $b_m$ is a t-distribution with $t = \langle b_m | \alpha_m \rangle / \sigma_{b_m | \alpha_m}$ and $2\hat{a}_\alpha$ degrees of freedom, which allows a principled way of determining whether a regression coefficient was excluded by means of standard hypothesis testing. [sent-94, score-0.947]

49 Thus, Variational Bayesian Least Squares (VBLS) regression is a full Bayesian treatment of the linear regression problem. [sent-95, score-0.627]

50 The key questions addressed in this application were i) whether EMG data can be reconstructed accurately with good generalization, ii) how many neurons contribute to the reconstruction of each muscle and iii) how well the VBLS algorithm compares to other analysis techniques. [sent-97, score-0.217]

51 The underlying assumption of this analysis is that the relationship between neural ﬁring and muscle activity is approximately linear. [sent-98, score-0.139]

52 In the first experiment by Sergio & Kalaska [9], the monkey moved a manipulandum in a center-out task in eight different directions, equally spaced in a horizontal planar circle of 8 cm radius. [sent-101, score-0.078]

53 A variation of this experiment held the manipulandum rigidly in place, while the monkey applied isometric forces in the same eight directions. [sent-102, score-0.102]

54 Neural activity for 71 M1 neurons was recorded in all conditions (2400 data points for each neuron), along with the EMG outputs of 11 muscles. [sent-104, score-0.148]

55 [10] involved a monkey trained to perform eight diﬀerent combinations of wrist ﬂexion-extension and radial-ulnar movements while in three diﬀerent arm postures (pronated, supinated and midway between the two). [sent-106, score-0.17]

56 The data set consisted of neural data of 92 M1 neurons that were recorded at all three wrist postures (producing 2664 data points for each neuron) and the EMG outputs of 7 contributing muscles. [sent-107, score-0.113]

57 [Figure legend: OLS, STEP, PLS, LASSO, VBLS, ModelSearch] Table 1: Percentage neuron matches between baseline and all other algorithms, averaged over all muscles in the data set. [sent-123, score-0.162]

58 In all experiments, the neural data was represented as average ﬁring rates and was time aligned with EMG data based on analyses that are outside of the scope of this paper. [sent-124, score-0.04]

59 2 Methods For the Sergio & Kalaska data set, a baseline comparison of good EMG reconstruction was obtained through a limited combinatorial search over possible regression models. [sent-126, score-0.375]

60 A particular model is characterized by a subset of neurons that is used to predict the EMG data. [sent-127, score-0.113]

61 The optimal predictive subset of neurons was determined from an 8-fold cross validation. [sent-131, score-0.113]
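The baseline procedure, a search over input subsets scored by 8-fold cross validation, can be sketched as follows. The size cap `max_size`, the contiguous fold split, plain least squares as the per-subset model and mean squared error as the score are all assumptions for illustration; the source only states that the search was limited and cross-validated.

```python
import itertools
import numpy as np

def cv_mse(X, y, n_folds=8):
    """Mean squared prediction error of least squares under k-fold cross validation."""
    N = X.shape[0]
    folds = np.array_split(np.arange(N), n_folds)
    errors = []
    for test_idx in folds:
        train = np.ones(N, dtype=bool)
        train[test_idx] = False
        b, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors.append(np.mean((y[test_idx] - X[test_idx] @ b) ** 2))
    return float(np.mean(errors))

def best_subset(X, y, max_size, n_folds=8):
    """Try every input subset up to max_size; return (cv error, subset) of the best."""
    d = X.shape[1]
    best = (np.inf, ())
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(d), k):
            err = cv_mse(X[:, list(subset)], y, n_folds)
            if err < best[0]:
                best = (err, subset)
    return best
```

The number of candidate subsets grows combinatorially with the number of inputs, which is consistent with the cluster-scale running time reported for the baseline.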

62 This baseline study served as a comparison for PLS, stepwise regression, LASSO regression, OLS and VBLS. [sent-132, score-0.2]

63 The ﬁve other algorithms used the same validation sets employed in the baseline study. [sent-133, score-0.079]

64 LASSO regression was implemented, manually choosing the optimal tuning parameter over all cross-validation sets. [sent-136, score-0.281]

65 OLS was implemented using a small ridge regression parameter of 10−10 in order to avoid ill-conditioned matrix inversions. [sent-137, score-0.317]

66 Inference of relevant neurons in PLS was based on the subspace spanned by the PLS projections, while relevant neurons in VBLS were inferred from t-tests on the regression parameters, using a signiﬁcance of p < 0. [sent-140, score-0.577]

67 Stepwise regression and LASSO regression determined the number of relevant neurons from the inputs that were included in the ﬁnal model. [sent-142, score-0.742]

68 Note that since OLS retained all input dimensions, this algorithm was omitted in relevant neuron comparisons. [sent-143, score-0.082]

69 Analogous to the ﬁrst data set, a combinatorial analysis was performed on the Kakei et al. [sent-144, score-0.038]

70 data set in order to determine the optimal set of neurons contributing to each muscle (i. [sent-145, score-0.238]

71 PLS, stepwise regression, LASSO regression, OLS and VBLS were applied using the same cross-validation sets, employing the same procedure described for the ﬁrst data set. [sent-148, score-0.144]

72 VBLS resulted in a generalization error comparable to that produced by the baseline study. [sent-151, score-0.056]

73 dataset, all algorithms performed similarly, with LASSO regression performing a little better than the rest. [sent-153, score-0.281]

74 However, OLS, stepwise regression, LASSO regression and PLS performed far worse on the Sergio & Kalaska dataset, with OLS regression attaining the worst error. [sent-154, score-0.706]

75 Such performance is typical for traditional linear regression methods on ill-conditioned high dimensional data, motivating the development of VBLS. [sent-155, score-0.344]

76 The average number of relevant neurons found by VBLS was slightly higher than the baseline study, as seen in Figure 3. [sent-156, score-0.204]

77 This result is not surprising as the baseline study did not consider all possible combinations of neurons. [sent-157, score-0.056]

78 Given the good generalization results of VBLS, it seems that the Bayesian approach regularized the participating neurons suﬃciently so that no overﬁtting occurred. [sent-158, score-0.113]

79 Note that the results for muscle 6 and 7 in Figure 3b seem to be due to some irregularities in the data and should be considered outliers. [sent-159, score-0.104]

80 Table 1 demonstrates that the relevant neurons identiﬁed by VBLS coincided at a very high percentage with those of the baseline results, while PLS, stepwise regression and LASSO regression had inferior outcomes. [sent-160, score-0.91]

81 Thus, in general, VBLS achieved comparable performance with the baseline study when reconstructing EMG data from M1 neurons. [sent-161, score-0.056]

82 While VBLS is an iterative statistical method, which performs slower than classical “one-shot” linear least squares methods (i. [sent-162, score-0.236]

83 , on the order of several minutes for the data sets in our analyses), it achieved comparable results with our combinatorial model search, which took weeks on a cluster computer. [sent-164, score-0.087]

84 5 Discussion This paper addressed the problem of analyzing high dimensional data with linear regression techniques, as encountered in neuroscience and the new ﬁeld of brain-machine interfaces. [sent-165, score-0.393]

85 To achieve robust statistical results, we introduced a novel Bayesian technique for linear regression analysis with automatic feature detection, called Variational Bayesian Least Squares. [sent-166, score-0.394]

86 Comparisons with classical linear regression methods and a “gold standard” obtained from a brute force search over all possible linear models demonstrate that VBLS performs very well without any manual parameter tuning, such that it has the quality of a “black box” statistical analysis technique. [sent-167, score-0.487]

87 A point of concern against the VBLS algorithm is how the variational approximation in this algorithm aﬀects the quality of function approximation. [sent-168, score-0.068]

88 It is known that factorial approximations to a joint distribution create more peaked distributions, such that one could potentially assume that VBLS might tend to overﬁt. [sent-169, score-0.046]

89 However, in the case of VBLS, a more peaked distribution over bm pushes the regression parameter closer to zero. [sent-170, score-0.638]

90 Future evaluations and comparisons with Markov Chain Monte Carlo methods will reveal more details of the nature of the variational approximation. [sent-172, score-0.115]

91 Regardless, it appears that VBLS could become a useful drop-in replacement for various classical regression methods. [sent-173, score-0.337]

92 It lends itself to incremental implementation as would be needed in real-time analyses of brain information. [sent-174, score-0.078]

93 Predicting the orientation of invisible stimuli from activity in human primary visual cortex. [sent-206, score-0.062]

94 Optimizing a linear algorithm for real-time robotic control using chronic cortical ensemble recordings in monkeys. [sent-212, score-0.117]

95 Changes in the temporal pattern of primary motor cortex activity in a directional isometric force versus limb movement task. [sent-235, score-0.186]

96 Muscle and movement representations in the primary motor cortex. [sent-243, score-0.084]

97 Direct cortical control of muscle activation in voluntary arm movements: a model. [sent-255, score-0.196]

98 Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. [sent-267, score-0.144]

99 Soft modeling by latent variables: The nonlinear iterative partial least squares approach. [sent-297, score-0.173]

100 Maximum likelihood from incomplete data via the em algorithm. [sent-323, score-0.056]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vbls', 0.444), ('bm', 0.333), ('zim', 0.296), ('regression', 0.281), ('pls', 0.263), ('ols', 0.215), ('emg', 0.215), ('xim', 0.214), ('zm', 0.19), ('lasso', 0.153), ('stepwise', 0.144), ('kakei', 0.132), ('sergio', 0.115), ('neurons', 0.113), ('muscle', 0.104), ('kalaska', 0.099), ('squares', 0.086), ('nmse', 0.086), ('pn', 0.079), ('bayesian', 0.071), ('variational', 0.068), ('modelsearch', 0.066), ('baseline', 0.056), ('tting', 0.056), ('em', 0.056), ('yi', 0.05), ('xid', 0.049), ('neuroscience', 0.049), ('coe', 0.046), ('erent', 0.044), ('force', 0.043), ('analyses', 0.04), ('zi', 0.04), ('brute', 0.039), ('combinatorial', 0.038), ('brain', 0.038), ('ridge', 0.036), ('ho', 0.036), ('activity', 0.035), ('relevant', 0.035), ('arm', 0.034), ('control', 0.034), ('classical', 0.034), ('linear', 0.034), ('least', 0.034), ('im', 0.033), ('collinearity', 0.033), ('postures', 0.033), ('regularizes', 0.033), ('vijayakumar', 0.033), ('zid', 0.033), ('inputs', 0.032), ('projection', 0.032), ('ring', 0.031), ('predicting', 0.031), ('treatment', 0.031), ('robust', 0.031), ('motor', 0.03), ('dimensions', 0.029), ('dimensional', 0.029), ('inversion', 0.029), ('relevance', 0.029), ('wrist', 0.029), ('atr', 0.029), ('qd', 0.029), ('partial', 0.027), ('projects', 0.027), ('movement', 0.027), ('update', 0.027), ('primary', 0.027), ('weeks', 0.026), ('manipulandum', 0.026), ('neuroscienti', 0.026), ('iterative', 0.026), ('eight', 0.026), ('monkey', 0.026), ('automatic', 0.026), ('recordings', 0.025), ('di', 0.025), ('retained', 0.024), ('isometric', 0.024), ('peaked', 0.024), ('excludes', 0.024), ('cortical', 0.024), ('nature', 0.024), ('xm', 0.023), ('comparisons', 0.023), ('neuron', 0.023), ('bd', 0.023), ('laboratories', 0.023), ('ave', 0.023), ('sets', 0.023), ('graphical', 0.022), ('montreal', 0.022), ('replacement', 0.022), ('factorial', 0.022), ('statistical', 0.022), ('movements', 0.022), ('contributing', 0.021), ('univariate', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato

2 0.16849484 17 nips-2005-Active Bidirectional Coupling in a Cochlear Chip

Author: Bo Wen, Kwabena A. Boahen

Abstract: We present a novel cochlear model implemented in analog very large scale integration (VLSI) technology that emulates nonlinear active cochlear behavior. This silicon cochlea includes outer hair cell (OHC) electromotility through active bidirectional coupling (ABC), a mechanism we proposed in which OHC motile forces, through the microanatomical organization of the organ of Corti, realize the cochlear ampliﬁer. Our chip measurements demonstrate that frequency responses become larger and more sharply tuned when ABC is turned on; the degree of the enhancement decreases with input intensity as ABC includes saturation of OHC forces. 1 Silicon Cochleae Cochlear models, mathematical and physical, with the shared goal of emulating nonlinear active cochlear behavior, shed light on how the cochlea works if based on cochlear micromechanics. Among the modeling efforts, silicon cochleae have promise in meeting the need for real-time performance and low power consumption. Lyon and Mead developed the ﬁrst analog electronic cochlea [1], which employed a cascade of second-order ﬁlters with exponentially decreasing resonant frequencies. However, the cascade structure suffers from delay and noise accumulation and lacks fault-tolerance. Modeling the cochlea more faithfully, Watts built a two-dimensional (2D) passive cochlea that addressed these shortcomings by incorporating the cochlear ﬂuid using a resistive network [2]. This parallel structure, however, has its own problem: response gain is diminished by interference among the second-order sections’ outputs due to the large phase change at resonance [3]. Listening more to biology, our silicon cochlea aims to overcome the shortcomings of existing architectures by mimicking the cochlear micromechanics while including outer hair cell (OHC) electromotility. Although how exactly OHC motile forces boost the basilar membrane’s (BM) vibration remains a mystery, cochlear microanatomy provides clues. 
Based on these clues, we previously proposed a novel mechanism, active bidirectional coupling (ABC), for the cochlear amplifier [4]. Here, we report an analog VLSI chip that implements this mechanism. In essence, our implementation is the first silicon cochlea that employs stimulus enhancement (i.e., active behavior) instead of undamping (i.e., high filter Q [5]). The paper is organized as follows. In Section 2, we present the hypothesized mechanism (ABC), first described in [4]. In Section 3, we provide a mathematical formulation of the model as the basis of cochlear circuit design. Then we proceed in Section 4 to synthesize the circuit for the cochlear chip. Last, we present chip measurements in Section 5 that demonstrate nonlinear active cochlear behavior. [Figure 1: The inner ear. A: Cutaway showing cochlear ducts (adapted from [6]). B: Longitudinal view of cochlear partition (CP) (modified from [7]-[8]). Each outer hair cell (OHC) tilts toward the base while the Deiter’s cell (DC) on which it sits extends a phalangeal process (PhP) toward the apex. The OHCs’ stereocilia and the PhPs’ apical ends form the reticular lamina (RL). d is the tilt distance, and the segment size. IHC: inner hair cell.] 2 Active Bidirectional Coupling The cochlea actively amplifies acoustic signals as it performs spectral analysis. The movement of the stapes sets the cochlear fluid into motion, which passes the stimulus energy onto a certain region of the BM, the main vibrating organ in the cochlea (Figure 1A). From the base to the apex, BM fibers increase in width and decrease in thickness, resulting in an exponential decrease in stiffness which, in turn, gives rise to the passive frequency tuning of the cochlea. The OHCs’ electromotility is widely thought to account for the cochlea’s exquisite sensitivity and discriminability. 
The exact way that OHC motile forces enhance the BM’s motion, however, remains unresolved. We propose that the triangular mechanical unit formed by an OHC, a phalangeal process (PhP) extended from the Deiter’s cell (DC) on which the OHC sits, and a portion of the reticular lamina (RL), between the OHC’s stereocilia end and the PhP’s apical tip, plays an active role in enhancing the BM’s responses (Figure 1B). The cochlear partition (CP) is divided into a number of segments longitudinally. Each segment includes one DC, one PhP’s apical tip and one OHC’s stereocilia end, both attached to the RL. Approximating the anatomy, we assume that when an OHC’s stereocilia end lies in segment i − 1, its basolateral end lies in the immediately apical segment i. Furthermore, the DC in segment i extends a PhP that angles toward the apex of the cochlea, with its apical end inserted just behind the stereocilia end of the OHC in segment i + 1. Our hypothesis (ABC) includes both feedforward and feedbackward interactions. On one hand, the feedforward mechanism, proposed in [9], hypothesized that the force resulting from OHC contraction or elongation is exerted onto an adjacent downstream BM segment due to the OHC’s basal tilt. On the other hand, the novel insight of the feedbackward mechanism is that the OHC force is delivered onto an adjacent upstream BM segment due to the apical tilt of the PhP extending from the DC’s main trunk. In a nutshell, the OHC motile forces, through the microanatomy of the CP, feed forward and backward, in harmony with each other, resulting in bidirectional coupling between BM segments in the longitudinal direction. Specifically, due to the opposite action of OHC forces on the BM and the RL, the motion of BM segment i − 1 reinforces that of segment i while the motion of segment i + 1 opposes that of segment i, as described in detail in [4]. [Figure 2: Wave propagation (WP) and basilar membrane (BM) impedance in the active cochlear model with a 2 kHz pure tone (α = 0.15, γ = 0.3). A: WP in fluid and BM. B: BM impedance Zm (i.e., pressure divided by velocity), normalized by S(x)M(x), plotted against distance from stapes (mm). Only the resistive component is shown; dot marks peak location.] 
3 The 2D Nonlinear Active Model To provide a blueprint for the cochlear circuit design, we formulate a 2D model of the cochlea that includes ABC. Both the cochlea’s length (BM) and height (cochlear ducts) are discretized into a number of segments, with the original aspect ratio of the cochlea maintained. In the following expressions, x represents the distance from the stapes along the CP, with x = 0 at the base (or the stapes) and x = L (uncoiled cochlear duct length) at the apex; y represents the vertical distance from the BM, with y = 0 at the BM and y = ±h (cochlear duct radius) at the bottom/top wall. Providing that the assumption of fluid incompressibility holds, the velocity potential φ of the fluids is required to satisfy $\nabla^2 \phi(x, y, t) = 0$, where $\nabla^2$ denotes the Laplacian operator. By definition, this potential is related to fluid velocities in the x and y directions: $V_x = -\partial\phi/\partial x$ and $V_y = -\partial\phi/\partial y$. The BM is driven by the fluid pressure difference across it. Hence, the BM’s vertical motion (with downward displacement being positive) can be described as follows: $P_d(x) + F_{OHC}(x) = S(x)\delta(x) + \beta(x)\dot{\delta}(x) + M(x)\ddot{\delta}(x)$ (1), where S(x) is the stiffness, β(x) is the damping, and M(x) is the mass, per unit area, of the BM; δ is the BM’s downward displacement. $P_d = \rho\, \partial(\phi_{SV}(x, y, t) - \phi_{ST}(x, y, t))/\partial t$ is the pressure difference between the two fluid ducts (the scala vestibuli (SV) and the scala tympani (ST)), evaluated at the BM (y = 0); ρ is the fluid density. 
The $F_{OHC}(x)$ term combines feedforward and feedbackward OHC forces, described by $F_{OHC}(x) = s_0 \left[ \tanh(\alpha\gamma S(x)\delta(x - d)/s_0) - \tanh(\alpha S(x)\delta(x + d)/s_0) \right]$ (2), where α denotes the OHC motility, expressed as a fraction of the BM stiffness, and γ is the ratio of feedforward to feedbackward coupling, representing relative strengths of the OHC forces exerted on the BM segment through the DC, directly and via the tilted PhP. d denotes the tilt distance, which is the horizontal displacement between the source and the recipient of the OHC force, assumed to be equal for the forward and backward cases. We use the hyperbolic tangent function to model saturation of the OHC forces, the nonlinearity that is evident in physiological measurements [8]; $s_0$ determines the saturation level. We observed wave propagation in the model and computed the BM’s impedance (i.e., the ratio of driving pressure to velocity). Following the semi-analytical approach in [2], we simulated a linear version of the model (without saturation). The traveling wave transitions from long-wave to short-wave before the BM vibration peaks; the wavelength around the characteristic place is comparable to the tilt distance (Figure 2A). The BM impedance’s real part (i.e., the resistive component) becomes negative before the peak (Figure 2B). On the whole, inclusion of OHC motility through ABC boosts the traveling wave by pumping energy onto the BM when the wavelength matches the tilt of the OHC and PhP. 4 Analog VLSI Design and Implementation Based on our mathematical model, which produces realistic responses, we implemented a 2D nonlinear active cochlear circuit in analog VLSI, taking advantage of the 2D nature of silicon chips. We first synthesize a circuit analog of the mathematical model, and then we implement the circuit in the log-domain. We start by synthesizing a passive model, and then extend it to a nonlinear active one by including ABC with saturation. 
4.1 Synthesizing the BM Circuit The model consists of two fundamental parts: the cochlear fluid and the BM. First, we design the fluid element and thus the fluid network. In discrete form, the fluids can be viewed as a grid of elements with a specific resistance that corresponds to the fluid density or mass. Since charge is conserved for a small sheet of resistance and so are particles for a small volume of fluid, we use current to simulate fluid velocity. At the transistor level, the current flowing through the channel of a MOS transistor, operating subthreshold as a diffusive element, can be used for this purpose. Therefore, following the approach in [10], we implement the cochlear fluid network using a diffusor network formed by a 2D grid of nMOS transistors. Second, we design the BM element and thus the BM. As current represents velocity, we rewrite the BM boundary condition (Equation 1, without the $F_{OHC}$ term): $\dot{I}_{in} = S(x) \int I_{mem}\, dt + \beta(x) I_{mem} + M(x) \dot{I}_{mem}$ (3), where $I_{in}$, obtained by applying the voltage from the diffusor network to the gate of a pMOS transistor, represents the velocity potential scaled by the fluid density. In turn, $I_{mem}$ drives the diffusor network to match the fluid velocity with the BM velocity, $\dot{\delta}$. The $F_{OHC}$ term is dealt with in Section 4.2. Implementing this second-order system requires two state-space variables, which we name $I_s$ and $I_o$. And with s = jω, our synthesized BM design (passive) is $\tau_1 I_s s + I_s = -I_{in} + I_o$ (4), $\tau_2 I_o s + I_o = I_{in} - b I_s$ (5), $I_{mem} = I_{in} + I_s - I_o$ (6), where the two first-order systems are both low-pass filters (LPFs), with time constants $\tau_1$ and $\tau_2$, respectively; b is a gain factor. Thus, $I_{in}$ can be expressed in terms of $I_{mem}$ as: $I_{in} s^2 = \left( (b + 1)/\tau_1\tau_2 + ((\tau_1 + \tau_2)/\tau_1\tau_2)\, s + s^2 \right) I_{mem}$. Comparing this expression with the design target (Equation 3) yields the circuit analogs: $S(x) = (b + 1)/\tau_1\tau_2$, $\beta(x) = (\tau_1 + \tau_2)/\tau_1\tau_2$, and $M(x) = 1$. 
Note that the mass $M(x)$ is a constant (i.e., 1), which was also the case in our mathematical model simulation. These analogies require that $\tau_1$ and $\tau_2$ increase exponentially to simulate the exponentially decreasing BM stiffness (and damping); $b$ allows us to achieve a reasonable stiffness for a practical choice of $\tau_1$ and $\tau_2$ (capacitor size is limited by silicon area).

Figure 3: Low-pass filter (LPF) and second-order section circuit design. A Half-LPF circuit. B Complete LPF circuit formed by two half-LPF circuits. C Basilar membrane (BM) circuit. It consists of two LPFs and connects to its neighbors through $I_s$ and $I_T$.

4.2 Adding Active Bidirectional Coupling

To include ABC in the BM boundary condition, we replace $\delta$ in Equation 2 with $\int I_{mem}\,dt$ to obtain

$$F_{OHC} = r_{ff}\,S(x)\,T\left(\int I_{mem}(x-d)\,dt\right) - r_{fb}\,S(x)\,T\left(\int I_{mem}(x+d)\,dt\right),$$

where $r_{ff} = \alpha\gamma$ and $r_{fb} = \alpha$ denote the feedforward and feedbackward OHC motility factors, and $T$ denotes saturation. The saturation is applied to the displacement, instead of the force, as this simplifies the implementation. We obtain the integrals by observing that, in the passive design, the state variable $I_s = -I_{mem}/s\tau_1$. Thus, $\int I_{mem}(x-d)\,dt = -\tau_{1f} I_{sf}$ and $\int I_{mem}(x+d)\,dt = -\tau_{1b} I_{sb}$. Here, $I_{sf}$ and $I_{sb}$ represent the outputs of the first LPF in the upstream and downstream BM segments, respectively; $\tau_{1f}$ and $\tau_{1b}$ represent their respective time constants. To reduce complexity in implementation, we use $\tau_1$ to approximate both $\tau_{1f}$ and $\tau_{1b}$ as the longitudinal span is small. We obtain the active BM design by replacing Equation 5 with the synthesis result:

$$\tau_2 I_o s + I_o = I_{in} - b I_s + r_{fb}(b+1)\,T(-I_{sb}) - r_{ff}(b+1)\,T(-I_{sf}).$$

Note that, to implement ABC, we only need to add two currents to the second LPF in the passive system.
These currents, $I_{sf}$ and $I_{sb}$, come from the upstream and downstream neighbors of each segment.

Figure 4: Cochlear chip. A Architecture: Two diffusive grids with embedded BM circuits model the cochlea. B Detail. BM circuits exchange currents with their neighbors.

4.3 Class AB Log-domain Implementation

We employ the log-domain filtering technique [11] to realize current-mode operation. In addition, following the approach proposed in [12], we implement the circuit in Class AB to increase dynamic range, reduce the effect of mismatch and lower power consumption. This differential signaling is inspired by the way the biological cochlea works: the vibration of the BM is driven by the pressure difference across it.

Taking a bottom-up strategy, we start by designing a Class AB LPF, a building block for the BM circuit. It is described by

$$\tau\,(I^+_{out} - I^-_{out})\,s + (I^+_{out} - I^-_{out}) = I^+_{in} - I^-_{in} \quad \text{and} \quad \tau\,I^+_{out} I^-_{out}\,s + I^+_{out} I^-_{out} = I_q^2,$$

where $I_q$ sets the geometric mean of the positive and negative components of the output current, and $\tau$ sets the time constant. Combining the common-mode constraint with the differential design equation yields the nodal equation for the positive path (the negative path has superscripts $+$ and $-$ swapped):

$$C\,\dot{V}^+_{out} = I_\tau \left[ (I^+_{in} - I^-_{in}) + (I_q^2/I^+_{out} - I^+_{out}) \right] / (I^+_{out} + I^-_{out}).$$

This nodal equation suggests the half-LPF circuit shown in Figure 3A. $V^+_{out}$, the voltage on the positive capacitor ($C^+$), gates a pMOS transistor to produce the corresponding current signal, $I^+_{out}$ ($V^-_{out}$ and $I^-_{out}$ are similarly related). The bias $V_q$ sets the quiescent current $I_q$ while $V_\tau$ determines the current $I_\tau$, which is related to the time constant by $\tau = C u_T/\kappa I_\tau$ ($\kappa$ is the subthreshold slope coefficient and $u_T$ is the thermal voltage). Two of these subcircuits, connected in push–pull, form a complete LPF (Figure 3B).
The BM circuit is implemented using two LPFs interacting in accordance with the synthesized design equations (Figure 3C). Imem is the combination of three currents, Iin , Is , and Io . Each BM sends out Is and receives IT , a saturated version of its neighbor’s Is . The saturation is accomplished by a current-limiting transistor (see Figure 4B), which yields IT = T (Is ) = Is Isat /(Is + Isat ), where Isat is set by a bias voltage Vsat. 4.4 Chip Architecture We fabricated a version of our cochlear chip architecture (Figure 4) with 360 BM circuits and two 4680-element ﬂuid grids (360 ×13). This chip occupies 10.9mm2 of silicon area in 0.25µm CMOS technology. Differential input signals are applied at the base while the two ﬂuid grids are connected at the apex through a ﬂuid element that represents the helicotrema. 5 Chip Measurements We carried out two measurements that demonstrate the desired ampliﬁcation by ABC, and the compressive growth of BM responses due to saturation. To obtain sinusoidal current as the input to the BM subcircuits, we set the voltages applied at the base to be the logarithm of a half-wave rectiﬁed sinusoid. We ﬁrst investigated BM-velocity frequency responses at six linearly spaced cochlear positions (Figure 5). The frequency that maximally excites the ﬁrst position (Stage 30), deﬁned as its characteristic frequency (CF), is 12.1kHz. The remaining ﬁve CFs, from early to later stages, are 8.2k, 1.7k, 905, 366, and 218Hz, respectively. Phase accumulation at the CFs ranges from 0.56 to 2.67π radians, comparable to 1.67π radians in the mammalian cochlea [13]. Q10 factor (the ratio of the CF to the bandwidth 10dB below the peak) ranges from 1.25 to 2.73, comparable to 2.55 at mid-sound intensity in biology (computed from [13]). The cutoff slope ranges from -20 to -54dB/octave, as compared to -85dB/octave in biology (computed from [13]). 
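Returning to the saturation stage described in Section 4.3, the current-limiting function $I_T = T(I_s) = I_s I_{sat}/(I_s + I_{sat})$ can be sketched as follows. The function name and the current values are illustrative assumptions (positive currents only); they are not measurements from the chip.

```python
def saturate(i_s, i_sat):
    """Current-limiting saturation T(Is) = Is * Isat / (Is + Isat), for positive currents."""
    return i_s * i_sat / (i_s + i_sat)

i_sat = 1e-9                          # assumed saturation current (amperes)
weak = saturate(1e-12, i_sat)         # Is << Isat: output stays close to Is
strong = saturate(1e-6, i_sat)        # Is >> Isat: output approaches Isat
half = saturate(i_sat, i_sat)         # Is == Isat: output is Isat / 2
```

The output tracks the input for small currents and levels off at $I_{sat}$ for large ones, which is the compressive behavior the chip measurements attribute to saturation.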
Figure 5: Measured BM-velocity frequency responses at six locations. A Amplitude (dB) versus frequency (kHz). B Phase ($\pi$ radians) versus frequency (kHz). Dashed lines: Biological data (adapted from [13]). Dots mark peaks.

We then explored the longitudinal pattern of BM-velocity responses and the effect of ABC. Stimulating the chip using four different pure tones, we obtained responses in which a 4kHz input elicits a peak around Stage 85 while 500Hz sound travels all the way to Stage 178 and peaks there (Figure 6A). We varied the input voltage level and obtained frequency responses at Stage 100 (Figure 6B). Input voltage level increases linearly such that the current increases exponentially; the input current level (in dB) was estimated based on the measured $\kappa$ for this chip. As expected, we observed linearly increasing responses at low frequencies in the logarithmic plot. In contrast, the responses around the CF increase less and become broader with increasing input level as saturation takes effect in that region (resembling a passive cochlea). We observed 24dB compression as compared to 27 to 47dB in biology [13]. At the highest intensities, compression also occurs at low frequencies.

These chip measurements demonstrate that inclusion of ABC, simply through coupling neighboring BM elements, transforms a passive cochlea into an active one. This active cochlear model's nonlinear responses are qualitatively comparable to physiological data.

6 Conclusions

We presented an analog VLSI implementation of a 2D nonlinear cochlear model that utilizes a novel active mechanism, ABC, which we proposed to account for the cochlear amplifier. ABC was shown to pump energy into the traveling wave.
Rather than detecting the wave's amplitude and implementing an automatic-gain-control loop, our biomorphic model accomplishes this simply by nonlinear interactions between adjacent neighbors. Implemented in the log-domain, with Class AB operation, our silicon cochlea shows enhanced frequency responses, with compressive behavior around the CF, when ABC is turned on. These features are desirable in prosthetic applications and automatic speech recognition systems as they capture the properties of the biological cochlea.

Figure 6: Measured BM-velocity responses (cont'd). A Longitudinal responses (20-stage moving average) for 4 kHz, 2 kHz, 1 kHz, and 500 Hz tones; BM velocity amplitude (dB) versus stage number. Peak shifts to earlier (basal) stages as input frequency increases from 500 to 4kHz. B Effects of increasing input intensity at Stage 100, for input levels of 0, 16, 32, and 48 dB; BM velocity amplitude (dB) versus frequency (kHz). Responses become broader and show compressive growth.

References
[1] Lyon, R.F. & Mead, C.A. (1988) An analog electronic cochlea. IEEE Trans. Acoust. Speech and Signal Proc., 36: 1119-1134.
[2] Watts, L. (1993) Cochlear Mechanics: Analysis and Analog VLSI. Ph.D. thesis, Pasadena, CA: California Institute of Technology.
[3] Fragnière, E. (2005) A 100-Channel analog CMOS auditory filter bank for speech recognition. IEEE International Solid-State Circuits Conference (ISSCC 2005), pp. 140-141.
[4] Wen, B. & Boahen, K. (2003) A linear cochlear model with active bi-directional coupling. The 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2003), pp. 2013-2016.
[5] Sarpeshkar, R., Lyon, R.F., & Mead, C.A. (1996) An analog VLSI cochlear model with new transconductance amplifier and nonlinear gain control. Proceedings of the IEEE Symposium on Circuits and Systems (ISCAS 1996), 3: 292-295.
[6] Mead, C.A. (1989) Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley.
[7] Russell, I.J. & Nilsen, K.E. (1997) The location of the cochlear ampliﬁer: Spatial representation of a single tone on the guinea pig basilar membrane. Proc. Natl. Acad. Sci. USA, 94: 2660-2664. [8] Geisler, C.D. (1998) From sound to synapse: physiology of the mammalian ear . Oxford University Press. [9] Geisler, C.D. & Sang, C. (1995) A cochlear model using feed-forward outer-hair-cell forces. Hearing Research , 86: 132-146. [10] Boahen, K.A. & Andreou, A.G. (1992) A contrast sensitive silicon retina with reciprocal synapses. In Moody, J.E. and Lippmann, R.P. (eds.), Advances in Neural Information Processing Systems 4 (NIPS 1992) , pp. 764-772, Morgan Kaufmann, San Mateo, CA. [11] Frey, D.R. (1993) Log-domain ﬁltering: an approach to current-mode ﬁltering. IEE Proc. G, Circuits Devices Syst., 140 (6): 406-416. [12] Zaghloul, K. & Boahen, K.A. (2005) An On-Off log-domain circuit that recreates adaptive ﬁltering in the retina. IEEE Transactions on Circuits and Systems I: Regular Papers , 52 (1): 99-107. [13] Ruggero, M.A., Rich, N.C., Narayan, S.S., & Robles, L. (1997) Basilar membrane responses to tones at the base of the chinchilla cochlea. J. Acoust. Soc. Am., 101 (4): 2151-2163.

3 0.13890038 77 nips-2005-From Lasso regression to Feature vector machine

Author: Fan Li, Yiming Yang, Eric P. Xing

Abstract: Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require solving hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors. FVM generates sparse solutions in the nonlinear feature space and it is much more tractable compared to feature scaling kernel machines. Our experiments with FVM on simulated data show encouraging results in identifying the small number of dominating features that are non-linearly correlated to the response, a task the standard Lasso fails to complete.

4 0.12730536 192 nips-2005-The Information-Form Data Association Filter

Author: Brad Schumitsch, Sebastian Thrun, Gary Bradski, Kunle Olukotun

Abstract: This paper presents a new filter for online data association problems in high-dimensional spaces. The key innovation is a representation of the data association posterior in information form, in which the “proximity” of objects and tracks is expressed by numerical links. Updating these links requires linear time, compared to the exponential time required for computing the exact posterior probabilities. The paper derives the algorithm formally and provides comparative results using data obtained by a real-world camera array and by a large-scale sensor network simulation.

5 0.11059596 19 nips-2005-Active Learning for Misspecified Models

Author: Masashi Sugiyama

Abstract: Active learning is the problem in supervised learning to design the locations of training input points so that the generalization error is minimized. Existing active learning methods often assume that the model used for learning is correctly specified, i.e., the learning target function can be expressed by the model at hand. In many practical situations, however, this assumption may not be fulfilled. In this paper, we first show that the existing active learning method can be theoretically justified under a slightly weaker condition: the model does not have to be correctly specified, but slightly misspecified models are also allowed. However, it turns out that the weakened condition is still restrictive in practice. To cope with this problem, we propose an alternative active learning method which can be theoretically justified for a wider class of misspecified models. Thus, the proposed method has a broader range of applications than the existing method. Numerical studies show that the proposed active learning method is robust against the misspecification of models and is thus reliable.

1 Introduction and Problem Formulation

Let us discuss the regression problem of learning a real-valued function $f(x)$ defined on $\mathbb{R}^d$ from training examples

$$\{(x_i, y_i) \mid y_i = f(x_i) + \epsilon_i\}_{i=1}^{n},$$

where $\{\epsilon_i\}_{i=1}^{n}$ are i.i.d. noise with mean zero and unknown variance $\sigma^2$. We use the following linear regression model for learning:

$$\hat{f}(x) = \sum_{i=1}^{p} \alpha_i \varphi_i(x), \quad (1)$$

where $\{\varphi_i(x)\}_{i=1}^{p}$ are fixed linearly independent functions and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^\top$ are parameters to be learned.

We evaluate the goodness of the learned function $\hat{f}(x)$ by the expected squared test error over test input points and noise (i.e., the generalization error). When the test input points are drawn independently from a distribution with density $p_t(x)$, the generalization error is expressed as

$$G = \mathbb{E}_\epsilon \int \left(\hat{f}(x) - f(x)\right)^2 p_t(x)\,dx,$$

where $\mathbb{E}_\epsilon$ denotes the expectation over the noise $\{\epsilon_i\}_{i=1}^{n}$. In the following, we suppose that $p_t(x)$ is known (see footnote 1).

In a standard setting of regression, the training input points are provided from the environment, i.e., $\{x_i\}_{i=1}^{n}$ independently follow the distribution with density $p_t(x)$. On the other hand, in some cases, the training input points can be designed by users. In such cases, it is expected that the accuracy of the learning result can be improved if the training input points are chosen appropriately, e.g., by densely locating training input points in the regions of high uncertainty.

Active learning, also referred to as experimental design, is the problem of optimizing the location of training input points so that the generalization error is minimized. In active learning research, it is often assumed that the regression model is correctly specified [2, 1, 3], i.e., the learning target function $f(x)$ can be expressed by the model. In practice, however, this assumption is often violated.

In this paper, we first show that the existing active learning method can still be theoretically justified when the model is approximately correct in a strong sense. Then we propose an alternative active learning method which can also be theoretically justified for approximately correct models, but the condition on the approximate correctness of the models is weaker than that for the existing method. Thus, the proposed method has a wider range of applications.

In the following, we suppose that the training input points $\{x_i\}_{i=1}^{n}$ are independently drawn from a user-defined distribution with density $p_x(x)$, and discuss the problem of finding the optimal density function.

2 Existing Active Learning Method

The generalization error $G$ defined above can be decomposed as

$$G = B + V,$$

where $B$ is the (squared) bias term and $V$ is the variance term given by

$$B = \int \left(\mathbb{E}_\epsilon \hat{f}(x) - f(x)\right)^2 p_t(x)\,dx \quad \text{and} \quad V = \mathbb{E}_\epsilon \int \left(\hat{f}(x) - \mathbb{E}_\epsilon \hat{f}(x)\right)^2 p_t(x)\,dx.$$

A standard way to learn the parameters in the regression model (1) is the ordinary least-squares learning, i.e., parameter vector $\alpha$ is determined as follows.
$$\hat{\alpha}_{OLS} = \operatorname*{argmin}_{\alpha} \sum_{i=1}^{n} \left(\hat{f}(x_i) - y_i\right)^2.$$

It is known that $\hat{\alpha}_{OLS}$ is given by

$$\hat{\alpha}_{OLS} = L_{OLS}\,y, \quad \text{where} \quad L_{OLS} = (X^\top X)^{-1} X^\top, \quad X_{i,j} = \varphi_j(x_i), \quad \text{and} \quad y = (y_1, y_2, \ldots, y_n)^\top.$$

Let $G_{OLS}$, $B_{OLS}$ and $V_{OLS}$ be $G$, $B$, and $V$ for the learned function obtained by the ordinary least-squares learning, respectively. Then the following proposition holds.

Footnote 1: In some application domains such as web page analysis or bioinformatics, a large number of unlabeled samples (input points without output values, independently drawn from the distribution with density $p_t(x)$) are easily gathered. In such cases, a reasonably good estimate of $p_t(x)$ may be obtained by some standard density estimation method. Therefore, the assumption that $p_t(x)$ is known may not be so restrictive.

Proposition 1 ([2, 1, 3]) Suppose that the model is correctly specified, i.e., the learning target function $f(x)$ is expressed as

$$f(x) = \sum_{i=1}^{p} \alpha^*_i \varphi_i(x).$$

Then $B_{OLS}$ and $V_{OLS}$ are expressed as

$$B_{OLS} = 0 \quad \text{and} \quad V_{OLS} = \sigma^2 J_{OLS},$$

where

$$J_{OLS} = \mathrm{tr}\left(U L_{OLS} L_{OLS}^\top\right) \quad \text{and} \quad U_{i,j} = \int \varphi_i(x)\,\varphi_j(x)\,p_t(x)\,dx.$$

Therefore, for the correctly specified model (1), the generalization error $G_{OLS}$ is expressed as

$$G_{OLS} = \sigma^2 J_{OLS}.$$

Based on this expression, the existing active learning method determines the location of training input points $\{x_i\}_{i=1}^{n}$ (or the training input density $p_x(x)$) so that $J_{OLS}$ is minimized [2, 1, 3].

3 Analysis of Existing Method under Misspecification of Models

In this section, we investigate the validity of the existing active learning method for misspecified models.
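To make the criterion of Proposition 1 concrete, the following sketch computes $J_{OLS} = \mathrm{tr}(U L_{OLS} L_{OLS}^\top)$ for training inputs drawn from several candidate Gaussian densities. It is only an illustration under assumptions: the polynomial basis $1, x, x^2$, the Gaussian parameters, the candidate widths, and the Monte Carlo approximation of $U$ are all choices made here, and the helper names are hypothetical.

```python
import numpy as np

def design_matrix(x):
    """X[i, j] = phi_j(x_i) for the assumed polynomial basis 1, x, x^2."""
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

def j_ols(x_train, x_test):
    """J_OLS = tr(U L L^T) with L = (X^T X)^{-1} X^T.

    U is approximated by a Monte Carlo average over samples from the test density.
    """
    X = design_matrix(x_train)
    L = np.linalg.solve(X.T @ X, X.T)
    Phi = design_matrix(x_test)
    U = Phi.T @ Phi / len(x_test)
    return np.trace(U @ L @ L.T)

rng = np.random.default_rng(0)
x_test = rng.normal(0.2, 0.4, size=20000)      # samples standing in for p_t (assumed)
# Score a few candidate training densities N(0.2, (0.4 c)^2); smaller J is preferred.
scores = {c: j_ols(rng.normal(0.2, 0.4 * c, size=100), x_test)
          for c in (0.8, 1.2, 1.6, 2.0, 2.4)}
best_c = min(scores, key=scores.get)
```

Because $L_{OLS} L_{OLS}^\top = (X^\top X)^{-1}$, the criterion depends only on the training input locations and not on the observed outputs, which is why it can be evaluated before any labels are gathered.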
Suppose the model does not exactly include the learning target function $f(x)$, but it approximately includes it, i.e., for a scalar $\delta$ such that $|\delta|$ is small, $f(x)$ is expressed as

$$f(x) = g(x) + \delta\,r(x), \quad (3)$$

where $g(x)$ is the orthogonal projection of $f(x)$ onto the span of $\{\varphi_i(x)\}_{i=1}^{p}$ and the residual $r(x)$ is orthogonal to $\{\varphi_i(x)\}_{i=1}^{p}$:

$$g(x) = \sum_{i=1}^{p} \alpha^*_i \varphi_i(x) \quad \text{and} \quad \int r(x)\,\varphi_i(x)\,p_t(x)\,dx = 0 \quad \text{for } i = 1, 2, \ldots, p.$$

In this case, the bias term $B$ is expressed as

$$B = \int \left(\mathbb{E}_\epsilon \hat{f}(x) - g(x)\right)^2 p_t(x)\,dx + C, \quad \text{where} \quad C = \int \left(g(x) - f(x)\right)^2 p_t(x)\,dx.$$

Since $C$ is a constant which does not depend on the training input density $p_x(x)$, we subtract $C$ in the following discussion. Then we have the following lemma (see footnote 2).

Lemma 2 For the approximately correct model (3), we have

$$B_{OLS} - C = \delta^2 \langle U L_{OLS} z_r, L_{OLS} z_r \rangle = O(\delta^2),$$
$$V_{OLS} = \sigma^2 J_{OLS} = O_p(n^{-1}),$$

where $z_r = (r(x_1), r(x_2), \ldots, r(x_n))^\top$.

Footnote 2: Proofs of lemmas are provided in an extended version [6].

Note that the asymptotic order of $V_{OLS}$ is in probability since $V_{OLS}$ is a random variable that includes $\{x_i\}_{i=1}^{n}$. The above lemma implies that

$$G_{OLS} - C = \sigma^2 J_{OLS} + o_p(n^{-1}) \quad \text{if } \delta = o_p(n^{-1/2}).$$

Therefore, the existing active learning method of minimizing $J_{OLS}$ is still justified if $\delta = o_p(n^{-1/2})$. However, when $\delta \neq o_p(n^{-1/2})$, the existing method may not work well because the bias term $B_{OLS} - C$ is not smaller than the variance term $V_{OLS}$, so it cannot be neglected.

4 New Active Learning Method

In this section, we propose a new active learning method based on the weighted least-squares learning.

4.1 Weighted Least-Squares Learning

When the model is correctly specified, $\hat{\alpha}_{OLS}$ is an unbiased estimator of $\alpha^*$. However, for misspecified models, $\hat{\alpha}_{OLS}$ is generally biased even asymptotically if $\delta = O_p(1)$.

The bias of $\hat{\alpha}_{OLS}$ is actually caused by the covariate shift [5]: the training input density $p_x(x)$ is different from the test input density $p_t(x)$. For correctly specified models, influence of the covariate shift can be ignored, as the existing active learning method does. However, for misspecified models, we should explicitly cope with the covariate shift.
Under the covariate shift, it is known that the following weighted least-squares learning is asymptotically unbiased even if $\delta = O_p(1)$ [5].

$$\hat{\alpha}_{WLS} = \operatorname*{argmin}_{\alpha} \sum_{i=1}^{n} \frac{p_t(x_i)}{p_x(x_i)} \left(\hat{f}(x_i) - y_i\right)^2.$$

Asymptotic unbiasedness of $\hat{\alpha}_{WLS}$ would be intuitively understood by the following identity, which is similar in spirit to importance sampling:

$$\int \left(\hat{f}(x) - f(x)\right)^2 p_t(x)\,dx = \int \left(\hat{f}(x) - f(x)\right)^2 \frac{p_t(x)}{p_x(x)}\,p_x(x)\,dx.$$

In the following, we assume that $p_x(x)$ is strictly positive for all $x$. Let $D$ be the diagonal matrix with the $i$-th diagonal element

$$D_{i,i} = \frac{p_t(x_i)}{p_x(x_i)}.$$

Then it can be confirmed that $\hat{\alpha}_{WLS}$ is given by

$$\hat{\alpha}_{WLS} = L_{WLS}\,y, \quad \text{where} \quad L_{WLS} = (X^\top D X)^{-1} X^\top D.$$

4.2 Active Learning Based on Weighted Least-Squares Learning

Let $G_{WLS}$, $B_{WLS}$ and $V_{WLS}$ be $G$, $B$, and $V$ for the learned function obtained by the above weighted least-squares learning, respectively. Then we have the following lemma.

Lemma 3 For the approximately correct model (3), we have

$$B_{WLS} - C = \delta^2 \langle U L_{WLS} z_r, L_{WLS} z_r \rangle = O_p(\delta^2 n^{-1}),$$
$$V_{WLS} = \sigma^2 J_{WLS} = O_p(n^{-1}),$$

where

$$J_{WLS} = \mathrm{tr}\left(U L_{WLS} L_{WLS}^\top\right).$$

This lemma implies that

$$G_{WLS} - C = \sigma^2 J_{WLS} + o_p(n^{-1}) \quad \text{if } \delta = o_p(1).$$

Based on this expression, we propose determining the training input density $p_x(x)$ so that $J_{WLS}$ is minimized.

The use of the proposed criterion $J_{WLS}$ can be theoretically justified when $\delta = o_p(1)$, while the existing criterion $J_{OLS}$ requires $\delta = o_p(n^{-1/2})$. Therefore, the proposed method has a wider range of applications. The effect of this extension is experimentally investigated in the next section.

5 Numerical Examples

We evaluate the usefulness of the proposed active learning method through experiments.

Toy Data Set: We first illustrate how the proposed method works under a controlled setting. Let $d = 1$ and the learning target function $f(x)$ be $f(x) = 1 - x + x^2 + \delta x^3$. Let $n = 100$ and $\{\epsilon_i\}_{i=1}^{100}$ be i.i.d. Gaussian noise with mean zero and standard deviation $0.3$. Let $p_t(x)$ be the Gaussian density with mean $0.2$ and standard deviation $0.4$, which is assumed to be known here. Let $p = 3$ and the basis functions be $\varphi_i(x) = x^{i-1}$ for $i = 1, 2, 3$.
Let us consider the following three cases: $\delta = 0, 0.04, 0.5$, where each case corresponds to “correctly specified”, “approximately correct”, and “misspecified” (see Figure 1). We choose the training input density $p_x(x)$ from the Gaussian density with mean $0.2$ and standard deviation $0.4c$, where $c = 0.8, 0.9, 1.0, \ldots, 2.5$.

We compare the accuracy of the following three methods:

(A) Proposed active learning criterion + WLS learning: The training input density is determined so that $J_{WLS}$ is minimized. Following the determined input density, training input points $\{x_i\}_{i=1}^{100}$ are created and corresponding output values $\{y_i\}_{i=1}^{100}$ are observed. Then WLS learning is used for estimating the parameters.
(B) Existing active learning criterion + OLS learning [2, 1, 3]: The training input density is determined so that $J_{OLS}$ is minimized. OLS learning is used for estimating the parameters.
(C) Passive learning + OLS learning: The test input density $p_t(x)$ is used as the training input density. OLS learning is used for estimating the parameters.

First, we evaluate the accuracy of $J_{WLS}$ and $J_{OLS}$ as approximations of $G_{WLS}$ and $G_{OLS}$. The means and standard deviations of $G_{WLS}$, $J_{WLS}$, $G_{OLS}$, and $J_{OLS}$ over 100 runs are depicted as functions of $c$ in Figure 2. These graphs show that when $\delta = 0$ (“correctly specified”), both $J_{WLS}$ and $J_{OLS}$ give accurate estimates of $G_{WLS}$ and $G_{OLS}$. When $\delta = 0.04$ (“approximately correct”), $J_{WLS}$ again works well, while $J_{OLS}$ tends to be negatively biased for large $c$. This result is surprising since as illustrated in Figure 1, the learning target functions with $\delta = 0$ and $\delta = 0.04$ are visually quite similar. Therefore, it intuitively seems that the result of $\delta = 0.04$ is not much different from that of $\delta = 0$. However, the simulation result shows that this slight difference makes $J_{OLS}$ unreliable. When $\delta = 0.5$ (“misspecified”), $J_{WLS}$ is still reasonably accurate, while $J_{OLS}$ is heavily biased.
These results show that as an approximation of the generalization error, $J_{WLS}$ is more robust against the misspecification of models than $J_{OLS}$, which is in good agreement with the theoretical analyses given in Section 3 and Section 4.

Figure 1: Learning target function and input density functions. Top panel: learning target function $f(x)$ for $\delta = 0$, $\delta = 0.04$, and $\delta = 0.5$. Bottom panel: input density functions $p_t(x)$ and $p_x(x)$.

Table 1: The means and standard deviations of the generalization error for Toy data set. The best method and comparable ones by the t-test are described with boldface. The value of method (B) for $\delta = 0.5$ is extremely large but it is not a typo. All values in the table are multiplied by $10^3$. (The numeric entries of the table are not recoverable from the extracted text.)

Figure 2: The means and error bars of $G_{WLS}$, $J_{WLS}$, $G_{OLS}$, and $J_{OLS}$ over 100 runs as functions of $c$, for the “correctly specified”, “approximately correct”, and “misspecified” cases.

In Table 1, the mean and standard deviation of the generalization error obtained by each method is described. When $\delta = 0$, the existing method (B) works better than the proposed method (A). Actually, in this case, training input densities that approximately minimize $G_{WLS}$ and $G_{OLS}$ were found by $J_{WLS}$ and $J_{OLS}$.
Therefore, the difference of the errors is caused by the difference of WLS and OLS: WLS generally has larger variance than OLS. Since bias is zero for both WLS and OLS if $\delta = 0$, OLS would be more accurate than WLS. Although the proposed method (A) is outperformed by the existing method (B), it still works better than the passive learning scheme (C). When $\delta = 0.04$ and $\delta = 0.5$, the proposed method (A) gives significantly smaller errors than other methods.

Overall, we found that for all three cases, the proposed method (A) works reasonably well and outperforms the passive learning scheme (C). On the other hand, the existing method (B) works excellently in the correctly specified case, although it tends to perform poorly once the correctness of the model is violated. Therefore, the proposed method (A) is found to be robust against the misspecification of models and thus it is reliable.

Table 2: The means and standard deviations of the test error for DELVE data sets (Bank-8fm, Bank-8fh, Bank-8nm, Bank-8nh, Kin-8fm, Kin-8fh, Kin-8nm, Kin-8nh) for methods (A), (B), and (C). All values in the table are multiplied by $10^3$. (The numeric entries of the table are not recoverable from the extracted text.)

Figure 3: Mean relative performance of (A) and (B) compared with (C). For each run, the test errors of (A) and (B) are normalized by the test error of (C), and then the values are averaged over 100 runs. Note that the error bars were reasonably small so they were omitted.

Realistic Data Set: Here we use eight practical data sets provided by DELVE [4]: Bank-8fm, Bank-8fh, Bank-8nm, Bank-8nh, Kin-8fm, Kin-8fh, Kin-8nm, and Kin-8nh. Each data set includes 8192 samples, consisting of 8-dimensional input and 1-dimensional output values. For convenience, every attribute is normalized into $[0, 1]$.
Suppose we are given all 8192 input points (i.e., unlabeled samples). Note that output values are unknown. From the pool of unlabeled samples, we choose $n = 1000$ input points $\{x_i\}_{i=1}^{1000}$ for training and observe the corresponding output values $\{y_i\}_{i=1}^{1000}$. The task is to predict the output values of all unlabeled samples.

In this experiment, the test input density $p_t(x)$ is unknown. So we estimate it using the independent Gaussian density

$$p_t(x) = \left(2\pi\hat{\gamma}_{MLE}^2\right)^{-d/2} \exp\left(-\frac{\|x - \hat{\mu}_{MLE}\|^2}{2\hat{\gamma}_{MLE}^2}\right),$$

where $\hat{\mu}_{MLE}$ and $\hat{\gamma}_{MLE}$ are the maximum likelihood estimates of the mean and standard deviation obtained from all unlabeled samples. Let $p = 50$ and the basis functions be

$$\varphi_i(x) = \exp\left(-\|x - t_i\|^2/2\right) \quad \text{for } i = 1, 2, \ldots, 50,$$

where $\{t_i\}_{i=1}^{50}$ are template points randomly chosen from the pool of unlabeled samples.

We select the training input density $p_x(x)$ from the independent Gaussian density with mean $\hat{\mu}_{MLE}$ and standard deviation $c\,\hat{\gamma}_{MLE}$, where $c = 0.7, 0.8, 0.9, \ldots, 2.4$.

In this simulation, we cannot create the training input points in an arbitrary location because we only have 8192 samples. Therefore, we first create temporary input points following the determined training input density, and then choose the input points from the pool of unlabeled samples that are closest to the temporary input points. For each data set, we repeat this simulation 100 times, by changing the template points $\{t_i\}_{i=1}^{50}$ in each run.

The means and standard deviations of the test error over 100 runs are described in Table 2. The proposed method (A) outperforms the existing method (B) for five data sets, while it is outperformed by (B) for the other three data sets. We conjecture that the model used for learning is almost correct in these three data sets. This result implies that the proposed method (A) is slightly better than the existing method (B).

Figure 3 depicts the relative performance of the proposed method (A) and the existing method (B) compared with the passive learning scheme (C).
This shows that (A) outperforms (C) for all eight data sets, while (B) is comparable to or outperformed by (C) for five data sets. Therefore, the proposed method (A) is overall shown to work better than other schemes.

6 Conclusions

We argued that active learning is essentially the situation under the covariate shift: the training input density is different from the test input density. When the model used for learning is correctly specified, the covariate shift does not matter. However, for misspecified models, we have to explicitly cope with the covariate shift. In this paper, we proposed a new active learning method based on the weighted least-squares learning.

The numerical study showed that the existing method works better than the proposed method if the model is correctly specified. However, the existing method tends to perform poorly once the correctness of the model is violated. On the other hand, the proposed method overall worked reasonably well and it consistently outperformed the passive learning scheme. Therefore, the proposed method would be robust against the misspecification of models and thus it is reliable.

The proposed method can be theoretically justified if the model is approximately correct in a weak sense. However, it is no longer valid for totally misspecified models. A natural future direction would therefore be to devise an active learning method which has a theoretical guarantee with totally misspecified models. It is also important to notice that when the model is totally misspecified, even learning with optimal training input points would not be successful anyway. In such cases, it is of course important to carry out model selection. In active learning research, including the present paper, however, the location of training input points is designed for a single model at hand. That is, the model should have been chosen before performing active learning.
Devising a method for simultaneously optimizing models and the location of training input points would be a more important and promising future direction. Acknowledgments: The author would like to thank MEXT (Grant-in-Aid for Young Scientists 17700142) for partial ﬁnancial support. References [1] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artiﬁcial Intelligence Research, 4:129–145, 1996. [2] V. V. Fedorov. Theory of Optimal Experiments. Academic Press, New York, 1972. [3] K. Fukumizu. Statistical active learning in multilayer perceptrons. IEEE Transactions on Neural Networks, 11(1):17–26, 2000. [4] C. E. Rasmussen, R. M. Neal, G. E. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra, and R. Tibshirani. The DELVE manual, 1996. [5] H. Shimodaira. Improving predictive inference under covariate shift by weighting the loglikelihood function. Journal of Statistical Planning and Inference, 90(2):227–244, 2000. [6] M. Sugiyama. Active learning for misspeciﬁed models. Technical report, Department of Computer Science, Tokyo Institute of Technology, 2005.

6 0.090375677 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models

7 0.09028618 136 nips-2005-Noise and the two-thirds power Law

8 0.08738067 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning

9 0.069998704 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

10 0.064235628 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

11 0.062149651 119 nips-2005-Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

12 0.060326144 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions

13 0.056880116 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models

14 0.055471387 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity

15 0.054345425 181 nips-2005-Spiking Inputs to a Winner-take-all Network

16 0.053507142 50 nips-2005-Convex Neural Networks

17 0.051994953 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

18 0.051018506 179 nips-2005-Sparse Gaussian Processes using Pseudo-inputs

19 0.04835904 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions

20 0.048188481 85 nips-2005-Generalization to Unseen Cases

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.175), (1, -0.042), (2, -0.029), (3, 0.047), (4, 0.086), (5, -0.045), (6, -0.024), (7, -0.108), (8, 0.047), (9, 0.05), (10, -0.059), (11, -0.006), (12, 0.081), (13, 0.069), (14, -0.013), (15, 0.075), (16, -0.097), (17, 0.093), (18, -0.163), (19, 0.006), (20, 0.303), (21, 0.101), (22, 0.018), (23, -0.33), (24, 0.047), (25, -0.024), (26, 0.084), (27, 0.114), (28, -0.021), (29, -0.14), (30, 0.126), (31, -0.058), (32, -0.065), (33, -0.055), (34, 0.125), (35, -0.119), (36, 0.02), (37, 0.124), (38, 0.125), (39, 0.057), (40, 0.111), (41, 0.078), (42, 0.03), (43, -0.067), (44, 0.044), (45, -0.079), (46, 0.022), (47, -0.063), (48, -0.028), (49, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94599044 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato

2 0.69804955 17 nips-2005-Active Bidirectional Coupling in a Cochlear Chip

Author: Bo Wen, Kwabena A. Boahen

3 0.49981812 77 nips-2005-From Lasso regression to Feature vector machine

Author: Fan Li, Yiming Yang, Eric P. Xing

4 0.47586367 192 nips-2005-The Information-Form Data Association Filter

Author: Brad Schumitsch, Sebastian Thrun, Gary Bradski, Kunle Olukotun

5 0.42076793 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning

Author: Jorge Silva, Jorge Marques, João Lemos

Abstract: There has been a surge of interest in learning non-linear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations. This paper presents an algorithm for selecting landmarks, based on LASSO regression, which is well known to favor sparse approximations because it uses regularization with an l1 norm. As an added benefit, a continuous manifold parameterization, based on the landmarks, is also found. Experimental results with synthetic and real data illustrate the algorithm.

6 0.39150858 136 nips-2005-Noise and the two-thirds power Law

7 0.39056298 119 nips-2005-Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

8 0.36656499 19 nips-2005-Active Learning for Misspecified Models

9 0.33729935 205 nips-2005-Worst-Case Bounds for Gaussian Process Models

10 0.31069353 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions

11 0.30826515 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

12 0.26845548 114 nips-2005-Learning Rankings via Convex Hull Separation

13 0.25943345 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

14 0.2473864 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms

15 0.2469427 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models

16 0.23729026 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

17 0.2343294 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions

18 0.22917461 106 nips-2005-Large-scale biophysical parameter estimation in single neurons via constrained linear regression

19 0.22692384 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression

20 0.21841104 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.026), (10, 0.023), (27, 0.392), (31, 0.022), (34, 0.054), (39, 0.019), (50, 0.024), (55, 0.023), (57, 0.011), (60, 0.013), (65, 0.01), (69, 0.059), (73, 0.03), (88, 0.115), (91, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96778625 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?

Author: Yan Karklin, Michael S. Lewicki

Abstract: Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1 [1, 2]. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures [3, 4] and other generative statistical models [5, 6], can capture higher-order regularities in natural images and explain nonlinear aspects of neural processing such as normalization and context effects [6, 7]. Previously, it had been assumed that the lower level representation is independent of the hierarchy, and had been fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.

2 0.9545297 87 nips-2005-Goal-Based Imitation as Probabilistic Inference over Graphical Models

Author: Deepak Verma, Rajesh P. Rao

Abstract: Humans are extremely adept at learning new skills by imitating the actions of others. A progression of imitative abilities has been observed in children, ranging from imitation of simple body movements to goal-based imitation based on inferring intent. In this paper, we show that the problem of goal-based imitation can be formulated as one of inferring goals and selecting actions using a learned probabilistic graphical model of the environment. We first describe algorithms for planning actions to achieve a goal state using probabilistic inference. We then describe how planning can be used to bootstrap the learning of goal-dependent policies by utilizing feedback from the environment. The resulting graphical model is then shown to be powerful enough to allow goal-based imitation. Using a simple maze navigation task, we illustrate how an agent can infer the goals of an observed teacher and imitate the teacher even when the goals are uncertain and the demonstration is incomplete.

3 0.9449957 185 nips-2005-Subsequence Kernels for Relation Extraction

Author: Raymond J. Mooney, Razvan C. Bunescu

Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach.

same-paper 4 0.86971319 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato

5 0.68049258 109 nips-2005-Learning Cue-Invariant Visual Responses

Author: Jarmo Hurri

Abstract: Multiple visual cues are used by the visual system to analyze a scene; achromatic cues include luminance, texture, contrast and motion. Single-cell recordings have shown that the mammalian visual cortex contains neurons that respond similarly to scene structure (e.g., orientation of a boundary), regardless of the cue type conveying this information. This paper shows that cue-invariant response properties of simple- and complex-type cells can be learned from natural image data in an unsupervised manner. In order to do this, we also extend a previous conceptual model of cue invariance so that it can be applied to model simple- and complex-cell responses. Our results relate cue-invariant response properties to natural image statistics, thereby showing how the statistical modeling approach can be used to model processing beyond the elemental response properties of visual neurons. This work also demonstrates how to learn, from natural image data, more sophisticated feature detectors than those based on changes in mean luminance, thereby paving the way for new data-driven approaches to image processing and computer vision.

6 0.61444354 36 nips-2005-Bayesian models of human action understanding

7 0.60960859 100 nips-2005-Interpolating between types and tokens by estimating power-law generators

8 0.58424193 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

9 0.57475626 119 nips-2005-Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

10 0.57356888 203 nips-2005-Visual Encoding with Jittering Eyes

11 0.56584728 48 nips-2005-Context as Filtering

12 0.55887043 158 nips-2005-Products of ``Edge-perts

13 0.54400051 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape

14 0.54291487 152 nips-2005-Phase Synchrony Rate for the Recognition of Motor Imagery in Brain-Computer Interface

15 0.54282731 169 nips-2005-Saliency Based on Information Maximization

16 0.53421819 35 nips-2005-Bayesian model learning in human visual perception

17 0.53350681 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception

18 0.53137207 153 nips-2005-Policy-Gradient Methods for Planning

19 0.52040386 52 nips-2005-Correlated Topic Models

20 0.51275074 45 nips-2005-Conditional Visual Tracking in Kernel Space