
Radial Basis Function Network for Multi-task Learning


Source: pdf

Author: Xuejun Liao, Lawrence Carin

Abstract: We extend radial basis function (RBF) networks to the scenario in which multiple correlated tasks are learned simultaneously, and present the corresponding learning algorithms. We develop the algorithms for learning the network structure, in either a supervised or unsupervised manner. Training data may also be actively selected to improve the network’s generalization to test data. Experimental results based on real data demonstrate the advantage of the proposed algorithms and support our conclusions.
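As a rough illustration of the architecture the abstract describes, the sketch below shares a single pool of Gaussian basis functions across tasks and fits a separate ridge-regularized output weight vector per task. This is a minimal NumPy sketch under assumed conventions; the function names (rbf_features, fit_task_weights) and parameters (centers, width, rho) are illustrative and are not the paper's notation or its structure-learning algorithm.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian basis responses for inputs X (n, d) against shared centers (N, d)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([Phi, np.ones((X.shape[0], 1))])  # append a bias column

def fit_task_weights(Phi, y, rho):
    """Ridge solution w = (rho*I + Phi^T Phi)^{-1} Phi^T y for one task."""
    A = rho * np.eye(Phi.shape[1]) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ y)

rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(20, 2))   # basis centers shared by all tasks
weights = []                                     # one output weight vector per task
for k in range(3):                               # three toy correlated tasks
    Xk = rng.uniform(-1.0, 1.0, size=(50, 2))
    yk = np.sin(Xk.sum(axis=1)) + 0.1 * k + 0.1 * rng.standard_normal(50)
    Phik = rbf_features(Xk, centers, width=0.5)
    weights.append(fit_task_weights(Phik, yk, rho=1e-2))
```

In this sketch the coupling across tasks comes only from the shared basis; the paper's algorithms additionally learn the basis (the network structure) and actively select training data.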


References

[1] R. Caruana (1997). Multitask learning. Machine Learning, 28: 41-75.

[2] B. Bakker and T. Heskes (2003). Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4: 83-99.

[3] T. Evgeniou, C. A. Micchelli, and M. Pontil (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6: 615-637.

[4] M. J. D. Powell (1987). Radial basis functions for multivariable interpolation: a review. In J. C. Mason and M. G. Cox, eds., Algorithms for Approximation, pp. 143-167.

[5] S. Chen, C. F. N. Cowan, and P. M. Grant (1991). Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks, 2(2): 302-309.

[6] D. A. Cohn, Z. Ghahramani, and M. I. Jordan (1995). Active learning with statistical models. Advances in Neural Information Processing Systems, 7: 705-712.

[7] V. Fedorov (1972). Theory of Optimal Experiments. Academic Press.

[8] M. Stone (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B, 36: 111-147.

Appendix

Proof of Theorem 1: Let $\phi^{new} = [\phi^T, \phi^{N+1}]^T$. By (3), the $A$ matrices corresponding to $\phi^{new}$ are

$$A_k^{new} = \sum_{i=1}^{J_k} \begin{bmatrix} \phi_{ik} \\ \phi_{ik}^{N+1} \end{bmatrix} \begin{bmatrix} \phi_{ik}^T & \phi_{ik}^{N+1} \end{bmatrix} + \rho I_{(N+2)\times(N+2)} = \begin{bmatrix} A_k & c_k \\ c_k^T & d_k \end{bmatrix} \qquad \text{(A-1)}$$

where $c_k$ and $d_k$ are as in (6). By the conditions of the theorem, the matrices $A_k$ and $A_k^{new}$ are all non-degenerate. Using the block matrix inversion formula [7] we get

$$(A_k^{new})^{-1} = \begin{bmatrix} A_k^{-1} + A_k^{-1} c_k q_k^{-1} c_k^T A_k^{-1} & -A_k^{-1} c_k q_k^{-1} \\ -q_k^{-1} c_k^T A_k^{-1} & q_k^{-1} \end{bmatrix} \qquad \text{(A-2)}$$

where $q_k$ is as in (6). By (3), the weights $w_k^{new}$ corresponding to $[\phi^T, \phi^{N+1}]^T$ are

$$w_k^{new} = (A_k^{new})^{-1} \begin{bmatrix} \sum_{i=1}^{J_k} y_{ik}\,\phi_{ik} \\ \sum_{i=1}^{J_k} y_{ik}\,\phi_{ik}^{N+1} \end{bmatrix} = \begin{bmatrix} w_k + A_k^{-1} c_k q_k^{-1} g_k \\ -q_k^{-1} g_k \end{bmatrix} \qquad \text{(A-3)}$$

with $g_k = c_k^T w_k - \sum_{i=1}^{J_k} y_{ik}\,\phi_{ik}^{N+1}$. Hence $(\phi_{ik}^{new})^T w_k^{new} = \phi_{ik}^T w_k + (\phi_{ik}^T A_k^{-1} c_k - \phi_{ik}^{N+1})\, q_k^{-1} g_k$, which is put into (4) to get

$$e(\phi^{new}) = \sum_{k=1}^{K}\sum_{i=1}^{J_k} \big[ y_{ik} - (\phi_{ik}^{new})^T w_k^{new} \big]^2 = \sum_{k=1}^{K}\sum_{i=1}^{J_k} \big[ y_{ik} - \phi_{ik}^T w_k - (\phi_{ik}^T A_k^{-1} c_k - \phi_{ik}^{N+1})\, q_k^{-1} g_k \big]^2 = e(\phi) - \sum_{k=1}^{K} g_k^2\, q_k^{-1}$$

where in arriving at the last equality we have used (3) and (4) and $g_k = c_k^T w_k - \sum_{i=1}^{J_k} y_{ik}\,\phi_{ik}^{N+1}$. The theorem is proved.

Proof of Theorem 2: The proof applies to $k = 1, \cdots, K$. For any given $k$, define $\Phi_k = [\phi(x_{1k}), \ldots, \phi(x_{J_k k})]$, $\widetilde{\Phi}_k = [\phi(x_{J_k+1,k}), \ldots, \phi(x_{J_k+\widetilde{J}_k,k})]$, $y_k = [y_{1k}, \ldots, y_{J_k k}]^T$, $\widetilde{y}_k = [y_{J_k+1,k}, \ldots, y_{J_k+\widetilde{J}_k,k}]^T$, $f_k = [f_k(x_{1k}), \ldots, f_k(x_{J_k k})]^T$, $\widetilde{f}_k = [\widetilde{f}_k(x_{1k}), \ldots, \widetilde{f}_k(x_{J_k k})]^T$, and $A_k = \rho I + \widetilde{\Phi}_k \widetilde{\Phi}_k^T$. By (1), (3), and the conditions of the theorem,

$$\widetilde{f}_k = \Phi_k^T \big( A_k + \Phi_k \Phi_k^T \big)^{-1} \big( \Phi_k y_k + \widetilde{\Phi}_k \widetilde{y}_k \big) \overset{(a)}{=} \big( I + \Phi_k^T A_k^{-1} \Phi_k \big)^{-1} \big( \Phi_k^T A_k^{-1} \Phi_k\, y_k + \Phi_k^T A_k^{-1} \widetilde{\Phi}_k \widetilde{y}_k \big)$$
$$\overset{(b)}{=} \big( I + \Phi_k^T A_k^{-1} \Phi_k \big)^{-1} \big[ \big( I + \Phi_k^T A_k^{-1} \Phi_k - I \big)\, y_k + f_k \big] = y_k + \big( I + \Phi_k^T A_k^{-1} \Phi_k \big)^{-1} \big( f_k - y_k \big)$$

where equation (a) is due to the Sherman-Morrison-Woodbury formula and equation (b) results because $f_k = \Phi_k^T A_k^{-1} \widetilde{\Phi}_k \widetilde{y}_k$. Hence $\widetilde{f}_k - y_k = \big( I + \Phi_k^T A_k^{-1} \Phi_k \big)^{-1} \big( f_k - y_k \big)$, which gives

$$\sum_{i=1}^{J_k} \big( y_{ik} - \widetilde{f}_k(x_{ik}) \big)^2 = \big( \widetilde{f}_k - y_k \big)^T \big( \widetilde{f}_k - y_k \big) = \big( f_k - y_k \big)^T \Gamma_k^{-1} \big( f_k - y_k \big) \qquad \text{(A-4)}$$

where $\Gamma_k = \big( I + \Phi_k^T A_k^{-1} \Phi_k \big)^2 = \big( I + \Phi_k^T (\rho I + \widetilde{\Phi}_k \widetilde{\Phi}_k^T)^{-1} \Phi_k \big)^2$. By construction, $\Gamma_k$ has all its eigenvalues no less than 1, i.e., $\Gamma_k = E_k^T \mathrm{diag}[\lambda_{1k}, \cdots, \lambda_{J_k k}] E_k$ with $E_k^T E_k = I$ and $\lambda_{1k}, \cdots, \lambda_{J_k k} \ge 1$, which makes the first, second, and last inequalities in (7) hold. Using this expansion of $\Gamma_k$ in (A-4) we get

$$\sum_{i=1}^{J_k} \big( \widetilde{f}_k(x_{ik}) - y_{ik} \big)^2 = \big( f_k - y_k \big)^T E_k^T \mathrm{diag}[\lambda_{1k}^{-1}, \ldots, \lambda_{J_k k}^{-1}] E_k \big( f_k - y_k \big) \le \lambda_{min,k}^{-1} \sum_{i=1}^{J_k} \big( f_k(x_{ik}) - y_{ik} \big)^2 \qquad \text{(A-5)}$$

where the inequality results because $\lambda_{min,k} = \min(\lambda_{1k}, \cdots, \lambda_{J_k k})$. From (A-5) follows the fourth inequality in (7). The third inequality in (7) can be proven in a similar way.
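Both proofs come down to matrix identities that are easy to check numerically. The NumPy sketch below does so; it assumes the error criterion $e(\phi)$ in (4) includes the ridge penalty $\rho\|w_k\|^2$ (under that reading the Theorem 1 decrement is exactly $g_k^2 q_k^{-1}$) and uses random matrices in place of actual basis responses, so all variable names and sizes are illustrative rather than the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, N, J, Jt = 0.1, 5, 30, 12   # ridge weight, basis size N+1, sample counts

# --- Theorem 1: adding basis function phi^{N+1} lowers the error by g^2/q ---
Phi = rng.standard_normal((J, N + 1))   # row i is phi_ik^T for one task k
v = rng.standard_normal(J)              # responses phi_ik^{N+1} of the new basis
y = rng.standard_normal(J)

A = rho * np.eye(N + 1) + Phi.T @ Phi
w = np.linalg.solve(A, Phi.T @ y)
e = np.sum((y - Phi @ w) ** 2) + rho * (w @ w)   # ridge-penalized error (assumed reading of (4))

c = Phi.T @ v                                    # c_k as in (A-1)
d = v @ v + rho                                  # d_k as in (A-1)
q = d - c @ np.linalg.solve(A, c)                # Schur complement q_k, as in (A-2)
g = c @ w - y @ v                                # g_k as in (A-3)

Phi2 = np.hstack([Phi, v[:, None]])              # enlarged basis, refit directly
A2 = rho * np.eye(N + 2) + Phi2.T @ Phi2
w2 = np.linalg.solve(A2, Phi2.T @ y)
e2 = np.sum((y - Phi2 @ w2) ** 2) + rho * (w2 @ w2)
assert np.isclose(e2, e - g ** 2 / q)            # the Theorem 1 decrement

# --- Theorem 2: f~_k - y_k = (I + Phi_k^T A_k^{-1} Phi_k)^{-1} (f_k - y_k) ---
PhiK = rng.standard_normal((N + 1, J))   # columns phi(x_ik), original task data
PhiE = rng.standard_normal((N + 1, Jt))  # columns for the extra data
yk = rng.standard_normal(J)
ye = rng.standard_normal(Jt)

Ak = rho * np.eye(N + 1) + PhiE @ PhiE.T                    # A_k from the extra data
f = PhiK.T @ np.linalg.solve(Ak, PhiE @ ye)                 # trained on extra data only
ft = PhiK.T @ np.linalg.solve(Ak + PhiK @ PhiK.T,
                              PhiK @ yk + PhiE @ ye)        # trained on all data
M = PhiK.T @ np.linalg.solve(Ak, PhiK)                      # Phi_k^T A_k^{-1} Phi_k
assert np.allclose(ft - yk, np.linalg.solve(np.eye(J) + M, f - yk))
```

The second assertion is exactly the residual relation from which (A-4) and (A-5) follow, since $\Gamma_k = (I + \Phi_k^T A_k^{-1} \Phi_k)^2$ has eigenvalues at least 1.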