nips nips2009 nips2009-184 knowledge-graph by maker-knowledge-mining

184 nips-2009-Optimizing Multi-Class Spatio-Spectral Filters via Bayes Error Estimation for EEG Classification


Source: pdf

Author: Wenming Zheng, Zhouchen Lin

Abstract: The method of common spatio-spectral patterns (CSSPs) is an extension of common spatial patterns (CSPs) that utilizes the technique of delay embedding to alleviate the adverse effects of noise and artifacts on electroencephalogram (EEG) classification. Although the CSSPs method has been shown to be more powerful than the CSPs method in EEG classification, it is only suitable for two-class EEG classification problems. In this paper, we generalize the two-class CSSPs method to multi-class cases. To this end, we first develop a novel theory of multi-class Bayes error estimation and then present the multi-class CSSPs (MCSSPs) method based on this Bayes error theoretical framework. By minimizing the estimated closed-form Bayes error, we obtain the optimal spatio-spectral filters of MCSSPs. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on the BCI competition 2005 data set. The experimental results show that our method significantly outperforms the previous multi-class CSPs (MCSPs) methods in EEG classification.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract The method of common spatio-spectral patterns (CSSPs) is an extension of common spatial patterns (CSPs) that utilizes the technique of delay embedding to alleviate the adverse effects of noise and artifacts on electroencephalogram (EEG) classification. [sent-8, score-0.157]

2 Although the CSSPs method has been shown to be more powerful than the CSPs method in EEG classification, it is only suitable for two-class EEG classification problems. [sent-9, score-0.084]

3 In this paper, we generalize the two-class CSSPs method to multi-class cases. [sent-10, score-0.045]

4 To this end, we first develop a novel theory of multi-class Bayes error estimation and then present the multi-class CSSPs (MCSSPs) method based on this Bayes error theoretical framework. [sent-11, score-0.109]

5 To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on the BCI competition 2005 data set. [sent-13, score-0.067]

6 The experimental results show that our method significantly outperforms the previous multi-class CSPs (MCSPs) methods in EEG classification. [sent-14, score-0.028]

7 1 Introduction. The development of non-invasive brain-computer interfaces (BCI) using the electroencephalogram (EEG) signal has become a highly active research topic in the BCI community [1]. [sent-15, score-0.068]

8 During the last several years, a large number of signal processing and machine learning methods have been proposed for EEG classification [6]. [sent-16, score-0.04]

9 It is challenging to extract discriminant features from the EEG signal for classification. [sent-17, score-0.1]

10 This is because in most cases the EEG data are centered at zero, so many traditional discriminant feature extraction methods, e.g. [sent-18, score-0.149]

11 Fisher’s linear discriminant analysis (FLDA) [7], cannot be successfully applied. [sent-20, score-0.06]

12 Among the various EEG feature extraction methods, the common spatial patterns (CSPs) method [2] is one of the most popular. [sent-21, score-0.194]

13 Given two classes of EEG signals, the basic idea of CSPs is to find projection directions such that projecting the EEG signal onto these directions maximizes the variance of one class while simultaneously minimizing the variance of the other class. [sent-22, score-0.096]

14 Although the CSPs method has achieved great success in EEG classification, it only utilizes the spatial information of the EEG signal. [sent-23, score-0.081]

15 To utilize both the spatial and the temporal information of the EEG signal for classification, Lemm et al. [3] proposed the CSSPs method. [sent-24, score-0.093]

16 The experiments in [3] showed that the CSSPs method outperforms the CSPs method. [sent-26, score-0.028]

17 A multi-class extension of the two-class CSPs method (MCSPs) was proposed by Dornhege et al. [sent-27, score-0.028]

18 [4], who adopted a joint approximate diagonalization (JAD) technique to find the optimal spatial filters. [sent-28, score-0.053]

19 Grosse-Wentrup and Buss [5] recently pointed out that the MCSPs method has two major drawbacks. [sent-29, score-0.028]

20 The first drawback is that this method lacks a solid theoretical foundation with respect to its classification error. [sent-30, score-0.028]

21 The second one is that the selection of the optimal spatial filters of MCSPs is based on heuristics. [sent-31, score-0.053]

22 To overcome these drawbacks, they proposed a method based on mutual information to select the optimal spatial filters from the original MCSPs result. [sent-32, score-0.081]

23 In this paper, we generalize the two-class CSSPs method to multi-class cases, hereafter called the MCSSPs method. [sent-34, score-0.045]

24 However, we do not adopt the same JAD technique used in the MCSPs method to derive our MCSSPs method. [sent-35, score-0.051]

25 Instead, we derive our MCSSPs method directly from the Bayes error estimation, and thus provide a solid theoretical foundation. [sent-36, score-0.083]

26 To this end, we first develop a novel theory of multi-class Bayes error estimation, which has a closed-form solution to find the optimal discriminant vectors. [sent-37, score-0.091]

27 Based on this new theoretical framework, we propose our MCSSPs method for EEG feature extraction and recognition. [sent-38, score-0.141]

28 2 Brief Review of CSPs and CSSPs. Let $X_i^t = \{x_{i,j}^t \in \mathbb{R}^d \mid j = 1, \cdots, m_{i,t}\}$ $(t = 1, \cdots, n_i;\ i = 1, \cdots, c)$ denote the EEG data set from the $t$th trial of the $i$th class, where $d$, $c$, $n_i$, and $m_{i,t}$ denote the number of channels (i.e. [sent-39, score-0.25]

29 recording electrodes), the number of classes, the number of trials of the $i$th class, and the number of samples (i.e. [sent-41, score-0.086]

30 recording points) in the $t$th trial of the $i$th class, respectively. [sent-43, score-0.203]

31 Then the main task of EEG feature extraction is to find a linear transformation $W \in \mathbb{R}^{d \times k}$ $(k < d)$ such that, for finite training data, using the projected vectors $y_{i,j}^t = W^T x_{i,j}^t$ to classify the EEG signal may lead to better classification accuracy than using $x_{i,j}^t$. [sent-47, score-0.269]

32 2.1 The CSPs Method. For the two-class EEG classification problem, the basic idea of CSPs is to find a transformation matrix $W$ that simultaneously diagonalizes both class covariance matrices $\Sigma_1$ and $\Sigma_2$ [2], i.e. [sent-49, score-0.108]

33 The spatial filters can be chosen as the columns of $W$ associated with the maximal or minimal ratios $\lambda_{1,j}/\lambda_{2,j}$ $(j = 1, \cdots, d)$. [sent-52, score-0.053]

34 [6] proved that the CSPs method can be formulated as the following optimization problem: $\omega = \arg\max_{\omega} \max\left( \frac{\omega^T \Sigma_1 \omega}{\omega^T \Sigma_2 \omega},\ \frac{\omega^T \Sigma_2 \omega}{\omega^T \Sigma_1 \omega} \right)$, (2) and this optimization problem boils down to solving the following generalized eigenvalue decomposition problem: $\Sigma_1 \omega = \lambda \Sigma_2 \omega$. (3) [sent-54, score-0.188]

35 Let $\omega_1, \cdots, \omega_d$ and $\lambda_1, \cdots, \lambda_d$ be the eigenvectors and the corresponding eigenvalues of equation (3); then the spatial filters $\omega_{i_1}, \cdots, \omega_{i_k}$ can be chosen from the eigenvectors $\omega_1, \cdots, \omega_d$ associated with the largest and the smallest eigenvalues. [sent-55, score-0.109]
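As a concrete illustration of equations (2)-(3), the following sketch (our code, not the authors'; it assumes zero-mean trials stored as channels x samples NumPy arrays, and all names are ours) solves the generalized eigenvalue problem with SciPy and keeps the filters associated with the largest and smallest eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_1, trials_2, n_pairs=3):
    """Two-class CSPs: solve Sigma_1 w = lambda * Sigma_2 w (equation (3)).

    trials_1, trials_2: lists of (channels x samples) zero-mean EEG trials.
    Returns 2*n_pairs spatial filters as columns, taken from the
    eigenvectors with the smallest and largest generalized eigenvalues.
    """
    avg_cov = lambda trials: np.mean([X @ X.T / X.shape[1] for X in trials], axis=0)
    S1, S2 = avg_cov(trials_1), avg_cov(trials_2)
    lam, W = eigh(S1, S2)                 # eigenvalues returned in ascending order
    order = np.argsort(lam)
    idx = np.r_[order[:n_pairs], order[-n_pairs:]]
    return W[:, idx]
```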

36 2.2 The CSSPs Method. The CSSPs method is an extension of CSPs that concatenates the original EEG data with a time-delayed copy to form a longer vector sample, and then performs EEG feature extraction, similar to the CSPs method, on these padded samples. [sent-58, score-0.114]

37 Then equation (4) can be rewritten as $\hat{Y}_i^t = W^{(0)T} X_i^t + W^{(\tau)T} \delta^\tau(X_i^t)$, (6) where $W^{(0)}$ and $W^{(\tau)}$ are the transformation matrices on the EEG data $X_i^t$ and $\delta^\tau(X_i^t)$, respectively. [sent-65, score-0.071]

38 To express the above equation in a form similar to CSPs, we define $\hat{X}_i^t = \begin{pmatrix} X_i^t \\ \delta^\tau(X_i^t) \end{pmatrix}$. (7) [sent-66, score-0.028]
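A minimal sketch of the delay-embedding step in equation (7): each trial is stacked on top of a copy of itself delayed by $\tau$ samples, doubling the number of "channels". The zero-padding of the delayed copy at the trial boundary is our assumption; the paper does not spell out the boundary handling.

```python
import numpy as np

def delay_embed(X, tau):
    """Form hat{X} = [X; delta^tau(X)] for a (channels x samples) trial X (tau >= 1)."""
    X_delayed = np.zeros_like(X)
    X_delayed[:, tau:] = X[:, :-tau]   # delta^tau: shift right by tau samples
    return np.vstack([X, X_delayed])   # shape: (2*channels) x samples
```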

39 3 MCSSPs Based on Multi-class Bayes Error Estimation. In this section, we extend the CSSPs method to the multi-class case. [sent-68, score-0.028]

40 To begin with, we develop a novel theory of multi-class Bayes error estimation. [sent-69, score-0.031]

41 Then we present our MCSSPs method based on this Bayes error framework. [sent-70, score-0.059]

42 3.1 Multi-class Bayes Error Estimation. It is well known that the Bayes error regarding classes $i$ and $j$ can be expressed as [7]: $\varepsilon = \int \min(P_i p_i(x), P_j p_j(x))\,dx$, (9) where $P_i$ and $p_i(x)$ are the a priori probability and the probability density function of the $i$th class. [sent-72, score-0.835]

43 Let $\varepsilon_{ij} = \int \sqrt{P_i P_j\, p_i(x) p_j(x)}\,dx$. Using the inequality $\min(a, b) \le \sqrt{ab}$, $\forall a, b \ge 0$, (10) and the assumption $p_i(x) = N(0, \Sigma_i)$, we obtain the following upper bound of the Bayes error: $\varepsilon \le \varepsilon_{ij} = \sqrt{P_i P_j} \exp\left( -\frac{1}{2} \ln \frac{|\bar\Sigma_{ij}|}{\sqrt{|\Sigma_i||\Sigma_j|}} \right) = \sqrt{P_i P_j} \left( \frac{|\bar\Sigma_{ij}|}{\sqrt{|\Sigma_i||\Sigma_j|}} \right)^{-\frac{1}{2}}$, (11) where $\bar\Sigma_{ij} = \frac{1}{2}(\Sigma_i + \Sigma_j)$. [sent-74, score-0.177]
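To make the bound of equation (11) concrete, here is a sketch (our code, with priors and covariance matrices assumed given) that evaluates the pairwise bound $\varepsilon_{ij}$ for zero-mean Gaussian classes and accumulates it over all pairs, as in the $c$-class bound stated next:

```python
import numpy as np

def pairwise_bound(P_i, P_j, S_i, S_j):
    """Upper bound eps_ij of equation (11) for p_i(x) = N(0, Sigma_i)."""
    S_bar = 0.5 * (S_i + S_j)
    # slogdet avoids overflow/underflow for high-dimensional covariances
    _, ld_bar = np.linalg.slogdet(S_bar)
    _, ld_i = np.linalg.slogdet(S_i)
    _, ld_j = np.linalg.slogdet(S_j)
    half_log_ratio = 0.5 * (ld_bar - 0.5 * (ld_i + ld_j))
    return np.sqrt(P_i * P_j) * np.exp(-half_log_ratio)

def multiclass_bound(priors, covs):
    """Sum of pairwise bounds: eps <= sum over all pairs i < j of eps_ij."""
    c = len(priors)
    return sum(pairwise_bound(priors[i], priors[j], covs[i], covs[j])
               for i in range(c) for j in range(i + 1, c))
```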

44 For the $c$-class problem, the upper bound of the Bayes error in the reduced feature space can be estimated as $\varepsilon \le \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \varepsilon_{ij}$ [8]. [sent-79, score-0.077]

45 Recursively applying the inequality $\frac{a^2}{b} + \frac{c^2}{d} \ge \frac{(a+c)^2}{b+d}$, $\forall a, c \ge 0;\ b, d > 0$, to the error bound in equation (14), we have $\varepsilon \le \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \sqrt{P_i P_j} - \frac{1}{8}\, \frac{\big( \sum_{i=1}^{c} \sum_{j=1}^{c} (P_i P_j)^{\frac{5}{4}}\, |\omega^T \Delta\Sigma_{ij}\, \omega| \big)^2}{\sum_{i=1}^{c} \sum_{j=1}^{c} P_i P_j\, \omega^T \bar\Sigma_{ij}\, \omega}$, where $\bar\Sigma = \sum_{i=1}^{c} P_i \Sigma_i$ is the global covariance matrix. [sent-81, score-0.059]

46 3.2 MCSSPs Based on Multi-class Bayes Error Estimation. Let $\hat\Sigma_i$ $(i = 1, \cdots, c)$ denote the new class covariance matrices computed via equation (8). [sent-92, score-0.113]

47 Then, to minimize the Bayes error, we should minimize its upper bound, which boils down to maximizing the following discriminant criterion: $J(\omega) = \sum_{i=1}^{c} \frac{|\omega^T (\hat\Sigma_i - \hat{\bar\Sigma}) \omega|}{\omega^T \hat{\bar\Sigma} \omega}$, (21) [sent-93, score-0.092]

48 where $\hat{\bar\Sigma}$ is the global covariance matrix. [sent-94, score-0.045]

49 Based on this criterion, we define the $k$ optimal spatial filters of MCSSPs as follows: $\omega_1 = \arg\max_{\omega} \sum_{i=1}^{c} \frac{|\omega^T (\hat\Sigma_i - \hat{\bar\Sigma}) \omega|}{\omega^T \hat{\bar\Sigma} \omega}$, $\cdots$, $\omega_k = \arg\max_{\omega} \sum_{i=1}^{c} \frac{|\omega^T (\hat\Sigma_i - \hat{\bar\Sigma}) \omega|}{\omega^T \hat{\bar\Sigma} \omega}$. (22) [sent-95, score-0.141]

50 Then solving the optimization problem of equation (22) is equivalent to solving the following optimization problem (with $\alpha = \hat{\bar\Sigma}^{\frac{1}{2}} \omega$, so that $\hat\Sigma_i$ below denotes the class covariance after whitening by $\hat{\bar\Sigma}^{-\frac{1}{2}}$): $\alpha_1 = \arg\max_{\alpha} \frac{\sum_{i=1}^{c} |\alpha^T (\hat\Sigma_i - I) \alpha|}{\alpha^T \alpha}$, $\cdots$, $\alpha_k = \arg\max_{\alpha^T U_{k-1} = 0} \frac{\sum_{i=1}^{c} |\alpha^T (\hat\Sigma_i - I) \alpha|}{\alpha^T \alpha}$, (23) where $U_{k-1} = [\alpha_1, \cdots, \alpha_{k-1}]$ and $I$ is the identity matrix. [sent-97, score-0.116]

51 Suppose that $s_i \in \{+1, -1\}$ denotes the positive or negative sign of $\alpha^T (\hat\Sigma_i - I) \alpha$. [sent-98, score-0.045]

52 So equation (23) can be expressed as $\alpha_1 = \arg\max_{\alpha} \frac{\alpha^T \sum_{i=1}^{c} s_i (\hat\Sigma_i - I) \alpha}{\alpha^T \alpha}$, (24) $\cdots$ [sent-100, score-0.162]

53 $\alpha_k = \arg\max_{\alpha^T U_{k-1} = 0} \frac{\alpha^T \sum_{i=1}^{c} s_i (\hat\Sigma_i - I) \alpha}{\alpha^T \alpha}$. (25) Let $T(s) = \sum_{i=1}^{c} s_i (\hat\Sigma_i - I)$, where $s = [s_1, s_2, \cdots, s_c]^T$ and $s_i \in \{+1, -1\}$. [sent-101, score-0.134]

54 Then the first vector α1 defined in equation (25) is the principal eigenvector associated with the largest eigenvalue of the matrix T(s). [sent-102, score-0.197]

55 Then $\alpha_{k+1}$ defined in (25) is the principal eigenvector corresponding to the largest eigenvalue of the following matrix: $(I_d - Q_k Q_k^T)\, T(s)\, (I_d - Q_k Q_k^T)$. [sent-108, score-0.169]

56 The above two theorems are crucial for designing our fast algorithm for solving MCSSPs: Theorem 1 makes it possible to use the power method to solve MCSSPs, while Theorem 2 makes it possible to update $Q_{k+1}$ from $Q_k$ by adding a single column. [sent-113, score-0.068]

57 Moreover, it is notable that $I_d - Q_k Q_k^T = \prod_{i=1}^{k} (I_d - q_i q_i^T) = (I_d - Q_{k-1} Q_{k-1}^T)(I_d - q_k q_k^T)$, (26) where $q_i$ is the $i$-th column of $Q_k$. [sent-114, score-0.851]
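The deflated power iteration suggested by Theorems 1 and 2 can be sketched as follows (our code; the tolerance, iteration cap, and random start are our choices). It finds the principal eigenvector of $(I_d - Q_k Q_k^T)\, T(s)\, (I_d - Q_k Q_k^T)$ without forming the projected matrix explicitly:

```python
import numpy as np

def deflated_power_iteration(T, Q=None, n_iter=500, tol=1e-10):
    """Principal eigenvector of (I - QQ^T) T (I - QQ^T) by power iteration.

    T: symmetric matrix T(s); Q: d x k matrix with orthonormal columns
    (previously found directions), or None for the unconstrained case.
    Converges to the eigenvalue of largest magnitude.
    """
    d = T.shape[0]
    project = (lambda v: v - Q @ (Q.T @ v)) if Q is not None else (lambda v: v)
    a = project(np.random.default_rng(0).standard_normal(d))
    a /= np.linalg.norm(a)
    lam = 0.0
    for _ in range(n_iter):
        b = project(T @ a)        # a already lies in the projected subspace
        lam_new = a @ b           # Rayleigh quotient (a has unit length)
        a = b / np.linalg.norm(b)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, a
```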

58 Then we have that $\max_{\|\alpha\| = 1} \sum_{i=1}^{c} |\alpha^T (\hat\Sigma_i - I) \alpha| = \max_{s \in S} \max_{\|\alpha\| = 1} \alpha^T T(s) \alpha$, where $S = \{+1, -1\}^c$. [sent-117, score-0.06]

59 We present the pseudo-code of our MCSSPs method using the full search on S in Algorithm 1. [sent-119, score-0.045]

60 However, if $c$ is large, we may adopt an approach similar to that proposed in [10], based on a greedy search, to find a suboptimal solution. [sent-120, score-0.042]

61 The pseudo-code based on the greedy search is given in Algorithm 2. [sent-121, score-0.036]

62 4 EEG Feature Extraction Based on the MCSSPs. Let $X_i^t$ be the EEG sample points from the $t$th trial under the $i$th condition (i.e. [sent-122, score-0.183]

63 Let $\omega_j$ be the $j$th optimal spatial filter of the MCSSPs method. [sent-125, score-0.071]

64 Construct the new data $\hat{X}_i^t = \begin{pmatrix} X_i^t \\ \delta^\tau(X_i^t) \end{pmatrix}$, and let $p_{i,j}^t = \omega_j^T \hat{X}_i^t$. (28) Algorithm 1: The MCSSPs Algorithm Based on the Full Search Strategy. Input: the data matrix $X$ and the class label vector $l$. [sent-126, score-0.038]

65 Compute the average covariance matrices $\hat\Sigma_i$ $(i = 1, \cdots, c)$ and $\hat{\bar\Sigma}$; compute $\hat{\bar\Sigma}^{-\frac{1}{2}} = U \Lambda^{-\frac{1}{2}} U^T$ and $\hat{\bar\Sigma}^{-1} = U \Lambda^{-1} U^T$; [sent-128, score-0.065]

66 For $j = 1$ to $2^c$: compute $T(s^{(j)})$; solve the principal eigenvector of $T(s^{(j)}) \alpha^{(j)} = \lambda^{(j)} \alpha^{(j)}$ via the power iteration method; [sent-132, score-0.116]

67 Select the eigenvector $\alpha$ with the largest eigenvalue $\max_{j = 1, \cdots, 2^c} \{\lambda^{(j)}\}$; [sent-133, score-0.134]

68 If $i = 1$, then $q_i \leftarrow \alpha$, $q_i \leftarrow q_i / \|q_i\|$, and $Q_1 \leftarrow q_i$; else $q_i \leftarrow \alpha - Q_{i-1}(Q_{i-1}^T \alpha)$, $q_i \leftarrow q_i / \|q_i\|$, and $Q_i \leftarrow (Q_{i-1}\ q_i)$; [sent-134, score-3.27]

69 Compute $\Delta\Sigma_p \leftarrow \Delta\Sigma_p - (\Delta\Sigma_p q_i) q_i^T - q_i (q_i^T \Delta\Sigma_p) + q_i (q_i^T \Delta\Sigma_p q_i) q_i^T$ $(p = 1, \cdots, c)$; Output: $\omega_i = \hat{\bar\Sigma}^{-\frac{1}{2}} \alpha_i$, $i = 1, \cdots, k$. [sent-135, score-0.65]
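Gathering the steps above, here is a compact sketch of the full-search strategy of Algorithm 1 (our reconstruction; NumPy's dense eigensolver is substituted for the power iteration for brevity, and all variable names are ours):

```python
import numpy as np
from itertools import product

def mcssp_full_search(class_covs, global_cov, k):
    """Full-search MCSSPs filters (Algorithm 1, sketch).

    class_covs: list of c class covariance matrices of the padded data;
    global_cov: global covariance matrix (assumed positive definite).
    Returns the k filters omega_i.
    """
    lam, U = np.linalg.eigh(global_cov)
    W_half = U @ np.diag(lam ** -0.5) @ U.T            # global_cov^{-1/2}
    d = global_cov.shape[0]
    # Whitened class covariances minus the identity (the Delta_Sigma terms)
    Delta = [W_half @ S @ W_half - np.eye(d) for S in class_covs]
    c = len(Delta)
    alphas, Q = [], np.zeros((d, 0))
    for _ in range(k):
        P = np.eye(d) - Q @ Q.T                        # deflation projector
        best_val, best_vec = -np.inf, None
        for s in product([1.0, -1.0], repeat=c):       # full search: 2^c sign vectors
            T = sum(si * D for si, D in zip(s, Delta))
            w, V = np.linalg.eigh(P @ T @ P)
            if w[-1] > best_val:
                best_val, best_vec = w[-1], V[:, -1]
        alphas.append(best_vec)
        q = best_vec - Q @ (Q.T @ best_vec)            # Gram-Schmidt step
        Q = np.hstack([Q, (q / np.linalg.norm(q))[:, None]])
    return [W_half @ a for a in alphas]                # omega_i = global_cov^{-1/2} alpha_i
```

The exponential cost in $c$ is why Algorithm 2 exists; for the four-class data used later, $2^4 = 16$ sign vectors per filter is trivial.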

70 Algorithm 2: The MCSSPs Algorithm Based on the Greedy Search Strategy. Compute the average covariance matrices $\hat\Sigma_i$ $(i = 1, \cdots, c)$ and $\hat{\bar\Sigma}$; compute $\hat{\bar\Sigma}^{-\frac{1}{2}} = U \Lambda^{-\frac{1}{2}} U^T$ and $\hat{\bar\Sigma}^{-1} = U \Lambda^{-1} U^T$; [sent-138, score-0.065]

71 Solve for the principal eigenvector of $T(s) \alpha = \lambda \alpha$ associated with the largest absolute eigenvalue $|\lambda|$ via the power iteration method. [sent-142, score-0.207]

72 Set $\lambda_0 \leftarrow |\lambda|$; while $s \ne s_1$, do: (a) set $s_1 \leftarrow s$; (b) for $j = 1, 2, \cdots, c$, do: set $s_j \leftarrow -s_j$, where $s_j$ denotes the $j$th element of $s$. [sent-143, score-0.096]

73 If $i = 1$, then $q_i \leftarrow \alpha_i$, $q_i \leftarrow q_i / \|q_i\|$, and $Q_1 \leftarrow q_i$; else $q_i \leftarrow \alpha_i - Q_{i-1}(Q_{i-1}^T \alpha_i)$, $q_i \leftarrow q_i / \|q_i\|$, and $Q_i \leftarrow (Q_{i-1}\ q_i)$; [sent-145, score-3.27]

74 Compute $\Delta\Sigma_p \leftarrow \Delta\Sigma_p - (\Delta\Sigma_p q_i) q_i^T - q_i (q_i^T \Delta\Sigma_p) + q_i (q_i^T \Delta\Sigma_p q_i) q_i^T$ $(p = 1, \cdots, c)$; Output: $\omega_i = \hat{\bar\Sigma}^{-\frac{1}{2}} \alpha_i$, $i = 1, \cdots, k$. [sent-146, score-0.65]
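A minimal sketch of the greedy sign search that replaces the exhaustive loop in Algorithm 2 (our code; the all-ones starting sign vector is our choice, and the dense eigensolver again stands in for the power iteration):

```python
import numpy as np

def greedy_sign_search(Delta, P):
    """Greedy search over sign vectors s (core of Algorithm 2, sketch).

    Delta: list of the c whitened (Sigma_i - I) matrices;
    P: current deflation projector. Returns (|lambda|, alpha, s).
    """
    c = len(Delta)
    s = np.ones(c)                                     # starting signs (our choice)

    def top_eig(signs):
        T = sum(si * D for si, D in zip(signs, Delta))
        w, V = np.linalg.eigh(P @ T @ P)
        j = np.argmax(np.abs(w))                       # largest absolute eigenvalue
        return abs(w[j]), V[:, j]

    lam, alpha = top_eig(s)
    improved = True
    while improved:                                    # stop when no single flip helps
        improved = False
        for j in range(c):
            s[j] = -s[j]                               # tentatively flip sign j
            lam_new, alpha_new = top_eig(s)
            if lam_new > lam:
                lam, alpha, improved = lam_new, alpha_new, True
            else:
                s[j] = -s[j]                           # revert the flip
    return lam, alpha, s
```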

75 Then the variance of the elements in the projection $p_{i,j}^t$ can be expressed as $v_{i,j}^t = \mathrm{var}(\omega_j^T \hat{X}_i^t) = \omega_j^T \hat\Sigma_i^t \omega_j$, (29) [sent-148, score-0.082]

76 where $\hat\Sigma_i^t$ denotes the covariance matrix of the EEG data in the $t$th trial of the $i$th class. [sent-149, score-0.228]

77 For all the $k$ spatio-spectral filters $\omega_1, \cdots, \omega_k$, we obtain the $k$ features $v_{i,j}^t$ $(j = 1, \cdots, k)$ from the $t$th trial of EEG data. [sent-150, score-0.145]

78 Now let $v_i^t = [v_{i,1}^t, \cdots, v_{i,k}^t]^T$ be the feature vector associated with the $t$th trial of the $i$th class. [sent-151, score-0.212]

79 Similar to the method used in [2], the following log-transformation is used as the final feature vector of the EEG signal: $f_i^t = \log\left( \frac{v_i^t}{\sum_k v_{i,k}^t} \right)$, (30) where the log function is applied to each element of the vector independently. [sent-152, score-0.083]
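The feature extraction of equations (28)-(30) then amounts to projecting each padded trial onto the filters and taking normalized log-variances; a sketch reusing the delay_embed helper from above (zero-mean trials assumed, so the sample variance matches $\omega_j^T \hat\Sigma_i^t \omega_j$):

```python
import numpy as np

def mcssp_features(X, filters, tau):
    """Log-variance feature vector of one trial (equations (28)-(30), sketch).

    X: (channels x samples) trial; filters: list of k filters omega_j.
    """
    X_hat = delay_embed(X, tau)                         # padded trial, equation (7)
    v = np.array([np.var(w @ X_hat) for w in filters])  # variances, equation (29)
    return np.log(v / v.sum())                          # equation (30)
```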

80 For a test EEG sample $Z$, we first construct the new data $\hat{Z} = \begin{pmatrix} Z \\ \delta^\tau(Z) \end{pmatrix}$, and then adopt the above method to extract the corresponding discriminant feature vector $f^z$, where $f^z = \log\left( \frac{v^z}{\sum_k v_k^z} \right)$, $v^z = [v_1^z, \cdots, v_k^z]^T$, and $v_j^z = \omega_j^T \hat\Sigma_z \omega_j$, (31) in which $\hat\Sigma_z$ denotes the covariance matrix of $\hat{Z}$. [sent-156, score-0.291]

81 After obtaining the discriminant feature vectors $f_i^t$ $(i = 1, \cdots, c;\ t = 1, 2, \cdots, n_i)$ and $f^z$, we can classify the unknown EEG data into one of the $c$ classes by using a classifier. [sent-157, score-0.157]

82 The data set used here is from “BCI competition 2005” - data set IIIa [11]. [sent-161, score-0.039]

83 This data set consists of recordings from three subjects (k3b, k6b, and l1b), who performed four different motor imagery tasks (left hand, right hand, one foot, or tongue) according to a cue. [sent-162, score-0.042]

84 During the experiments, the EEG signal is recorded in 60 channels, using the left mastoid as reference and the right mastoid as ground. [sent-163, score-0.108]

85 Each trial lasted for 7 s, with the motor imagery performed during the last 4 s of each trial. [sent-165, score-0.125]

86 Similar to the method in [5], we discard the four trials of subject k6b with missing data. [sent-168, score-0.056]

87 For each trial of the raw EEG data, we only use part of the sample points. [sent-169, score-0.083]

88 Table 1 shows the average classification rates (%) and the standard deviations (%) of the three methods, while Figure 1 shows the average recognition rates of our MCSSPs method with different choices of the delay time $\tau$. [sent-182, score-0.115]

89 Table 1: Comparison of the classification rates (%) and standard deviations (%) between MCSPs and MCSSPs. [sent-185, score-0.032]

90 Figure 1: The classification rates (%) of our MCSSPs method with different choices of $\tau$ ($\tau = 1, \cdots, 10$), for subjects k3b, k6b, and l1b. [sent-204, score-0.092]

91 6 Conclusions. In this paper, we extended the two-class CSSPs method to multi-class cases via Bayes error estimation. [sent-205, score-0.059]

92 We first proposed a novel theory on multi-class Bayes error estimation, which has a closed-form solution to find the optimal discriminant vectors for feature extraction. [sent-206, score-0.12]

93 Then we applied the multi-class Bayes error estimation theory to generalize the two-class CSSPs method to multiclass cases. [sent-207, score-0.117]

94 The experiments on data set IIIa from BCI competition 2005 have shown that our MCSSPs method is superior to the MCSPs methods. [sent-208, score-0.067]

95 For example, by preprocessing the EEG signal and adopting a more advanced classifier, even higher classification rates are possible. [sent-211, score-0.072]

96 Müller (2002) Classifying single trial EEG: towards brain computer interfacing. [sent-218, score-0.113]

97 Pfurtscheller (2000) Optimal spatial filtering of single trial EEG during imagined hand movement. [sent-230, score-0.136]

98 Müller (2005) Spatio-spectral filters for improved classification of single trial EEG. [sent-238, score-0.113]

99 Müller (2004) Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms. [sent-246, score-0.196]

100 Birbaumer (2006) The BCI competition III: Validating alternative approaches to actual BCI problems. [sent-297, score-0.039]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('eeg', 0.509), ('mcssps', 0.36), ('qi', 0.325), ('csps', 0.255), ('cssps', 0.24), ('mcsps', 0.24), ('pj', 0.218), ('qk', 0.201), ('qt', 0.198), ('pi', 0.177), ('xt', 0.117), ('ij', 0.107), ('id', 0.086), ('bci', 0.085), ('trial', 0.083), ('bayes', 0.082), ('lters', 0.064), ('tth', 0.062), ('eigenvector', 0.06), ('extraction', 0.06), ('discriminant', 0.06), ('uk', 0.055), ('spatial', 0.053), ('jad', 0.051), ('classi', 0.05), ('eigenvalue', 0.046), ('ut', 0.046), ('si', 0.045), ('covariance', 0.045), ('zheng', 0.043), ('signal', 0.04), ('competition', 0.039), ('sj', 0.039), ('blankertz', 0.039), ('curio', 0.039), ('ith', 0.038), ('principal', 0.035), ('buss', 0.034), ('iiia', 0.034), ('ird', 0.034), ('mastoid', 0.034), ('padded', 0.034), ('pfurtscheller', 0.034), ('vz', 0.034), ('wenming', 0.034), ('rates', 0.032), ('boils', 0.032), ('qr', 0.032), ('error', 0.031), ('ller', 0.03), ('biomedical', 0.03), ('parra', 0.03), ('lemm', 0.03), ('rehabilitation', 0.03), ('feature', 0.029), ('conduct', 0.028), ('equation', 0.028), ('method', 0.028), ('largest', 0.028), ('trials', 0.028), ('electroencephalogram', 0.028), ('china', 0.027), ('fit', 0.026), ('dornhege', 0.026), ('ni', 0.025), ('arg', 0.024), ('imagery', 0.024), ('theoretic', 0.024), ('patterns', 0.024), ('adopt', 0.023), ('concatenating', 0.023), ('delayed', 0.023), ('rk', 0.023), ('lter', 0.023), ('transformation', 0.023), ('multiclass', 0.022), ('power', 0.021), ('matrices', 0.02), ('recording', 0.02), ('else', 0.02), ('class', 0.02), ('max', 0.02), ('estimation', 0.019), ('theorems', 0.019), ('vk', 0.019), ('greedy', 0.019), ('projections', 0.019), ('decomposition', 0.018), ('er', 0.018), ('transactions', 0.018), ('hz', 0.018), ('jth', 0.018), ('pt', 0.018), ('motor', 0.018), ('cation', 0.018), ('classes', 0.017), ('channels', 0.017), ('absolute', 0.017), ('generalize', 0.017), ('search', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 184 nips-2009-Optimizing Multi-Class Spatio-Spectral Filters via Bayes Error Estimation for EEG Classification

Author: Wenming Zheng, Zhouchen Lin

Abstract: The method of common spatio-spectral patterns (CSSPs) is an extension of common spatial patterns (CSPs) that utilizes the technique of delay embedding to alleviate the adverse effects of noise and artifacts on electroencephalogram (EEG) classification. Although the CSSPs method has been shown to be more powerful than the CSPs method in EEG classification, it is only suitable for two-class EEG classification problems. In this paper, we generalize the two-class CSSPs method to multi-class cases. To this end, we first develop a novel theory of multi-class Bayes error estimation and then present the multi-class CSSPs (MCSSPs) method based on this Bayes error theoretical framework. By minimizing the estimated closed-form Bayes error, we obtain the optimal spatio-spectral filters of MCSSPs. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on the BCI competition 2005 data set. The experimental results show that our method significantly outperforms the previous multi-class CSPs (MCSPs) methods in EEG classification.

2 0.18275015 237 nips-2009-Subject independent EEG-based BCI decoding

Author: Siamac Fazli, Cristian Grozea, Marton Danoczy, Benjamin Blankertz, Florin Popescu, Klaus-Robert Müller

Abstract: In the quest to make Brain Computer Interfacing (BCI) more usable, dry electrodes have emerged that get rid of the initial 30 minutes required for placing an electrode cap. Another time-consuming step is the required individualized adaptation to the BCI user, which involves another 30 minutes of calibration for assessing a subject’s brain signature. In this paper we aim to also remove this calibration procedure from BCI setup time by means of machine learning. In particular, we harvest a large database of EEG BCI motor imagination recordings (83 subjects) for constructing a library of subject-specific spatio-temporal filters and derive a subject-independent BCI classifier. Our offline results indicate that BCI-naïve users could start real-time BCI use with no prior calibration at only a very moderate performance loss.

3 0.17885807 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

Author: Sennay Ghebreab, Steven Scholte, Victor Lamme, Arnold Smeulders

Abstract: Contrast statistics of the majority of natural images conform to a Weibull distribution. This property of natural images may facilitate efficient and very rapid extraction of a scene's visual gist. Here we investigated whether a neural response model based on the Weibull contrast distribution captures visual information that humans use to rapidly identify natural scenes. In a learning phase, we measured EEG activity of 32 subjects viewing brief flashes of 700 natural scenes. From these neural measurements and the contrast statistics of the natural image stimuli, we derived an across-subject Weibull response model. We used this model to predict the EEG responses to 100 new natural scenes and estimated which scene the subject viewed by finding the best match between the model predictions and the observed EEG responses. In almost 90 percent of the cases our model accurately predicted the observed scene. Moreover, in most failed cases, the scene mistaken for the observed scene was visually similar to the observed scene itself. Similar results were obtained in a separate experiment in which 16 other subjects were presented with artificial occlusion models of natural images. Together, these results suggest that Weibull contrast statistics of natural images contain a considerable amount of visual gist information to warrant rapid image identification.

4 0.1477565 261 nips-2009-fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity

Author: Bryan Conroy, Ben Singer, James Haxby, Peter J. Ramadge

Abstract: The inter-subject alignment of functional MRI (fMRI) data is important for improving the statistical power of fMRI group analyses. In contrast to existing anatomically-based methods, we propose a novel multi-subject algorithm that derives a functional correspondence by aligning spatial patterns of functional connectivity across a set of subjects. We test our method on fMRI data collected during a movie viewing experiment. By cross-validating the results of our algorithm, we show that the correspondence successfully generalizes to a secondary movie dataset not used to derive the alignment. 1

5 0.12540264 166 nips-2009-Noisy Generalized Binary Search

Author: Robert Nowak

Abstract: This paper addresses the problem of noisy Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued hypothesis through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. GBS is used in many applications, including fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and active learning. In most of these cases, the responses to queries can be noisy. Past work has provided a partial characterization of GBS, but existing noise-tolerant versions of GBS are suboptimal in terms of query complexity. This paper presents an optimal algorithm for noisy GBS and demonstrates its application to learning multidimensional threshold functions. 1

6 0.11301954 246 nips-2009-Time-Varying Dynamic Bayesian Networks

7 0.098374844 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

8 0.094949484 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

9 0.094682567 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

10 0.077094898 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

11 0.069909945 98 nips-2009-From PAC-Bayes Bounds to KL Regularization

12 0.068414055 69 nips-2009-Discrete MDL Predicts in Total Variation

13 0.0614824 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

14 0.059603885 154 nips-2009-Modeling the spacing effect in sequential category learning

15 0.058083329 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

16 0.058078613 178 nips-2009-On Stochastic and Worst-case Models for Investing

17 0.055365879 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

18 0.052917775 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

19 0.049707435 47 nips-2009-Boosting with Spatial Regularization

20 0.048685204 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.145), (1, 0.041), (2, 0.02), (3, 0.037), (4, 0.055), (5, 0.083), (6, 0.012), (7, -0.054), (8, -0.057), (9, -0.055), (10, -0.007), (11, 0.065), (12, 0.025), (13, 0.042), (14, 0.053), (15, -0.062), (16, 0.061), (17, 0.033), (18, -0.196), (19, 0.031), (20, 0.364), (21, -0.018), (22, -0.109), (23, -0.074), (24, 0.069), (25, 0.054), (26, -0.109), (27, 0.033), (28, -0.152), (29, 0.082), (30, 0.073), (31, 0.031), (32, 0.149), (33, -0.117), (34, 0.063), (35, 0.139), (36, -0.071), (37, 0.061), (38, -0.051), (39, 0.128), (40, 0.057), (41, -0.048), (42, 0.029), (43, -0.054), (44, 0.007), (45, -0.068), (46, -0.016), (47, 0.013), (48, -0.063), (49, 0.085)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95086282 184 nips-2009-Optimizing Multi-Class Spatio-Spectral Filters via Bayes Error Estimation for EEG Classification

Author: Wenming Zheng, Zhouchen Lin

Abstract: The method of common spatio-spectral patterns (CSSPs) is an extension of common spatial patterns (CSPs) that utilizes the technique of delay embedding to alleviate the adverse effects of noise and artifacts on electroencephalogram (EEG) classification. Although the CSSPs method has been shown to be more powerful than the CSPs method in EEG classification, it is only suitable for two-class EEG classification problems. In this paper, we generalize the two-class CSSPs method to multi-class cases. To this end, we first develop a novel theory of multi-class Bayes error estimation and then present the multi-class CSSPs (MCSSPs) method based on this Bayes error theoretical framework. By minimizing the estimated closed-form Bayes error, we obtain the optimal spatio-spectral filters of MCSSPs. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on the BCI competition 2005 data set. The experimental results show that our method significantly outperforms the previous multi-class CSPs (MCSPs) methods in EEG classification.

2 0.60925138 237 nips-2009-Subject independent EEG-based BCI decoding

Author: Siamac Fazli, Cristian Grozea, Marton Danoczy, Benjamin Blankertz, Florin Popescu, Klaus-Robert Müller

Abstract: In the quest to make Brain Computer Interfacing (BCI) more usable, dry electrodes have emerged that get rid of the initial 30 minutes required for placing an electrode cap. Another time-consuming step is the required individualized adaptation to the BCI user, which involves another 30 minutes of calibration for assessing a subject’s brain signature. In this paper we aim to also remove this calibration procedure from BCI setup time by means of machine learning. In particular, we harvest a large database of EEG BCI motor imagination recordings (83 subjects) for constructing a library of subject-specific spatio-temporal filters and derive a subject-independent BCI classifier. Our offline results indicate that BCI-naïve users could start real-time BCI use with no prior calibration at only a very moderate performance loss.

3 0.58713239 261 nips-2009-fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity

Author: Bryan Conroy, Ben Singer, James Haxby, Peter J. Ramadge

Abstract: The inter-subject alignment of functional MRI (fMRI) data is important for improving the statistical power of fMRI group analyses. In contrast to existing anatomically-based methods, we propose a novel multi-subject algorithm that derives a functional correspondence by aligning spatial patterns of functional connectivity across a set of subjects. We test our method on fMRI data collected during a movie viewing experiment. By cross-validating the results of our algorithm, we show that the correspondence successfully generalizes to a secondary movie dataset not used to derive the alignment. 1

4 0.49235263 166 nips-2009-Noisy Generalized Binary Search

Author: Robert Nowak

Abstract: This paper addresses the problem of noisy Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued hypothesis through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. GBS is used in many applications, including fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and active learning. In most of these cases, the responses to queries can be noisy. Past work has provided a partial characterization of GBS, but existing noise-tolerant versions of GBS are suboptimal in terms of query complexity. This paper presents an optimal algorithm for noisy GBS and demonstrates its application to learning multidimensional threshold functions. 1

5 0.47053328 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

Author: Sennay Ghebreab, Steven Scholte, Victor Lamme, Arnold Smeulders

Abstract: Contrast statistics of the majority of natural images conform to a Weibull distribution. This property of natural images may facilitate efficient and very rapid extraction of a scene's visual gist. Here we investigated whether a neural response model based on the Weibull contrast distribution captures visual information that humans use to rapidly identify natural scenes. In a learning phase, we measured EEG activity of 32 subjects viewing brief flashes of 700 natural scenes. From these neural measurements and the contrast statistics of the natural image stimuli, we derived an across-subject Weibull response model. We used this model to predict the EEG responses to 100 new natural scenes and estimated which scene the subject viewed by finding the best match between the model predictions and the observed EEG responses. In almost 90 percent of the cases our model accurately predicted the observed scene. Moreover, in most failed cases, the scene mistaken for the observed scene was visually similar to the observed scene itself. Similar results were obtained in a separate experiment in which 16 other subjects were presented with artificial occlusion models of natural images. Together, these results suggest that Weibull contrast statistics of natural images contain a considerable amount of visual gist information to warrant rapid image identification.

6 0.38814601 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

7 0.38104448 7 nips-2009-A Data-Driven Approach to Modeling Choice

8 0.34610057 69 nips-2009-Discrete MDL Predicts in Total Variation

9 0.33548138 246 nips-2009-Time-Varying Dynamic Bayesian Networks

10 0.33118355 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

11 0.32937106 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

12 0.31047553 81 nips-2009-Ensemble Nystrom Method

13 0.29199317 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

14 0.27203509 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

15 0.2537379 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

16 0.2525214 182 nips-2009-Optimal Scoring for Unsupervised Learning

17 0.24994405 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

18 0.24665546 127 nips-2009-Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

19 0.23099223 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

20 0.22939306 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.025), (24, 0.038), (25, 0.046), (35, 0.041), (36, 0.109), (39, 0.033), (58, 0.084), (69, 0.3), (71, 0.051), (77, 0.035), (81, 0.014), (86, 0.083), (91, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.75691354 105 nips-2009-Grouped Orthogonal Matching Pursuit for Variable Selection and Prediction

Author: Grzegorz Swirszcz, Naoki Abe, Aurelie C. Lozano

Abstract: We consider the problem of variable group selection for least squares regression, namely, that of selecting groups of variables for best regression performance, leveraging and adhering to a natural grouping structure within the explanatory variables. We show that this problem can be efficiently addressed by using a certain greedy style algorithm. More precisely, we propose the Group Orthogonal Matching Pursuit algorithm (Group-OMP), which extends the standard OMP procedure (also referred to as “forward greedy feature selection algorithm” for least squares regression) to perform stage-wise group variable selection. We prove that under certain conditions Group-OMP can identify the correct (groups of) variables. We also provide an upperbound on the l∞ norm of the difference between the estimated regression coefficients and the true coefficients. Experimental results on simulated and real world datasets indicate that Group-OMP compares favorably to Group Lasso, OMP and Lasso, both in terms of variable selection and prediction accuracy. 1

same-paper 2 0.75400382 184 nips-2009-Optimizing Multi-Class Spatio-Spectral Filters via Bayes Error Estimation for EEG Classification

Author: Wenming Zheng, Zhouchen Lin

Abstract: The method of common spatio-spectral patterns (CSSPs) is an extension of common spatial patterns (CSPs) that utilizes the technique of delay embedding to alleviate the adverse effects of noise and artifacts on electroencephalogram (EEG) classification. Although the CSSPs method has been shown to be more powerful than the CSPs method in EEG classification, it is only suitable for two-class EEG classification problems. In this paper, we generalize the two-class CSSPs method to multi-class cases. To this end, we first develop a novel theory of multi-class Bayes error estimation and then present the multi-class CSSPs (MCSSPs) method based on this Bayes error theoretical framework. By minimizing the estimated closed-form Bayes error, we obtain the optimal spatio-spectral filters of MCSSPs. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on the BCI competition 2005 data set. The experimental results show that our method significantly outperforms the previous multi-class CSPs (MCSPs) methods in EEG classification.

3 0.69242793 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering

Author: Tat-jun Chin, Hanzi Wang, David Suter

Abstract: We present a novel and highly effective approach for multi-body motion segmentation. Drawing inspiration from robust statistical model fitting, we estimate putative subspace hypotheses from the data. However, instead of ranking them we encapsulate the hypotheses in a novel Mercer kernel which elicits the potential of two point trajectories to have emerged from the same subspace. The kernel permits the application of well-established statistical learning methods for effective outlier rejection, automatic recovery of the number of motions and accurate segmentation of the point trajectories. The method operates well under severe outliers arising from spurious trajectories or mistracks. Detailed experiments on a recent benchmark dataset (Hopkins 155) show that our method is superior to other stateof-the-art approaches in terms of recovering the number of motions, segmentation accuracy, robustness against gross outliers and computational efficiency. 1 Introduction1 Multi-body motion segmentation concerns the separation of motions arising from multiple moving objects in a video sequence. The input data is usually a set of points on the surface of the objects which are tracked throughout the video sequence. Motion segmentation can serve as a useful preprocessing step for many computer vision applications. In recent years the case of rigid (i.e. nonarticulated) objects for which the motions could be semi-dependent on each other has received much attention [18, 14, 19, 21, 22, 17]. Under this domain the affine projection model is usually adopted. Such a model implies that the point trajectories from a particular motion lie on a linear subspace of at most four, and trajectories from different motions lie on distinct subspaces. Thus multi-body motion segmentation is reduced to the problem of subspace segmentation or clustering. To realize practical algorithms, motion segmentation approaches should possess four desirable attributes: (1) Accuracy in classifying the point trajectories to the motions they respectively belong to. This is crucial for success in the subsequent vision applications, e.g. object recognition, 3D reconstruction. (2) Robustness against inlier noise (e.g. slight localization error) and gross outliers (e.g. mistracks, spurious trajectories), since getting imperfect data is almost always unavoidable in practical circumstances. (3) Ability to automatically deduce the number of motions in the data. This is pivotal to accomplish fully automated vision applications. (4) Computational efficiency. This is integral for the processing of video sequences which are usually large amounts of data. Recent work on multi-body motion segmentation can roughly be divided into algebraic or factorization methods [3, 19, 20], statistical methods [17, 7, 14, 6, 10] and clustering methods [22, 21, 5]. Notable approaches include Generalized PCA (GPCA) [19, 20], an algebraic method based on the idea that one can fit a union of m subspaces with a set of polynomials of degree m. Statistical methods often employ concepts such random hypothesis generation [4, 17], Expectation-Maximization [14, 6] 1 This work was supported by the Australian Research Council (ARC) under the project DP0878801. 1 and geometric model selection [7, 8]. Clustering based methods [22, 21, 5] are also gaining attention due to their effectiveness. They usually include a dimensionality reduction step (e.g. manifold learning [5]) followed by a clustering of the point trajectories (e.g. via spectral clustering in [21]). 
A recent benchmark [18] indicated that Local Subspace Affinity (LSA) [21] gave the best performance in terms of classification accuracy, although their result was subsequently surpassed by [5, 10]. However, we argue that most of the previous approaches do not simultaneously fulfil the qualities desirable of motion segmentation algorithms. Most notably, although some of the approaches have the means to estimate the number of motions, they are generally unreliable in this respect and require manual input of this parameter. In fact this prior knowledge was given to all the methods compared in [18]2 . Secondly, most of the methods (e.g. [19, 5]) do not explicitly deal with outliers. They will almost always breakdown when given corrupted data. These deficiencies reduce the usefulness of available motion segmentation algorithms in practical circumstances. In this paper we attempt to bridge the gap between experimental performance and practical usability. Our previous work [2] indicates that robust multi-structure model fitting can be achieved effectively with statistical learning. Here we extend this concept to motion subspace clustering. Drawing inspiration from robust statistical model fitting [4], we estimate random hypotheses of motion subspaces in the data. However, instead of ranking these hypotheses we encapsulate them in a novel Mercer kernel. The kernel can function reliably despite overwhelming sampling imbalance, and it permits the application of non-linear dimensionality reduction techniques to effectively identify and reject outlying trajectories. This is then followed by Kernel PCA [11] to maximize the separation between groups and spectral clustering [13] to recover the number of motions and clustering. Experiments on the Hopkins 155 benchmark dataset [18] show that our method is superior to other approaches in terms of the qualities described above, including computational efficiency. 1.1 Brief review of affine model multi-body motion segmentation Let {tf p ∈ R2 }f =1,...,F be the set of 2D coordinates of P trajectories tracked across F frames. In p=1,...,P multi-body motion segmentation the tf p ’s correspond to points on the surface of rigid objects which are moving. The goal is to separate the trajectories into groups corresponding to the motion they belong to. In other words, if we arrange the coordinates in the following data matrix   t11 · · · t1P  . .  ∈ R2F ×P , .. .  T= . (1) . . . tF 1 . . . tF P the goal is to find the permutation Γ ∈ RP ×P such that the columns of T · Γ are arranged according to the respective motions they belong to. It turns out that under affine projection [1, 16] trajectories from the same motion lie on a distinct subspace in R2F , and each of these motion subspaces is of dimensions 2, 3 or 4. Thus motion segmentation can be accomplished via clustering subspaces in R2F . See [1, 16] for more details. Realistically actual motion sequences might contain trajectories which do not correspond to valid objects or motions. These trajectories behave as outliers in the data and, if not taken into account, can be seriously detrimental to subspace clustering algorithms. 2 The Ordered Residual Kernel (ORK) First, we take a statistical model fitting point of view to motion segmentation. Let {xi }i=1,...,N be the set of N samples on which we want to perform model fitting. We randomly draw p-subsets from the data and use it to fit a hypothesis of the model, where p is the number of parameters that define the model. 
In motion segmentation, the xi ’s are the columns of matrix T, and p = 4 since the model is a four-dimensional subspace3 . Assume that M of such random hypotheses are drawn. i i For each data point xi compute its absolute residual set ri = {r1 , . . . , rM } as measured to the M hypotheses. For motion segmentation, the residual is the orthogonal distance to a hypothesis 2 As confirmed through private contact with the authors of [18]. Ideally we should also consider degenerate motions with subspace dimensions 2 or 3, but previous work [18] using RANSAC [4] and our results suggest this is not a pressing issue for the Hopkins 155 dataset. 3 2 i i subspace. We sort the elements in ri to obtain the sorted residual set ˜i = {rλi , . . . , rλi }, where r 1 M i i the permutation {λi , . . . , λi } is obtained such that rλi ≤ · · · ≤ rλi . Define the following 1 M 1 M ˜ θi := {λi , . . . , λi } 1 M (2) ˜ as the sorted hypothesis set of point xi , i.e. θi depicts the order in which xi becomes the inlier of the M hypotheses as a fictitious inlier threshold is increased from 0 to ∞. We define the Ordered Residual Kernel (ORK) between two data points as 1 kr (xi1 , xi2 ) := ˜ Z M/h t ˜ ˜ zt · k∩ (θi1 , θi2 ), (3) t=1 M/h where zt = 1 are the harmonic series and Z = t=1 zt is the (M/h)-th harmonic number. t Without lost of generality assume that M is wholly divisible by h. Step size h is used to obtain the Difference of Intersection Kernel (DOIK) 1 ˜1:α t ˜ ˜ ˜1:α ˜1:α ˜1:α k∩ (θi1 , θi2 ) := (|θi1 t ∩ θi2 t | − |θi1 t−1 ∩ θi2 t−1 |) (4) h ˜a:b where αt = t · h and αt−1 = (t − 1) · h. Symbol θi indicates the set formed by the a-th to ˜i . Since the contents of the sorted hypotheses set are merely permutations of the b-th elements of θ {1 . . . M }, i.e. there are no repeating elements, 0 ≤ kr (xi1 , xi2 ) ≤ 1. ˜ (5) Note that kr is independent of the type of model to be fitted, thus it is applicable to generic statistical ˜ model fitting problems. However, we concentrate on motion subspaces in this paper. Let τ be a fictitious inlier threshold. The kernel kr captures the intuition that, if τ is low, two ˜ points arising from the same subspace will have high normalized intersection since they share many common hypotheses which correspond to that subspace. If τ is high, implausible hypotheses fitted on outliers start to dominate and decrease the normalized intersection. Step size h allows us to quantify the rate of change of intersection if τ is increased from 0 to ∞, and since zt is decreasing, kr will evaluate to a high value for two points from the same subspace. In contrast, kr is always low ˜ ˜ for points not from the same subspace or that are outliers. Proof of satisfying Mercer’s condition. Let D be a fixed domain, and P(D) be the power set of D, i.e. the set of all subsets of D. Let S ⊆ P(D), and p, q ∈ S. If µ is a measure on D, then k∩ (p, q) = µ(p ∩ q), (6) called the intersection kernel, is provably a valid Mercer kernel [12]. The DOIK can be rewritten as t ˜ ˜ k∩ (θi1 , θi2 ) = 1 ˜(αt−1 +1):αt ˜(αt−1 +1):αt (|θ ∩ θi2 | h i1 ˜1:(α ) ˜(α +1):αt | + |θ (αt−1 +1):αt ∩ θ 1:(αt−1 ) |). ˜ ˜ +|θi1 t−1 ∩ θi2 t−1 i1 i2 (7) If we let D = {1 . . . M } be the set of all possible hypothesis indices and µ be uniform on D, each term in Eq. (7) is simply an intersection kernel multiplied by |D|/h. 
Since multiplying a kernel with a positive constant and adding two kernels respectively produce valid Mercer kernels [12], the DOIK and ORK are also valid Mercer kernels.• Parameter h in kr depends on the number of random hypotheses M , i.e. step size h can be set as a ˜ ratio of M . The value of M can be determined based on the size of the p-subset and the size of the data N (e.g. [23, 15]), and thus h is not contingent on knowledge of the true inlier noise scale or threshold. Moreover, our experiments in Sec. 4 show that segmentation performance is relatively insensitive to the settings of h and M . 2.1 Performance under sampling imbalance Methods based on random sampling (e.g. RANSAC [4]) are usually affected by unbalanced datasets. The probability of simultaneously retrieving p inliers from a particular structure is tiny if points 3 from that structure represent only a small minority in the data. In an unbalanced dataset the “pure” p-subsets in the M randomly drawn samples will be dominated by points from the majority structure in the data. This is a pronounced problem in motion sequences, since there is usually a background “object” whose point trajectories form a large majority in the data. In fact, for motion sequences from the Hopkins 155 dataset [18] with typically about 300 points per sequence, M has to be raised to about 20,000 before a pure p-subset from the non-background objects is sampled. However, ORK can function reliably despite serious sampling imbalance. This is because points from the same subspace are roughly equi-distance to the sampled hypotheses in their vicinity, even though these hypotheses might not pass through that subspace. Moreover, since zt in Eq. (3) is decreasing only residuals/hypotheses in the vicinity of a point are heavily weighted in the intersection. Fig. 1(a) illustrates this condition. Results in Sec. 4 show that ORK excelled even with M = 1, 000. (a) Data in R2F . (b) Data in RKHS Fkr . ˜ Figure 1: (a) ORK under sampling imbalance. (b) Data in RKHS induced by ORK. 3 Multi-Body Motion Segmentation using ORK In this section, we describe how ORK is used for multi-body motion segmentation. 3.1 Outlier rejection via non-linear dimensionality reduction Denote by Fkr the Reproducing Kernel Hilbert Space (RKHS) induced by kr . Let matrix A = ˜ ˜ [φ(x1 ) . . . φ(xN )] contain the input data after it is mapped to Fkr . The kernel matrix K = AT A is ˜ computed using the kernel function kr as ˜ Kp,q = φ(xp ), φ(xq ) = kr (xp , xq ), p, q ∈ {1 . . . N }. ˜ (8) Since kr is a valid Mercer kernel, K is guaranteed to be positive semi-definite [12]. Let K = ˜ Q∆QT be the eigenvalue decomposition (EVD) of K. Then the rank-n Kernel Singular Value Decomposition (Kernel SVD) [12] of A is 1 1 An = [AQn (∆n )− 2 ][(∆n ) 2 ][(Qn )T ] ≡ Un Σn (Vn )T . n n (9) n Via the Matlab notation, Q = Q:,1:n and ∆ = ∆1:n,1:n . The left singular vectors U is an orthonormal basis for the n-dimensional principal subspace of the whole dataset in Fkr . Projecting ˜ the data onto the principal subspace yields 1 1 B = [AQn (∆n )− 2 ]T A = (∆n ) 2 (Qn )T , (10) n×N where B = [b1 . . . bN ] ∈ R is the reduced dimension version of A. Directions of the principal subspace are dominated by inlier points, since kr evaluates to a high value generally for them, but ˜ always to a low value for gross outliers. Moreover the kernel ensures that points from the same subspace are mapped to the same cluster and vice versa. Fig. 1(b) illustrates this condition. Fig. 
2(a)(left) shows the first frame of sequence “Cars10” from the Hopkins 155 dataset [18] with 100 false trajectories of Brownian motion added to the original data (297 points). The corresponing RKHS norm histogram for n = 3 is displayed in Fig. 2(b). The existence of two distinct modes, 4 15 Outlier mode Bin count Inlier mode 10 5 0 (a) (left) Before and (right) after outlier removal. Blue dots are inliers while red dots are added outliers. 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 Vector norm in principal subspace 0.18 0.2 (b) Actual norm histogram of “cars10”. Figure 2: Demonstration of outlier rejection on sequence “cars10” from Hopkins 155. corresponding respectively to inliers and outliers, is evident. We exploit this observation for outlier rejection by discarding data with low norms in the principal subspace. The cut-off threshold ψ can be determined by analyzing the shape of the distribution. For instance we can fit a 1D Gaussian Mixture Model (GMM) with two components and set ψ as the point of equal Mahalanobis distance between the two components. However, our experimentation shows that an effective threshold can be obtained by simply setting ψ as the average value of all the norms, i.e. ψ= 1 N N bi . (11) i=1 This method was applied uniformly on all the sequences in our experiments in Sec. 4. Fig. 2(a)(right) shows an actual result of the method on Fig. 2(a)(left). 3.2 Recovering the number of motions and subspace clustering After outlier rejection, we further take advantage of the mapping induced by ORK for recovering the number of motions and subspace clustering. On the remaining data, we perform Kernel PCA [11] to seek the principal components which maximize the variance of the data in the RKHS, as Fig. 1(b) illustrates. Let {yi }i=1,...,N ′ be the N ′ -point subset of the input data that remains after outlier removal, where N ′ < N . Denote by C = [φ(y1 ) . . . φ(yN ′ )] the data matrix after mapping the data ˜ to Fkr , and by symbol C the result of adjusting C with the empirical mean of {φ(y1 ), . . . , φ(yN ′ )}. ˜ ˜ ˜ ˜ The centered kernel matrix K′ = CT C [11] can be obtained as 1 ˜ K′ = ν T K′ ν, ν = [IN ′ − ′ 1N ′ ,N ′ ], (12) N where K′ = CT C is the uncentered kernel matrix, Is and 1s,s are respectively the s × s identity ˜ ˜ matrix and a matrix of ones. If K′ = RΩRT is the EVD of K′ , then we obtain first-m kernel m ˜ principal components P of C as the first-m left singular vectors of C , i.e. 1 ˜ Pm = CRm (Ωm )− 2 , (13) where Rm = R:,1:m and Ω1:m,1:m ; see Eq. (9). Projecting the data on the principal components yields 1 D = [d1 . . . dN ′ ] = (Ωm ) 2 (Rm )T , (14) ′ where D ∈ Rm×N . The affine subspace span(Pm ) maximizes the spread of the centered data in the RKHS, and the projection D offers an effective representation for clustering. Fig. 3(a) shows the Kernel PCA projection results for m = 3 on the sequence in Fig. 2(a). The number of clusters in D is recovered via spectral clustering. More specifically we apply the Normalized Cut (Ncut) [13] algorithm. A fully connected graph is first derived from the data, where ′ ′ its weighted adjacency matrix W ∈ RN ×N is obtained as Wp,q = exp(− dp − dq 2 /2δ 2 ), (15) and δ is taken as the average nearest neighbour distance in the Euclidean sense among the vectors in D. The Laplacian matrix [13] is then derived from W and eigendecomposed. Under Ncut, 5 0.1 0.05 0 −0.05 −0.1 0.1 −0.15 0.15 0.08 0.1 0.05 0 −0.05 −0.1 0.06 (a) Kernel PCA and Ncut results. (b) W matrix. (c) Final result for “cars10”. 
Figure 3: Actual results on the motion sequence in Fig. 2(a)(left). the number of clusters is revealed as the number of eigenvalues of the Laplacian that are zero or numerically insignificant. With this knowledge, a subsequent k-means step is then performed to cluster the points. Fig. 3(b) shows W for the input data in Fig. 2(a)(left) after outlier removal. It can be seen that strong affinity exists between points from the same cluster, thus allowing accurate clustering. Figs. 3(a) and 3(c) illustrate the final clustering result for the data in Fig. 2(a)(left). There are several reasons why spectral clustering under our framework is more successful than previous methods. Firstly, we perform an effective outlier rejection step that removes bad trajectories that can potentially mislead the clustering. Secondly, the mapping induced by ORK deliberately separates the trajectories based on their cluster membership. Finally, we perform Kernel PCA to maximize the variance of the data. Effectively this also improves the separation of clusters, thus facilitating an accurate recovery of the number of clusters and also the subsequent segmentation. This distinguishes our work from previous clustering based methods [21, 5] which tend to operate without maximizing the between-class scatter. Results in Sec. 4 validate our claims. 4 Results Henceforth we indicate the proposed method as “ORK”. We leverage on a recently published benchmark on affine model motion segmentation [18] as a basis of comparison. The benchmark was evaluated on the Hopkins 155 dataset4 which contains 155 sequences with tracked point trajectories. A total of 120 sequences have two motions while 35 have three motions. The sequences contain degenerate and non-degenerate motions, independent and partially dependent motions, articulated motions, nonrigid motions etc. In terms of video content three categories exist: Checkerboard sequences, traffic sequences (moving cars, trucks) and articulated motions (moving faces, people). 4.1 Details on benchmarking Four major algorithms were compared in [18]: Generalized PCA (GPCA) [19], Local Subspace Affinity (LSA) [21], Multi-Stage Learning (MSL) [14] and RANSAC [17]. Here we extend the benchmark with newly reported results from Locally Linear Manifold Clustering (LLMC) [5] and Agglomerative Lossy Compression (ALC) [10, 9]. We also compare our method against Kanatani and Matsunaga’s [8] algorithm (henceforth, the “KM” method) in estimating the number of independent motions in the video sequences. Note that KM per se does not perform motion segmentation. For the sake of objective comparisons we use only implementations available publicly5. Following [18], motion segmentation performance is evaluated in terms of the labelling error of the point trajectories, where each point in a sequence has a ground truth label, i.e. number of mislabeled points . (16) classification error = total number of points Unlike [18], we also emphasize on the ability of the methods in recovering the number of motions. However, although the methods compared in [18] (except RANSAC) theoretically have the means to 4 Available at http://www.vision.jhu.edu/data/hopkins155/. For MSL and KM, see http://www.suri.cs.okayama-u.ac.jp/e-program-separate.html/. For GPCA, LSA and RANSAC, refer to the url for the Hopkins 155 dataset. 5 6 do so, their estimation of the number of motions is generally unrealiable and the benchmark results in [18] were obtained by revealing the actual number of motions to the algorithms. 
A similar initialization exists in [5, 10], where the results were obtained by giving LLMC and ALC this knowledge a priori (for LLMC, this was given at least to the variant LLMC 4m during dimensionality reduction [5], where m is the true number of motions). Where variants exist for the compared algorithms, we use results from the best-performing variant. Throughout, the number of random hypotheses M and the step size h for ORK are fixed at 1000 and 300, respectively, and, unlike the other methods, ORK is not given knowledge of the number of motions.

4.2 Data without gross outliers

We apply ORK on the Hopkins 155 dataset. Since ORK uses random sampling, we repeat it 100 times for each sequence and average the results. Table 1 lists the obtained classification error alongside those of previously proposed methods. ORK (column 9) gives results comparable to the other methods for sequences with 2 motions (mean = 7.83%, median = 0.41%). For sequences with 3 motions, ORK (mean = 12.62%, median = 4.75%) outperforms GPCA and RANSAC, but is slightly less accurate than the others. However, bear in mind that, unlike the other methods, ORK is not given prior knowledge of the true number of motions and has to estimate this independently.

Column        1      2      3      4      5       6      8      9     10
Method       REF   GPCA    LSA    MSL   RANSAC  LLMC   ALC    ORK   ORK*

Sequences with 2 motions
Mean         2.03   4.59   3.45   4.14   5.56    3.62   3.03   7.83   1.27
Median       0.00   0.38   0.59   0.00   1.18    0.00   0.00   0.41   0.00

Sequences with 3 motions
Mean         5.08  28.66   9.73   8.23  22.94    8.85   6.26  12.62   2.09
Median       2.40  28.26   2.33   1.76  22.03    3.19   1.02   4.75   0.05

Table 1: Classification error (%) on Hopkins 155 sequences. REF represents the reference/control method, which operates with knowledge of the ground-truth segmentation. Refer to [18] for details.

We also separately investigate the accuracy of ORK in estimating the number of motions, and compare it against KM [8], which was proposed for this purpose. Note that such an experiment was not attempted in [18], since the approaches compared therein generally do not perform reliably in estimating the number of motions. The results in Table 2 (columns 1–2) show that for sequences with two motions, KM (80.83%) outperforms ORK (67.37%) by roughly 13 percentage points. However, for sequences with three motions, ORK (49.66%) vastly outperforms KM (14.29%), more than tripling its accuracy. The overall accuracy of KM (65.81%) is slightly better than that of ORK (63.37%), but this is mostly because sequences with two motions form the majority of the dataset (120 out of 155). This leads us to conclude that ORK is actually the superior method here.

Dataset        Hopkins 155             Hopkins 155 + Outliers
Column         1         2             3          4
Method         KM        ORK           KM         ORK
2 motions      80.83%    67.37%        00.00%     47.58%
3 motions      14.29%    49.66%        100.00%    50.00%
Overall        65.81%    63.37%        22.58%     48.13%

Table 2: Accuracy in determining the number of motions in a sequence. Note that in the experiment with outliers (columns 3–4), KM returns a constant number of 3 motions for all sequences.

We re-evaluate the performance of ORK by considering only the sequences for which ORK estimates the number of motions correctly (about 98 sequences, i.e., 63.37% of cases). The results are tabulated under ORK∗ (column 10) in Table 1. It can be seen that when ORK estimates the number of motions correctly, it is significantly more accurate than the other methods. Finally, we compare the speed of the methods in Table 3.
ORK was implemented and run in Matlab on a dual-core 3.00 GHz Pentium machine with 4 GB of main memory (much less powerful than the 8-core 3.66 GHz Xeon with 32 GB of memory used in [18] for the other methods in Table 3). The results show that ORK is comparable in speed to LSA, much faster than MSL and ALC, but slower than GPCA and RANSAC. Timing results for LLMC are not available in the literature.

Method     2 motions    3 motions
GPCA       324 ms       738 ms
LSA        7.584 s      15.956 s
MSL        11 h 4 m     1 d 23 h
RANSAC     175 ms       258 ms
ALC        10 m 32 s    10 m 32 s
ORK        4.249 s      8.479 s

Table 3: Average computation time on Hopkins 155 sequences.

4.3 Data with gross outliers

We next examine the ability of the proposed method to deal with gross outliers in motion data. For each sequence in Hopkins 155, we add 100 gross outliers by creating trajectories corresponding to mistracks or spuriously occurring points. These are created by randomly initializing 100 locations in the first frame and allowing them to drift throughout the sequence according to Brownian motion (a sketch of this corruption procedure is given after Sec. 5). The corrupted sequences are then subjected to the algorithms for motion segmentation. Since only ORK is capable of rejecting outliers, the classification error of Eq. (16) is evaluated on the inlier points only.

The results in Table 4 illustrate that ORK (column 4) is the most accurate method by a large margin. Despite being given the true number of motions a priori, GPCA, LSA and RANSAC are unable to provide satisfactory segmentation results.

Column      1       2       3        4       5
Method     GPCA    LSA    RANSAC    ORK    ORK*

Sequences with 2 motions
Mean       28.66   24.25   30.64    16.50   1.62
Median     30.96   26.51   32.36    10.54   0.00

Sequences with 3 motions
Mean       40.61   30.94   42.24    19.99   2.68
Median     41.30   27.68   43.43     8.49   0.09

Table 4: Classification error (%) on Hopkins 155 sequences with 100 gross outliers per sequence.

In terms of estimating the number of motions, column 4 of Table 2 shows that the overall accuracy of ORK is reduced to 48.13%. This is mainly due to the deterioration in accuracy on sequences with two motions (47.58%), while the accuracy on sequences with three motions is maintained (50.00%). This is not a surprising result: sequences with three motions generally have more (inlying) point trajectories than sequences with two motions, so their outlier rates are lower (recall that a fixed number of 100 false trajectories is added). On the other hand, the KM method (column 3) is completely overwhelmed by the outliers: for all the sequences with outliers it returned a constant “3” as the number of motions.

We again re-evaluate ORK by considering only the sequences (now with gross outliers) for which the number of motions is correctly estimated (about 75 sequences, i.e., 48.13% of cases). The results tabulated under ORK∗ (column 5) in Table 4 show that the proposed method can accurately segment the point trajectories without being influenced by the gross outliers.

5 Conclusions

In this paper we propose a novel and highly effective approach for multi-body motion segmentation. Our idea is based on encapsulating random hypotheses in a novel Mercer kernel and on statistical learning. We evaluated our method on the Hopkins 155 dataset, with results showing that the idea is superior to other state-of-the-art approaches. It is by far the most accurate in terms of estimating the number of motions, and it excels in segmentation accuracy despite lacking prior knowledge of the number of motions. The proposed idea is also highly robust towards outliers in the input data.
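To make the outlier experiment of Sec. 4.3 concrete, the following sketch generates the Brownian false trajectories described there. The array layout (frames × points × 2 coordinates) and the Gaussian step size sigma are our assumptions; the drift magnitude is not specified above.

```python
import numpy as np

def add_brownian_outliers(tracks, n_out=100, sigma=1.0, seed=0):
    """tracks: (F, P, 2) array of P point trajectories over F frames.
    Returns the tracks with n_out Brownian false trajectories appended."""
    rng = np.random.default_rng(seed)
    F = tracks.shape[0]
    # Random initial locations in the first frame, within the data extent.
    lo = tracks.min(axis=(0, 1))
    hi = tracks.max(axis=(0, 1))
    start = rng.uniform(lo, hi, size=(n_out, 2))
    # Brownian motion: cumulative sum of i.i.d. Gaussian steps per frame.
    steps = rng.normal(0.0, sigma, size=(F - 1, n_out, 2))
    path = np.concatenate([start[None], start[None] + np.cumsum(steps, axis=0)])
    return np.concatenate([tracks, path], axis=1)   # (F, P + n_out, 2)
```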
Acknowledgements. We are grateful to the authors of [18], especially René Vidal, for discussions and insights which have been immensely helpful.

References

[1] T. Boult and L. Brown. Factorization-based segmentation of motions. In IEEE Workshop on Motion Understanding, 1991.
[2] T.-J. Chin, H. Wang, and D. Suter. Robust fitting of multiple structures: The statistical learning approach. In ICCV, 2009.
[3] J. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 29(3):159–179, 1998.
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24:381–395, 1981.
[5] A. Goh and R. Vidal. Segmenting motions of different types by unsupervised manifold clustering. In CVPR, 2007.
[6] A. Gruber and Y. Weiss. Multibody factorization with uncertainty and missing data using the EM algorithm. In CVPR, 2004.
[7] K. Kanatani. Motion segmentation by subspace separation and model selection. In ICCV, 2001.
[8] K. Kanatani and C. Matsunaga. Estimating the number of independent motions for multibody segmentation. In ACCV, 2002.
[9] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy coding and compression. TPAMI, 29(9):1546–1562, 2007.
[10] S. Rao, R. Tron, Y. Ma, and R. Vidal. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In CVPR, 2008.
[11] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[12] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.
[13] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888–905, 2000.
[14] Y. Sugaya and K. Kanatani. Geometric structure of degeneracy for multi-body motion segmentation. In Workshop on Statistical Methods in Video Processing, 2004.
[15] R. Toldo and A. Fusiello. Robust multiple structures estimation with J-Linkage. In ECCV, 2008.
[16] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography. IJCV, 9(2):137–154, 1992.
[17] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. Royal Society of London, 356(1740):1321–1340, 1998.
[18] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, 2007.
[19] R. Vidal and R. Hartley. Motion segmentation with missing data by PowerFactorization and Generalized PCA. In CVPR, 2004.
[20] R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). TPAMI, 27(12):1–15, 2005.
[21] J. Yan and M. Pollefeys. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, 2006.
[22] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications on multi-body and multi-sequence factorization. In CVPR, 2003.
[23] W. Zhang and J. Kosecká. Nonparametric estimation of multiple structures with outliers. In Dynamical Vision, ICCV 2005 and ECCV 2006 Workshops, 2006.

4 0.52986217 237 nips-2009-Subject independent EEG-based BCI decoding

Author: Siamac Fazli, Cristian Grozea, Marton Danoczy, Benjamin Blankertz, Florin Popescu, Klaus-Robert Müller

Abstract: In the quest to make Brain Computer Interfacing (BCI) more usable, dry electrodes have emerged that get rid of the initial 30 minutes required for placing an electrode cap. Another time-consuming step is the required individualized adaptation to the BCI user, which involves another 30 minutes of calibration for assessing a subject’s brain signature. In this paper we aim to also remove this calibration procedure from BCI setup time by means of machine learning. In particular, we harvest a large database of EEG BCI motor imagination recordings (83 subjects) for constructing a library of subject-specific spatio-temporal filters and derive a subject-independent BCI classifier. Our offline results indicate that BCI-naïve users could start real-time BCI use with no prior calibration at only a very moderate performance loss.

5 0.51682085 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

Author: Kai Yu, Tong Zhang, Yihong Gong

Abstract: This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods.
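The abstract above does not give the coding objective, so the following Python sketch is purely illustrative: it encodes a point by weights over its k nearest anchors using a sum-to-one constrained least-squares fit, a common normalization in related coding schemes; the paper's exact formulation may differ.

```python
import numpy as np

def local_coordinate_coding(x, anchors, k=5):
    """Toy sketch: weights over the k nearest anchors approximating x.
    anchors: (n, dim) array of learned anchor points; x: (dim,) point."""
    d2 = ((anchors - x) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:k]                 # indices of nearby anchors
    A = anchors[nn]                         # (k, dim)
    # Minimize ||A^T w - x||^2 subject to sum(w) = 1, enforced softly
    # via an augmented least-squares row with a large penalty weight.
    lam = 1e6
    M = np.vstack([A.T, lam * np.ones((1, k))])
    b = np.concatenate([x, [lam]])
    w, *_ = np.linalg.lstsq(M, b, rcond=None)
    code = np.zeros(len(anchors))           # sparse code over all anchors
    code[nn] = w
    return code
```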

6 0.51508057 104 nips-2009-Group Sparse Coding

7 0.51503313 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

8 0.51375479 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding

9 0.51257956 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning

10 0.51140392 72 nips-2009-Distribution Matching for Transduction

11 0.51103514 129 nips-2009-Learning a Small Mixture of Trees

12 0.5103671 260 nips-2009-Zero-shot Learning with Semantic Output Codes

13 0.50969076 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

14 0.50936747 191 nips-2009-Positive Semidefinite Metric Learning with Boosting

15 0.50900328 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

16 0.50775671 70 nips-2009-Discriminative Network Models of Schizophrenia

17 0.50738823 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

18 0.50651193 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

19 0.50635707 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

20 0.50577253 27 nips-2009-Adaptive Regularization of Weight Vectors