NIPS 2009, Paper 78: Efficient Moments-based Permutation Tests
Source: PDF
Authors: Chunxiao Zhou, Huixia J. Wang, Yongmei M. Wang
Abstract: In this paper, we develop an efficient moments-based permutation test approach that improves the test's computational efficiency by approximating the permutation distribution of the test statistic with the Pearson distribution series. This approach involves the calculation of the first four moments of the permutation distribution. We propose a novel recursive method to derive these moments theoretically and analytically, without conducting any permutation. Experimental results for different test statistics are demonstrated on simulated and real data. The proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy and efficiency.

1 Introduction

Permutation tests are flexible nonparametric alternatives to parametric tests in small samples, or when the distribution of a test statistic is unknown or mathematically intractable. Apart from exchangeability, permutation tests require no other statistical assumptions; p-values are obtained from the permutation distribution. Permutation tests are appealing in many biomedical studies, which often have limited observations with unknown distributions. They have been used successfully in structural MR image analysis [1, 2, 3], in functional MR image analysis [4], and in 3D face analysis [5]. There are three common approaches to constructing the permutation distribution [6, 7, 8]: (1) exact permutation, enumerating all possible arrangements; (2) approximate permutation, based on random sampling from all possible permutations; (3) approximate permutation, using the analytical moments of the exact permutation distribution under the null hypothesis. The main disadvantage of exact permutation is its computational cost, due to the factorial growth in the number of permutations with the number of subjects. The second technique often gives inflated type I errors caused by random sampling.
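To fix ideas, the second approach (random sampling of permutations) for a two-group mean-difference test can be sketched as follows. This is a minimal baseline illustration, not the paper's method; the function and variable names are our own.

```python
import random

def perm_test_pvalue(group_a, group_b, num_perms=10_000, seed=0):
    """Two-sided p-value for the group mean difference via random permutations."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n1, n2 = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n1 - sum(group_b) / n2)
    count = 0
    for _ in range(num_perms):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        t = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2)
        if t >= observed:
            count += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (count + 1) / (num_perms + 1)

p = perm_test_pvalue([4.1, 5.2, 6.3, 5.8], [1.0, 2.1, 1.5, 2.4])
```

The Monte Carlo noise of this estimator is exactly the source of the inflated type I errors and the repeated-test cost discussed above.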
When a large number of repeated tests are needed, the random permutation strategy is also computationally expensive if satisfactory accuracy is to be achieved. Regarding the third approach, the exact permutation distribution may not have moments, or may not have tractable ones; in most applications, however, it is not the existence but the derivation of the moments that limits the third approach. To the best of our knowledge, there is no systematic and efficient way to derive the moments of the permutation distribution. Recently, Zhou [3] proposed a solution that converts the permutation of the data into a permutation of the test statistic coefficients, which are symmetric with respect to the permutation. Since the test statistic coefficients usually have simple representations, it is easier to track the permutation of the coefficients than that of the data. However, this method requires deriving the permutation for each specific test statistic, which is not accessible to practical users.

In this paper, we propose a novel strategy that employs a general theoretical method to derive the moments of the permutation distribution of any weighted v-statistic, for both univariate and multivariate data. We note that any moment of the permutation distribution of a weighted v-statistic [9] can be written as a summation, over a high-dimensional index set and all possible permutations, of the product of a data function term and an index function term. Our key idea is to divide the whole index set into several permutation equivalent index subsets (see Definition 2) such that the summation of the data/index function terms over all permutations is invariant within each subset and can be calculated without conducting any permutation. We then obtain the moments by summing up these subtotals. The proposed method extends to other test statistics by replacing them with equivalent monotonic weighted v-statistics.
This is because only the order of the test statistics across all permutations matters for obtaining the p-values, so a monotonic weighted v-statistic shares the same p-value as the original test statistic. Given the first four moments, the permutation distribution can be well fitted by the Pearson distribution series, and the p-values are then obtained without conducting any real permutation. For a multiple comparison of a two-group difference with sample sizes $n_1 = n_2 = 21$ and $m = 2{,}000$ tests, the exact permutation test requires $m \times (n_1+n_2)!/(n_1!\,n_2!) \approx 1.1 \times 10^{15}$ permutations. Even with 20,000 random permutations per test, we still need $m \times 20{,}000 = 4 \times 10^{7}$ permutations. Alternatively, our moments-based permutation method using the Pearson distribution approximation involves only the calculation of the first four analytically derived moments of the exact permutation distribution to achieve high accuracy (see Section 3). Instead of computing test statistics over a factorial number of permutations, our moments-based method requires only polynomial-order computation. For example, the computational costs for the univariate mean-difference test statistic and the modified multivariate Hotelling's $T^2$ test statistic [8] are $O(n)$ and $O(n^3)$, respectively, where $n = n_1 + n_2$.

2 Methodology

In this section, we discuss how to calculate the moments of the permutation distribution for weighted v-statistics. For other test statistics, a possible solution is to replace them with equivalent weighted v-statistics obtained by monotonic transforms. A detailed discussion of equivalent test statistics can be found in [7, 8, 10].

2.1 Computational challenge

Let us first look at a toy example. Suppose we have two-group univariate data $x = (x_1, \ldots, x_{n_1}, x_{n_1+1}, \ldots, x_{n_1+n_2})$, where the first $n_1$ elements are in group A and the remaining $n_2$ are in group B.
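The permutation counts quoted above can be checked directly with a few lines of integer arithmetic (a sketch; variable names are our own):

```python
from math import comb

n1 = n2 = 21
m = 2_000

# Distinct group relabelings for one two-sample test: (n1+n2)! / (n1! n2!)
per_test = comb(n1 + n2, n1)
exact_total = m * per_test     # ~1.1e15 permutations for exact tests
random_total = m * 20_000      # 4e7 with 20,000 random permutations per test
```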
For comparison of the two groups, the hypothesis is typically constructed as $H_0: \mu_A = \mu_B$ vs. $H_a: \mu_A \neq \mu_B$, where $\mu_A, \mu_B$ are the population means of groups A and B, respectively. Define $\bar{x}_A = \sum_{i=1}^{n_1} x_i / n_1$ and $\bar{x}_B = \sum_{i=n_1+1}^{n} x_i / n_2$ as the sample means of the two groups, where $n = n_1 + n_2$. We choose the univariate group mean difference as the test statistic, i.e.,
$$T(x) = \bar{x}_A - \bar{x}_B = \sum_{i=1}^{n} w(i)\, x_i,$$
where the index function is $w(i) = 1/n_1$ if $i \in \{1, \ldots, n_1\}$ and $w(i) = -1/n_2$ if $i \in \{n_1+1, \ldots, n\}$. The total number of possible permutations of $\{1, \ldots, n\}$ is $n!$. To calculate the fourth moment of the permutation distribution,
$$E_\pi(T^4(x)) = \frac{1}{n!} \sum_{\pi \in S_n} \Big( \sum_{i=1}^{n} w(i)\, x_{\pi(i)} \Big)^4 = \frac{1}{n!} \sum_{\pi \in S_n} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} \sum_{i_4=1}^{n} w(i_1)\, w(i_2)\, w(i_3)\, w(i_4)\, x_{\pi(i_1)}\, x_{\pi(i_2)}\, x_{\pi(i_3)}\, x_{\pi(i_4)},$$
where $\pi$ is the permutation operator and the symmetric group $S_n$ [11] includes all distinct permutations. This example shows that the moment calculation can be considered as a summation over all possible permutations and a large index set; the computational challenge is to traverse factorially many permutations and polynomially many indices.

2.2 Partition the index set

In this paper, we assume that the test statistic $T$ can be expressed as a weighted v-statistic of degree $d$ [9], that is, $T(x) = \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} w(i_1, \ldots, i_d)\, h(x_{i_1}, \ldots, x_{i_d})$, where $x = (x_1, x_2, \ldots, x_n)^T$ is the data with $n$ observations, $w$ is a symmetric index function, and $h$ is a symmetric data function, i.e., invariant under permutation of $(i_1, \ldots, i_d)$. Though the symmetry property is not required for our method, it helps reduce the computational cost. Each observation $x_k$ can be either univariate or multivariate. In the above toy example, $d = 1$ and $h$ is the identity function.
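For intuition, the permutation moments of the toy statistic can be brute-forced on a very small data set by enumerating all $n!$ permutations. This is an illustration only, not the paper's method (which avoids enumeration); names are our own.

```python
from itertools import permutations
from math import factorial

def perm_moment(x, w, r):
    """r-th moment of T = sum_i w[i] * x[pi(i)] over all n! permutations."""
    n = len(x)
    total = sum(sum(w[i] * x[pi[i]] for i in range(n)) ** r
                for pi in permutations(range(n)))
    return total / factorial(n)

# Two groups of size 2: w(i) = 1/n1 in group A, -1/n2 in group B.
x = [2.0, 3.0, 5.0, 7.0]
w = [0.5, 0.5, -0.5, -0.5]

m1 = perm_moment(x, w, 1)  # equals (sum of weights) * mean(x), i.e. 0 here
m4 = perm_moment(x, w, 4)  # the fourth moment discussed above
```

Because the weights sum to zero, the first permutation moment vanishes; the higher moments are exactly the quantities the paper derives analytically.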
Therefore, the $r$-th moment of the test statistic over the permuted data is:
$$E_\pi(T^r(x)) = E_\pi\Big( \sum_{i_1, \ldots, i_d} w(i_1, \ldots, i_d)\, h(x_{\pi(i_1)}, \ldots, x_{\pi(i_d)}) \Big)^r = E_\pi\Big[ \sum_{i_1^{(1)}, \ldots, i_d^{(1)},\, \ldots,\, i_1^{(r)}, \ldots, i_d^{(r)}} \Big\{ \prod_{k=1}^{r} w(i_1^{(k)}, \ldots, i_d^{(k)}) \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \Big\} \Big].$$
Then we can exchange the order of summation over permutations and over indices:
$$E_\pi(T^r(x)) = \sum_{i_1^{(1)}, \ldots, i_d^{(1)},\, \ldots,\, i_1^{(r)}, \ldots, i_d^{(r)}} \Big\{ \Big( \prod_{k=1}^{r} w(i_1^{(k)}, \ldots, i_d^{(k)}) \Big)\, E_\pi\Big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \Big) \Big\}.$$
Thus any moment of the permutation distribution can be considered as a summation, over a high-dimensional index set and all possible permutations, of the product of a data function term and an index function term. Since the set of all permutations maps any index value between $1$ and $n$ to every possible value from $1$ to $n$ with equal probability, the expectation $E_\pi\big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \big)$ of the data function over all permutations depends only on the equal/unequal relationships among the indices. It is therefore natural to divide the whole index set $U = \{i_1, \ldots, i_d\}^r = \{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}$ into a union of disjoint index subsets on which $E_\pi\big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \big)$ is invariant.

Definition 1. Since $h$ is a symmetric function, two index elements $(i_1, \ldots, i_d)$ and $(j_1, \ldots, j_d)$ are said to be equivalent if they are the same up to order. For example, for $d = 3$, $(1, 4, 5) = (1, 5, 4) = (4, 1, 5) = (4, 5, 1) = (5, 1, 4) = (5, 4, 1)$.

Definition 2. Two indices $\{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}$ and $\{(j_1^{(1)}, \ldots, j_d^{(1)}), \ldots, (j_1^{(r)}, \ldots, j_d^{(r)})\}$ are said to be permutation equivalent if there exists a permutation $\pi \in S_n$ such that $\{(\pi(i_1^{(1)}), \ldots, \pi(i_d^{(1)})), \ldots, (\pi(i_1^{(r)}), \ldots, \pi(i_d^{(r)}))\} = \{(j_1^{(1)}, \ldots, j_d^{(1)}), \ldots, (j_1^{(r)}, \ldots, j_d^{(r)})\}$.
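The invariance that motivates the partition can be checked numerically in the simplest case $d = 1$, $r = 2$: $E_\pi[x_{\pi(i)}\, x_{\pi(j)}]$ takes one value when $i = j$ and another common value for every pair with $i \neq j$. A sketch (our own names; integer data keeps the arithmetic exact):

```python
from itertools import permutations
from math import factorial

def e_pi_product(x, i, j):
    """E over all permutations pi of x[pi(i)] * x[pi(j)]."""
    n = len(x)
    return sum(x[pi[i]] * x[pi[j]] for pi in permutations(range(n))) / factorial(n)

x = [1, 2, 4, 8]
equal   = {e_pi_product(x, i, i) for i in range(4)}                       # class i == j
unequal = {e_pi_product(x, i, j) for i in range(4) for j in range(4) if i != j}
```

Each set collapses to a single value, confirming that the expectation depends only on the equality pattern of the indices, not on the indices themselves.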
Here "$=$" means they have the same index elements in the sense of Definition 1. As a result, for $d = 2$ and $r = 2$, the whole index set can be divided into seven permutation equivalent subsets: $[\{(1,1),(1,1)\}]$, $[\{(1,1),(1,2)\}]$, $[\{(1,1),(2,2)\}]$, $[\{(1,2),(1,2)\}]$, $[\{(1,1),(2,3)\}]$, $[\{(1,2),(1,3)\}]$, $[\{(1,2),(3,4)\}]$, where $[\,\cdot\,]$ denotes an equivalence class.
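The count of seven classes for $d = 2$, $r = 2$ can be verified by exhaustive canonicalization over a small index range. This is a sketch; the canonicalization routine is our own construction, minimizing over all relabelings of the values used.

```python
from itertools import product, permutations

def canonical(i1, i2, i3, i4):
    """Canonical form of {(i1,i2),(i3,i4)}: order within a pair is ignored
    (Definition 1) and index values may be relabeled bijectively (Definition 2)."""
    values = sorted(set((i1, i2, i3, i4)))
    best = None
    for sigma in permutations(range(len(values))):
        relabel = dict(zip(values, sigma))
        pairs = tuple(sorted((tuple(sorted((relabel[i1], relabel[i2]))),
                              tuple(sorted((relabel[i3], relabel[i4]))))))
        if best is None or pairs < best:
            best = pairs
    return best

n = 4  # any n >= 4 yields the same set of classes
classes = {canonical(*idx) for idx in product(range(n), repeat=4)}
```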
Note that the number of permutation equivalent subsets depends only on the degree $d$ of the weighted v-statistic and the order $r$ of the moment, not on the data size $n$; it is small when calculating the first several moments (small $r$) of low-order test statistics (small $d$). Using the permutation equivalence relation of Definition 2, the whole index set $U$ can be partitioned into several permutation equivalent index subsets, and the $r$-th moment can then be calculated by summing the subtotals over all index subsets. This procedure requires no real permutations, based on Propositions 1 and 2 below.
Proposition 1. For any index element $\{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}$ with permutation equivalent class $l = [\{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}]$,
$$E_\pi\Big(\prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})})\Big) = \frac{1}{\mathrm{card}(l)} \sum_{\{(j_1^{(1)},\ldots,j_d^{(1)}),\, \ldots,\, (j_1^{(r)},\ldots,j_d^{(r)})\} \in l} \; \prod_{k=1}^{r} h(x_{j_1^{(k)}}, \ldots, x_{j_d^{(k)}});$$
that is, averaging over all $n!$ permutations is the same as averaging over all members of the class.

Proposition 2. The $r$-th moment is obtained by summing the product of the index partition sum $w_l$ and the data partition sum $h_l$ over all permutation equivalent subsets, i.e.,
$$E_\pi(T^r(x)) = \sum_{l \in [U]} w_l\, h_l,$$
where $l$ ranges over the permutation equivalent subsets of the whole index set $U$, and $[U]$ denotes the set of all distinct permutation equivalence classes of $U$. The index partition sum is
$$w_l = \sum_{\{(j_1^{(1)},\ldots,j_d^{(1)}),\, \ldots,\, (j_1^{(r)},\ldots,j_d^{(r)})\} \in l} \; \prod_{k=1}^{r} w(j_1^{(k)}, \ldots, j_d^{(k)}),$$
and the data partition sum is
$$h_l = \frac{1}{\mathrm{card}(l)} \sum_{\{(j_1^{(1)},\ldots,j_d^{(1)}),\, \ldots,\, (j_1^{(r)},\ldots,j_d^{(r)})\} \in l} \; \prod_{k=1}^{r} h(x_{j_1^{(k)}}, \ldots, x_{j_d^{(k)}}).$$
Since both the index partition sum $w_l$ and the data partition sum $h_l$ are computed by summation over the distinct indices within each permutation equivalent index subset, no real permutation is needed to compute the moments.
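For $d = 1$, $r = 2$ the decomposition has exactly two classes ($i = j$ and $i \neq j$), so the permutation-free formula $E_\pi(T^2) = \sum_l w_l h_l$ can be checked against brute-force enumeration. A sketch with our own variable names:

```python
from itertools import permutations
from math import factorial

x = [1.0, 2.0, 4.0, 8.0, 16.0]
w = [0.5, 0.5, -1/3, -1/3, -1/3]   # mean-difference weights, n1 = 2, n2 = 3
n = len(x)

# Permutation-free: one (w_l, h_l) pair per equivalence class.
w_diag = sum(wi * wi for wi in w)                        # index sum, class i == j
h_diag = sum(xi * xi for xi in x) / n                    # data sum / card(l)
w_off = sum(w[i] * w[j] for i in range(n) for j in range(n) if i != j)
h_off = sum(x[i] * x[j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
analytic = w_diag * h_diag + w_off * h_off

# Brute force over all n! permutations for comparison.
brute = sum(sum(w[i] * x[pi[i]] for i in range(n)) ** 2
            for pi in permutations(range(n))) / factorial(n)
```

The two numbers agree to floating-point precision, while the analytic route touches only $O(n^2)$ index pairs instead of $n!$ permutations.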
2.3 Recursive calculation

Direct calculation of the data partition sum and the index partition sum would still require traversing the whole index set. Let $l = [\{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}]$ and $\nu = [\{(j_1^{(1)}, \ldots, j_d^{(1)}), \ldots, (j_1^{(r)}, \ldots, j_d^{(r)})\}]$ be two different permutation equivalent subsets of the whole index set $U$. We write $\nu \prec l$ if $l$ can be converted to $\nu$ by merging two or more index numbers. Note that, for instance, $[\{(1,1),(3,4)\}]$ and $[\{(1,1),(2,3)\}]$ are the same permutation equivalent subset, because applying the permutation $\pi: 1 \to 1,\ 2 \to 4,\ 3 \to 3,\ 4 \to 2$ converts one into the other. The partition order of a permutation equivalent subset $\nu$ is said to be lower than that of another subset $l$ if there is a directed path from $l$ to $\nu$ in the merging graph.

Figure 1: Order of all permutation equivalent subsets when $d = 2$ and $r = 2$: $[\{(1,1),(1,1)\}]$, $[\{(1,1),(1,2)\}]$, $[\{(1,1),(2,2)\}]$, $[\{(1,2),(1,2)\}]$, $[\{(1,1),(2,3)\}]$, $[\{(1,2),(1,3)\}]$, $[\{(1,2),(3,4)\}]$.
The difficulty in computing the data partition sum and the index partition sum comes from two kinds of constraints: equal constraints and unequal constraints. For example, in the permutation equivalent subset $[\{(1,1),(2,2)\}]$, the equal constraint requires that the first and second index numbers be equal, and likewise the third and fourth; the unequal constraint requires that the first two index numbers differ from the last two. Thus the calculation of a partition sum can be separated into two parts: the relaxed partition sum, computed without the unequal constraints, and lower-order partition sums. The index partition sum $w_l$ is calculated by subtracting all lower-order partition sums from the corresponding relaxed index partition sum $w_l^*$, i.e., $w_l = w_l^* - \sum_{\nu \prec l} c_{l \to \nu}\, w_\nu$, where the multiplicity coefficient $c_{l \to \nu}$ is determined by $\#(l)$, $\#(\nu)$, and $\#(l \to \nu)$. Here $\#(l)$ is the number of distinct order-sensitive index partition types of $l$ (e.g., $\#(l) = 2$ for $l = [\{(1,1),(2,3)\}]$), and $\#(l \to \nu)$ is the number of different ways of merging the higher-order permutation equivalent subset $l$ into the lower-order subset $\nu$. The calculation of the data partition sum is similar.
Therefore, the computational cost mainly depends on the calculation of the relaxed partition sums and the lowest-order partition sum. Since the computational cost of the lowest-order term is $O(n)$, we mainly discuss the calculation of relaxed partition sums in the following paragraphs. For example,
$$w_{l=[(1,1),(1,2),(1,2),(1,3),(2,3),(1,4)]}^* = \#(l) \sum_{i,j,k,l} w(i,i)\, w(i,j)\, w(i,j)\, w(i,k)\, w(j,k)\, w(i,l).$$
The permutation equivalent index subset is represented by an undirected graph: we connect two distinct nodes whenever the two corresponding index numbers appear in the same index element. The relaxed index partition sum $w_{l=[(1,1),(1,2),(1,2),(1,3),(2,3),(1,4)]}^*$ is then computed by traversing this graph and accumulating the products of the $w$ terms; finally, we clear the whole graph and obtain the relaxed index partition sum. For example, the most expensive relaxed index partition sum for $d = 2$ and $r = 3$ is $\sum_{i,j,k} w(i,j)\, w(i,k)\, w(j,k)$, which is a triangle in the graph representation. For a $d$-th order test statistic, the computational cost of the partition sum for the $r$-th moment is bounded by a polynomial $O(n^m)$, where the exponent $m$ depends on $d$ and $r$. When $d = 1$, the computational complexity of the partition sum is $O(n)$. Specifically, for a second-order test statistic, the computational cost of the 1st and 2nd moments is $O(n^2)$, and that of the 3rd and 4th moments is $O(n^3)$. Given the first four moments, the Pearson distribution series can be utilized to approximate the permutation distribution of the test statistic without conducting any real permutation.
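As a simplified illustration of this final approximation step: the paper fits the full Pearson family using all four moments, while the sketch below uses only the first two, i.e. the normal member of the Pearson system, via the standard-normal CDF. The function name and example numbers are ours.

```python
from math import erf, sqrt

def normal_tail_pvalue(t_obs, mean, var):
    """Two-sided p-value for t_obs under a N(mean, var) approximation
    of the permutation distribution (the normal member of the Pearson system)."""
    z = abs(t_obs - mean) / sqrt(var)
    cdf = 0.5 * (1 + erf(z / sqrt(2)))   # standard-normal CDF at z
    return 2 * (1 - cdf)

# Illustrative only: observed statistic 1.8 against an approximated null
# with mean 0 and variance 1 derived from the first two permutation moments.
p = normal_tail_pvalue(1.8, 0.0, 1.0)
```

Matching the third and fourth moments as well selects a skewed or heavy-tailed Pearson curve instead of the normal, which is what allows the method to track exact permutation p-values closely in small samples.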
3 Experimental results

To evaluate the accuracy and efficiency of our moments-based permutation tests, we generated simulated data and conducted permutation tests for both linear and quadratic test statistics. Table 1 illustrates the high accuracy of our moments-based permutation technique; moreover, compared with exact permutation or 10,000 random permutations, the moments-based permutation tests reduce the computation time by more than 99%. Table 1 shows the computation time and p-values of the three permutation methods from one simulation. To demonstrate the robustness of our method, we repeated the simulation 10 times in each case and calculated the mean and variance of the absolute biases of the p-values of both moments-based and random permutation, treating the p-values of exact permutation as the gold standard. In most cases, our moments-based permutation is less biased and more stable than random permutation (Table 2), which demonstrates the robustness and accuracy of our method.
Table 1: Comparison of computation costs and p-values of three permutation methods: moments-based permutation (MP), random permutation (RP), and exact permutation (EP). t_MP, t_RP, and t_EP denote the computation times (in seconds), and p_MP, p_RP, and p_EP are the p-values of the three methods.

Table 2: Robustness and accuracy comparison of moments-based permutation and random permutation across 10 simulations, considering the p-values of exact permutation as the gold standard. Mean_ABias_MP and VAR_MP are the mean of the absolute biases and the variance of the biases of the p-values of moments-based permutation; Mean_ABias_RP and VAR_RP are the corresponding quantities for random permutation.

For the real-data experiment, we chose 10 Asian males and 10 white males from the USF face database and computed their group differences with the modified Hotelling's $T^2$ test statistic [8]. Since exact permutation is too expensive here, we consider the p-values of 200,000 random permutations (EP) as the gold standard. Our method is more than one hundred times faster than 2,000 random permutations (RP), and is also more accurate and robust (Table 3).

Table 3: Computation cost, robustness, and accuracy comparison of moments-based permutation and random permutation across 10 simulations.

Evaluation of the hypothesis test using our moments-based permutation with the modified Hotelling's $T^2$ test statistic [8] is shown in the figure: significant group differences at the 0.05 level (without correction), obtained through real permutation ((a); number of permutations = 10,000) and using the present moments-based permutation (b).
4 Conclusion

We present and develop novel moments-based permutation tests in which the permutation distributions are accurately approximated by Pearson distributions at considerably reduced computation cost. General and analytical formulations of the moments of the permutation distribution are derived for weighted v-statistics. The proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy/flexibility and efficiency.

References

[3] Zhou, C. & Wang, Y. (2008), Hybrid permutation test with application to surface shape analysis, Statistica Sinica, 18, 1553-1568.
[4] Nichols, T. & Holmes, A. (2001), Nonparametric permutation tests for functional neuroimaging: A primer with examples, Human Brain Mapping, 15, 1-25.
same-paper 1 0.99999982 78 nips-2009-Efficient Moments-based Permutation Tests
Author: Chunxiao Zhou, Huixia J. Wang, Yongmei M. Wang
Abstract: In this paper, we develop an efficient moments-based permutation test approach to improve the test s computational efficiency by approximating the permutation distribution of the test statistic with Pearson distribution series. This approach involves the calculation of the first four moments of the permutation distribution. We propose a novel recursive method to derive these moments theoretically and analytically without any permutation. Experimental results using different test statistics are demonstrated using simulated data and real data. The proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy and efficiency. 1 In t ro d u c t i o n Permutation tests are flexible nonparametric alternatives to parametric tests in small samples, or when the distribution of a test statistic is unknown or mathematically intractable. In permutation tests, except exchangeability, no other statistical assumptions are required. The p-values can be obtained by using the permutation distribution. Permutation tests are appealing in many biomedical studies, which often have limited observations with unknown distribution. They have been used successfully in structural MR image analysis [1, 2, 3], in functional MR image analysis [4], and in 3D face analysis [5]. There are three common approaches to construct the permutation distribution [6, 7, 8]: (1) exact permutation enumerating all possible arrangements; (2) approximate permutation based on random sampling from all possible permutations; (3) approximate permutation using the analytical moments of the exact permutation distribution under the null hypothesis. The main disadvantage of the exact permutation is the computational cost, due to the factorial increase in the number of permutations with the increasing number of subjects. The second technique often gives inflated type I errors caused by random sampling. 
When a large number of repeated tests is needed, the random permutation strategy is also computationally expensive if satisfactory accuracy is to be achieved. Regarding the third approach, the exact permutation distribution may not have moments, or its moments may be intractable. In most applications, however, it is not the existence but the derivation of the moments that limits the third approach. To the best of our knowledge, there is no systematic and efficient way to derive the moments of the permutation distribution. Recently, Zhou [3] proposed a solution by converting the permutation of the data to that of the test statistic coefficients, which are symmetric under permutation. Since the test statistic coefficients usually have simple representations, it is easier to track the permutation of the coefficients than that of the data. However, this method requires a separate derivation for each specific test statistic, which is not accessible to practical users. In this paper, we propose a novel strategy that employs a general theoretical method to derive the moments of the permutation distribution of any weighted v-statistic, for both univariate and multivariate data. We note that any moment of the permutation distribution of a weighted v-statistic [9] can be written as a summation, over a high-dimensional index set and all possible permutations, of the product of a data function term and an index function term. Our key idea is to divide the whole index set into several permutation-equivalent (see Definition 2) index subsets such that the summation of the data/index function term over all permutations is invariant within each subset and can be calculated without conducting any permutation. We then obtain the moments by summing up these subtotals. The proposed method can be extended to other test statistics by replacing them with equivalent monotonic weighted v-statistics.
This is due to the fact that only the order of the test statistics across permutations matters for obtaining the p-values, so a monotonic weighted v-statistic shares the same p-value with the original test statistic. Given the first four moments, the permutation distribution can be well fitted by the Pearson distribution series, and the p-values are then obtained without conducting any real permutation. For multiple comparisons of a two-group difference with sample sizes n1 = 21 and n2 = 21 and m = 2,000 tests, the exact permutation test requires m × (n1 + n2)!/(n1! n2!) ≈ 1.1 × 10^15 permutations. Even with 20,000 random permutations per test, we still need m × 20,000 = 4 × 10^7 permutations. Alternatively, our moments-based permutation method using the Pearson distribution approximation only involves the calculation of the first four analytically derived moments of the exact permutation distribution, and achieves high accuracy (see Section 3). Instead of computing test statistics over a factorial number of permutations, our moments-based permutation requires only polynomial-order computation. For example, the computational costs for the univariate mean-difference test statistic and the modified multivariate Hotelling's T² test statistic [8] are O(n) and O(n³), respectively, where n = n1 + n2.

2 Methodology

In this section, we mainly discuss how to calculate the moments of the permutation distribution for weighted v-statistics. For other test statistics, a possible solution is to replace them with their equivalent weighted v-statistics via monotonic transforms. A detailed discussion of equivalent test statistics can be found in [7, 8, 10].

2.1 Computational challenge

Let us first look at a toy example. Suppose we have two-group univariate data $x = (x_1, \ldots, x_{n_1}, x_{n_1+1}, \ldots, x_{n_1+n_2})$, where the first $n_1$ elements are in group A and the remaining $n_2$ are in group B.
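The permutation counts quoted above are easy to check directly, since the number of distinct relabelings per test is the binomial coefficient (n1 + n2)!/(n1! n2!):

```python
from math import comb

n1, n2, m = 21, 21, 2000

exact_perms = m * comb(n1 + n2, n1)   # m tests × distinct group relabelings per test
random_perms = m * 20_000             # m tests × 20,000 sampled permutations per test

print(f"exact:  {exact_perms:.2e}")   # 1.08e+15, i.e. about 1.1e15
print(f"random: {random_perms:.2e}")  # 4.00e+07
```

The fifteen-orders-of-magnitude figure is what rules out exact enumeration here, while even the random-sampling shortcut still costs tens of millions of statistic evaluations.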
For comparison of the two groups, the hypothesis is typically constructed as $H_0: \mu_A = \mu_B$ vs. $H_a: \mu_A \neq \mu_B$, where $\mu_A, \mu_B$ are the population means of groups A and B, respectively. Define $\bar{x}_A = \sum_{i=1}^{n_1} x_i / n_1$ and $\bar{x}_B = \sum_{i=n_1+1}^{n} x_i / n_2$ as the sample means of the two groups, where $n = n_1 + n_2$. We choose the univariate group mean difference as the test statistic, i.e.,

$$T(x) = \bar{x}_A - \bar{x}_B = \sum_{i=1}^{n} w(i)\, x_i,$$

where the index function is $w(i) = 1/n_1$ if $i \in \{1, \ldots, n_1\}$ and $w(i) = -1/n_2$ if $i \in \{n_1+1, \ldots, n\}$. The total number of possible permutations of $\{1, \ldots, n\}$ is $n!$. The fourth moment of the permutation distribution is

$$E_\pi(T^4(x)) = \frac{1}{n!} \sum_{\pi \in S_n} \Big( \sum_{i=1}^{n} w(i)\, x_{\pi(i)} \Big)^4 = \frac{1}{n!} \sum_{\pi \in S_n} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} \sum_{i_4=1}^{n} w(i_1) w(i_2) w(i_3) w(i_4)\, x_{\pi(i_1)} x_{\pi(i_2)} x_{\pi(i_3)} x_{\pi(i_4)},$$

where $\pi$ is the permutation operator and the symmetric group $S_n$ [11] includes all distinct permutations. The above example shows that the moment calculation can be considered as a summation over all possible permutations and a large index set. The computational challenge is that this requires going through a factorial number of permutations and a polynomial number of indices.

2.2 Partition the index set

In this paper, we assume that the test statistic $T$ can be expressed as a weighted v-statistic of degree $d$ [9], that is,

$$T(x) = \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} w(i_1, \ldots, i_d)\, h(x_{i_1}, \ldots, x_{i_d}),$$

where $x = (x_1, x_2, \ldots, x_n)^T$ is the data with $n$ observations, $w$ is a symmetric index function, and $h$ is a symmetric data function, i.e., invariant under permutation of $(i_1, \ldots, i_d)$. Although the symmetry property is not required for our method, it helps reduce the computational cost. Each observation $x_k$ can be either univariate or multivariate. In the above toy example, $d = 1$ and $h$ is the identity function.
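For a sample small enough to enumerate, the permutation moments defined above can be computed by brute force, which is useful as a correctness check (an illustrative sketch only; the point of the paper is precisely to avoid this enumeration):

```python
from itertools import permutations

# Toy data: n1 = 2 in group A, n2 = 3 in group B, so n = 5 and 5! = 120 permutations.
x = [2.0, 4.0, 1.0, 3.0, 5.0]
n1, n2 = 2, 3
n = n1 + n2
w = [1 / n1] * n1 + [-1 / n2] * n2   # index function of the mean-difference statistic

perms = list(permutations(range(n)))

def perm_moment(r):
    """E_pi(T^r): average of (sum_i w(i) x_{pi(i)})^r over all permutations pi."""
    return sum(sum(w[i] * x[p[i]] for i in range(n)) ** r for p in perms) / len(perms)

moment1, moment2, moment4 = perm_moment(1), perm_moment(2), perm_moment(4)
# Since sum_i w(i) = 0 here, the first moment is 0, and the second moment matches
# the closed form (sum_i w_i^2) * [ mean(x_k^2) - sum_{k != l} x_k x_l / (n(n-1)) ].
```

Even at n = 5 this loops over 120 permutations per moment; the recursive derivation developed in this paper yields the same moments without touching the symmetric group at all.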
Therefore, the r-th moment of the test statistic from the permuted data is

$$E_\pi(T^r(x)) = E_\pi\Big( \sum_{i_1, \ldots, i_d} w(i_1, \ldots, i_d)\, h(x_{\pi(i_1)}, \ldots, x_{\pi(i_d)}) \Big)^r = E_\pi\Big[ \sum_{i_1^{(1)}, \ldots, i_d^{(1)}, \ldots, i_1^{(r)}, \ldots, i_d^{(r)}} \Big\{ \prod_{k=1}^{r} w(i_1^{(k)}, \ldots, i_d^{(k)}) \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \Big\} \Big].$$

Then we can exchange the order of the summation over indices and the expectation over permutations:

$$E_\pi(T^r(x)) = \sum_{i_1^{(1)}, \ldots, i_d^{(1)}, \ldots, i_1^{(r)}, \ldots, i_d^{(r)}} \Big\{ \Big( \prod_{k=1}^{r} w(i_1^{(k)}, \ldots, i_d^{(k)}) \Big)\, E_\pi\Big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \Big) \Big\}.$$

Thus any moment of the permutation distribution can be considered as a summation, over a high-dimensional index set and all possible permutations, of the product of a data function term and an index function term. Since all possible permutations map any index value between 1 and n to every index value from 1 to n with equal probability, $E_\pi\big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \big)$, the summation of the data function over all permutations, depends only on the equal/unequal relationships among the indices. It is therefore natural to divide the whole index set $U = \{i_1, \ldots, i_d\}^r = \{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}$ into a union of disjoint index subsets, on each of which $E_\pi\big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \big)$ is invariant.

Definition 1. Since $h$ is a symmetric function, two index elements $(i_1, \ldots, i_d)$ and $(j_1, \ldots, j_d)$ are said to be equivalent if they are the same up to order. For example, for $d = 3$, $(1,4,5) = (1,5,4) = (4,1,5) = (4,5,1) = (5,1,4) = (5,4,1)$.

Definition 2. Two indices $\{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}$ and $\{(j_1^{(1)}, \ldots, j_d^{(1)}), \ldots, (j_1^{(r)}, \ldots, j_d^{(r)})\}$ are said to be permutation equivalent if there exists a permutation $\pi \in S_n$ such that $\{(\pi(i_1^{(1)}), \ldots, \pi(i_d^{(1)})), \ldots, (\pi(i_1^{(r)}), \ldots, \pi(i_d^{(r)}))\} = \{(j_1^{(1)}, \ldots, j_d^{(1)}), \ldots, (j_1^{(r)}, \ldots, j_d^{(r)})\}$. Here
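The invariance underlying this partition can be checked numerically in the simplest case (d = 1, r = 2, h the identity): over all permutations, the expectation of $x_{\pi(i)} x_{\pi(j)}$ takes one value on the subset of index pairs with i = j and another on the subset with i ≠ j. A small sanity-check sketch under those assumptions:

```python
from itertools import permutations

x = [2.0, 4.0, 1.0, 3.0, 5.0]
n = len(x)
perms = list(permutations(range(n)))

def e_pi(i, j):
    """E over all permutations pi of x_{pi(i)} * x_{pi(j)} (d = 1, r = 2, h = identity)."""
    return sum(x[p[i]] * x[p[j]] for p in perms) / len(perms)

# The expectation depends only on the equal/unequal pattern of (i, j):
# every pair with i == j gives one common value, every pair with i != j another.
same = {round(e_pi(i, i), 9) for i in range(n)}
diff = {round(e_pi(i, j), 9) for i in range(n) for j in range(n) if i != j}
```

With these data the two values are 11.0 (the mean of $x_k^2$) and 8.5 (the mean of $x_k x_l$ over $k \neq l$): only the equality pattern of the index pair matters, exactly as the partition argument requires.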
2 0.11643631 7 nips-2009-A Data-Driven Approach to Modeling Choice
Author: Vivek Farias, Srikanth Jagabathula, Devavrat Shah
Abstract: We visit the following fundamental problem: For a ‘generic’ model of consumer choice (namely, distributions over preference lists) and a limited amount of data on how consumers actually make decisions (such as marginal preference information), how may one predict revenues from offering a particular assortment of choices? This problem is central to areas within operations research, marketing and econometrics. We present a framework to answer such questions and design a number of tractable algorithms (from a data and computational standpoint) for the same.
3 0.098953158 94 nips-2009-Fast Learning from Non-i.i.d. Observations
Author: Ingo Steinwart, Andreas Christmann
Abstract: We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case.
4 0.098167583 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.
5 0.077663504 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
Author: Arthur Gretton, Kenji Fukumizu, Zaïd Harchaoui, Bharath K. Sriperumbudur
Abstract: A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of χ2 random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text.
6 0.075349562 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison
7 0.075302757 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
8 0.074880809 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
9 0.062783927 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
10 0.060483824 175 nips-2009-Occlusive Components Analysis
11 0.055308148 95 nips-2009-Fast subtree kernels on graphs
12 0.049799412 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
13 0.045883153 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
14 0.044790149 11 nips-2009-A General Projection Property for Distribution Families
15 0.043179642 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models
16 0.043118425 206 nips-2009-Riffled Independence for Ranked Data
17 0.041296631 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
18 0.040424109 233 nips-2009-Streaming Pointwise Mutual Information
19 0.038495027 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution
20 0.038204167 3 nips-2009-AUC optimization and the two-sample problem
simIndex simValue paperId paperTitle
same-paper 1 0.97882909 78 nips-2009-Efficient Moments-based Permutation Tests
2 0.60295635 206 nips-2009-Riffled Independence for Ranked Data
Author: Jonathan Huang, Carlos Guestrin
Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers.
3 0.57522029 7 nips-2009-A Data-Driven Approach to Modeling Choice
4 0.49303308 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
Author: Peter Orbanz
Abstract: We consider the general problem of constructing nonparametric Bayesian models on infinite-dimensional random objects, such as functions, infinite graphs or infinite permutations. The problem has generated much interest in machine learning, where it is treated heuristically, but has not been studied in full generality in nonparametric Bayesian statistics, which tends to focus on models over probability distributions. Our approach applies a standard tool of stochastic process theory, the construction of stochastic processes from their finite-dimensional marginal distributions. The main contribution of the paper is a generalization of the classic Kolmogorov extension theorem to conditional probabilities. This extension allows a rigorous construction of nonparametric Bayesian models from systems of finite-dimensional, parametric Bayes equations. Using this approach, we show (i) how existence of a conjugate posterior for the nonparametric model can be guaranteed by choosing conjugate finite-dimensional models in the construction, (ii) how the mapping to the posterior parameters of the nonparametric model can be explicitly determined, and (iii) that the construction of conjugate models in essence requires the finite-dimensional models to be in the exponential family. As an application of our constructive framework, we derive a model on infinite permutations, the nonparametric Bayesian analogue of a model recently proposed for the analysis of rank data.
5 0.48605755 230 nips-2009-Statistical Consistency of Top-k Ranking
6 0.43161327 94 nips-2009-Fast Learning from Non-i.i.d. Observations
7 0.38079661 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
8 0.38066426 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
9 0.35820377 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison
10 0.34675261 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
11 0.32820451 11 nips-2009-A General Projection Property for Distribution Families
12 0.32099697 69 nips-2009-Discrete MDL Predicts in Total Variation
13 0.31745133 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models
14 0.29479852 226 nips-2009-Spatial Normalized Gamma Processes
15 0.28607568 176 nips-2009-On Invariance in Hierarchical Models
16 0.28447938 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis
17 0.27872825 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem
18 0.26634589 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
19 0.26626265 138 nips-2009-Learning with Compressible Priors
20 0.26391721 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
simIndex simValue paperId paperTitle
1 0.87478256 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions.
same-paper 2 0.8123219 78 nips-2009-Efficient Moments-based Permutation Tests
Author: Chunxiao Zhou, Huixia J. Wang, Yongmei M. Wang
Abstract: In this paper, we develop an efficient moments-based permutation test approach to improve the test s computational efficiency by approximating the permutation distribution of the test statistic with Pearson distribution series. This approach involves the calculation of the first four moments of the permutation distribution. We propose a novel recursive method to derive these moments theoretically and analytically without any permutation. Experimental results using different test statistics are demonstrated using simulated data and real data. The proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy and efficiency. 1 In t ro d u c t i o n Permutation tests are flexible nonparametric alternatives to parametric tests in small samples, or when the distribution of a test statistic is unknown or mathematically intractable. In permutation tests, except exchangeability, no other statistical assumptions are required. The p-values can be obtained by using the permutation distribution. Permutation tests are appealing in many biomedical studies, which often have limited observations with unknown distribution. They have been used successfully in structural MR image analysis [1, 2, 3], in functional MR image analysis [4], and in 3D face analysis [5]. There are three common approaches to construct the permutation distribution [6, 7, 8]: (1) exact permutation enumerating all possible arrangements; (2) approximate permutation based on random sampling from all possible permutations; (3) approximate permutation using the analytical moments of the exact permutation distribution under the null hypothesis. The main disadvantage of the exact permutation is the computational cost, due to the factorial increase in the number of permutations with the increasing number of subjects. The second technique often gives inflated type I errors caused by random sampling. 
When a large number of repeated tests are needed, the random permutation strategy is also computationally expensive if satisfactory accuracy is to be achieved. Regarding the third approach, the exact permutation distribution may not have moments, or its moments may be intractable. In most applications, it is not the existence but the derivation of the moments that limits the third approach. To the best of our knowledge, there is no systematic and efficient way to derive the moments of the permutation distribution. Recently, Zhou [3] proposed a solution by converting the permutation of the data to that of the statistic coefficients, which are symmetric under permutation. Since the test statistic coefficients usually have simple representations, it is easier to track the permutation of the test statistic coefficients than that of the data. However, this method requires deriving the permutation for each specific test statistic, which is not accessible to practical users. In this paper, we propose a novel strategy that employs a general theoretical method to derive the moments of the permutation distribution of any weighted v-statistic, for both univariate and multivariate data. We note that any moment of the permutation distribution of a weighted v-statistic [9] can be written as a summation, over a high-dimensional index set and all possible permutations, of the product of a data function term and an index function term. Our key idea is to divide the whole index set into several permutation equivalent (see Definition 2) index subsets such that the summation of the data/index function term over all permutations is invariant within each subset and can be calculated without conducting any permutation. We then obtain the moments by summing up these subtotals. The proposed method can also be extended to other test statistics by replacing them with equivalent weighted v-statistics obtained through monotonic transforms.
This is because only the ordering of the test statistic values across all permutations matters for obtaining the p-values, so a monotonic transform of a weighted v-statistic shares the same p-value with the original test statistic. Given the first four moments, the permutation distribution can be well fitted by the Pearson distribution series, and the p-values are then obtained without conducting any real permutation. For multiple comparisons of a two-group difference with sample sizes $n_1 = 21$ and $n_2 = 21$ and number of tests $m = 2{,}000$, the exact permutation test requires $m \times (n_1 + n_2)!/(n_1!\, n_2!) \approx 1.1 \times 10^{15}$ permutations. Even with 20,000 random permutations per test, we still need $m \times 20{,}000 = 4 \times 10^{7}$ permutations. Alternatively, our moments-based permutation method using the Pearson distribution approximation only involves the calculation of the first four analytically derived moments of the exact permutation distribution to achieve high accuracy (see Section 3). Instead of computing test statistics over a factorial number of permutations as in exact permutation, our moments-based permutation only requires computation of polynomial order. For example, the computational costs for the univariate mean difference test statistic and the modified multivariate Hotelling's $T^2$ test statistic [8] are $O(n)$ and $O(n^3)$, respectively, where $n = n_1 + n_2$.

2 Methodology

In this section, we mainly discuss how to calculate the moments of the permutation distribution for weighted v-statistics. For other test statistics, a possible solution is to replace them with equivalent weighted v-statistics via monotonic transforms. A detailed discussion of equivalent test statistics can be found in [7, 8, 10].

2.1 Computational challenge

Let us first look at a toy example. Suppose we have two-group univariate data $x = (x_1, \ldots, x_{n_1}, x_{n_1+1}, \ldots, x_{n_1+n_2})$, where the first $n_1$ elements are in group A and the remaining $n_2$ are in group B.
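The permutation counts quoted above are easy to check directly. A quick sanity-check sketch (using $n_1 = n_2 = 21$ and $m = 2{,}000$ as in the text; variable names are ours):

```python
from math import comb

# Number of distinct group assignments is (n1 + n2)! / (n1! n2!) = C(42, 21),
# and each of the m tests must be evaluated over all of them.
n1, n2, m = 21, 21, 2000
exact_total = m * comb(n1 + n2, n1)   # exact test: about 1.1e15 evaluations
random_total = m * 20_000             # 20,000 random permutations per test
```

With these numbers, `exact_total` is roughly $1.1 \times 10^{15}$ and `random_total` is $4 \times 10^{7}$, matching the figures in the text.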
For comparison of the two groups, the hypothesis is typically constructed as $H_0: \mu_A = \mu_B$ vs. $H_a: \mu_A \neq \mu_B$, where $\mu_A$ and $\mu_B$ are the population means of groups A and B, respectively. Define $\bar{x}_A = \sum_{i=1}^{n_1} x_i / n_1$ and $\bar{x}_B = \sum_{i=n_1+1}^{n} x_i / n_2$ as the sample means of the two groups, where $n = n_1 + n_2$. We choose the univariate group mean difference statistic as the test statistic, i.e., $T(x) = \bar{x}_A - \bar{x}_B = \sum_{i=1}^{n} w(i) x_i$, where the index function is $w(i) = 1/n_1$ if $i \in \{1, \ldots, n_1\}$ and $w(i) = -1/n_2$ if $i \in \{n_1+1, \ldots, n\}$. The total number of all possible permutations of $\{1, \ldots, n\}$ is $n!$. The fourth moment of the permutation distribution is
$$E_\pi(T^4(x)) = \frac{1}{n!} \sum_{\pi \in S_n} \Big( \sum_{i=1}^{n} w(i)\, x_{\pi(i)} \Big)^4 = \frac{1}{n!} \sum_{\pi \in S_n} \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} \sum_{i_4=1}^{n} w(i_1) w(i_2) w(i_3) w(i_4)\, x_{\pi(i_1)} x_{\pi(i_2)} x_{\pi(i_3)} x_{\pi(i_4)},$$
where $\pi$ is the permutation operator and the symmetric group $S_n$ [11] includes all distinct permutations. The above example shows that the moment calculation can be considered as a summation over all possible permutations and a large index set. The computational challenge is to go through the factorial number of permutations and the polynomial number of indices.

2.2 Partition the index set

In this paper, we assume that the test statistic $T$ can be expressed as a weighted v-statistic of degree $d$ [9], that is, $T(x) = \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} w(i_1, \ldots, i_d)\, h(x_{i_1}, \ldots, x_{i_d})$, where $x = (x_1, x_2, \ldots, x_n)^T$ is the data with $n$ observations, $w$ is a symmetric index function, and $h$ is a symmetric data function, i.e., invariant under permutation of $(i_1, \ldots, i_d)$. Though the symmetry property is not required for our method, it helps reduce the computational cost. Here each observation $x_k$ can be either univariate or multivariate. In the above toy example, $d = 1$ and $h$ is the identity function.
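As a concrete illustration of the toy example and of the index-set partition idea developed next, the sketch below computes the second permutation moment of the mean difference statistic twice: once by brute-force enumeration of $S_n$ (feasible only for tiny $n$), and once by splitting the index pairs $(i, j)$ into the two permutation equivalent subsets $\{i = j\}$ and $\{i \neq j\}$, which needs no permutation at all. This is an illustrative sketch, not the paper's implementation; all names are ours.

```python
import itertools
import math

# Mean-difference statistic T(x) = sum_i w(i) x_i with
# w(i) = 1/n1 for group A and -1/n2 for group B.
def make_weights(n1, n2):
    return [1.0 / n1] * n1 + [-1.0 / n2] * n2

# Brute force: average T^r over all n! permutations (factorial cost).
def perm_moment_brute(x, w, r):
    n = len(x)
    total = 0.0
    for p in itertools.permutations(range(n)):
        total += sum(w[i] * x[p[i]] for i in range(n)) ** r
    return total / math.factorial(n)

# Partition shortcut for r = 2, d = 1: E_pi(x_{pi(i)} x_{pi(j)}) depends
# only on whether i == j, so summing over the two permutation equivalent
# index subsets {i = j} and {i != j} gives E_pi(T^2) with no permutation.
def perm_second_moment_partition(x, w):
    n = len(x)
    sum_x, sum_x2 = sum(x), sum(v * v for v in x)
    e_eq = sum_x2 / n                              # E over subset {i == j}
    e_ne = (sum_x ** 2 - sum_x2) / (n * (n - 1))   # E over subset {i != j}
    c_eq = sum(wi * wi for wi in w)                # index term, subset {i == j}
    c_ne = sum(w) ** 2 - c_eq                      # index term, subset {i != j}
    return c_eq * e_eq + c_ne * e_ne

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = make_weights(2, 3)
m2_brute = perm_moment_brute(x, w, 2)          # enumerates 5! = 120 permutations
m2_fast = perm_second_moment_partition(x, w)   # no permutations at all
```

Both routes agree, and the first permutation moment is zero here since the weights sum to zero; the paper's contribution is doing this same partition bookkeeping for arbitrary degree $d$ and moment order $r$.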
Therefore, the r-th moment of the test statistic under permutation of the data is:
$$E_\pi(T^r(x)) = E_\pi\Big( \sum_{i_1, \ldots, i_d} w(i_1, \ldots, i_d)\, h(x_{\pi(i_1)}, \ldots, x_{\pi(i_d)}) \Big)^r = E_\pi\Big[ \sum_{i_1^{(1)}, \ldots, i_d^{(1)}, \ldots, i_1^{(r)}, \ldots, i_d^{(r)}} \Big\{ \prod_{k=1}^{r} w(i_1^{(k)}, \ldots, i_d^{(k)}) \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \Big\} \Big].$$
We can then exchange the order of the summation over indices and that over permutations:
$$E_\pi(T^r(x)) = \sum_{i_1^{(1)}, \ldots, i_d^{(1)}, \ldots, i_1^{(r)}, \ldots, i_d^{(r)}} \Big\{ \Big( \prod_{k=1}^{r} w(i_1^{(k)}, \ldots, i_d^{(k)}) \Big)\, E_\pi\Big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \Big) \Big\}.$$
Thus any moment of the permutation distribution can be considered as a summation, over a high-dimensional index set and all possible permutations, of the product of a data function term and an index function term. Since all possible permutations map any index value between 1 and $n$ to all possible index values from 1 to $n$ with equal probability, $E_\pi\big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \big)$, the summation of the data function over all permutations, depends only on the equal/unequal relationships among the indices. It is therefore natural to divide the whole index set $U = \{i_1, \ldots, i_d\}^r = \{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}$ into a union of disjoint index subsets on which $E_\pi\big( \prod_{k=1}^{r} h(x_{\pi(i_1^{(k)})}, \ldots, x_{\pi(i_d^{(k)})}) \big)$ is invariant.
Definition 1. Since $h$ is a symmetric function, two index elements $(i_1, \ldots, i_d)$ and $(j_1, \ldots, j_d)$ are said to be equivalent if they are the same up to order. For example, for $d = 3$, $(1,4,5) = (1,5,4) = (4,1,5) = (4,5,1) = (5,1,4) = (5,4,1)$.
Definition 2. Two indices $\{(i_1^{(1)}, \ldots, i_d^{(1)}), \ldots, (i_1^{(r)}, \ldots, i_d^{(r)})\}$ and $\{(j_1^{(1)}, \ldots, j_d^{(1)}), \ldots, (j_1^{(r)}, \ldots, j_d^{(r)})\}$ are said to be permutation equivalent if there exists a permutation $\pi \in S_n$ such that $\{(\pi(i_1^{(1)}), \ldots, \pi(i_d^{(1)})), \ldots, (\pi(i_1^{(r)}), \ldots, \pi(i_d^{(r)}))\} = \{(j_1^{(1)}, \ldots, j_d^{(1)}), \ldots, (j_1^{(r)}, \ldots, j_d^{(r)})\}$. Here
3 0.75582743 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control
Author: Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
Abstract: In this paper we introduce a multi-step linear Dyna-style planning algorithm. The key element of the multi-step linear Dyna is a multi-step linear model that enables multi-step projection of a sampled feature and multi-step planning based on the simulated multi-step transition experience. We propose two multi-step linear models. The first iterates the one-step linear model, but is generally computationally complex. The second interpolates between the one-step model and the infinite-step model (which turns out to be the LSTD solution), and can be learned efficiently online. Policy evaluation on Boyan Chain shows that multi-step linear Dyna learns a policy faster than single-step linear Dyna, and generally learns faster as the number of projection steps increases. Results on Mountain-car show that multi-step linear Dyna leads to much better online performance than single-step linear Dyna and model-free algorithms; however, the performance of multi-step linear Dyna does not always improve as the number of projection steps increases. Our results also suggest that previous attempts at extending LSTD for online control were unsuccessful because LSTD looks infinitely many steps into the future, and suffers from model errors in non-stationary (control) environments.
4 0.61822492 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
Author: Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
Abstract: Despite a large literature on upper bounds on the complexity of convex optimization, relatively less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes. We also discuss implications of these results for understanding the inherent complexity of large-scale learning and estimation problems.
5 0.61356384 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
Author: Peter Orbanz
Abstract: We consider the general problem of constructing nonparametric Bayesian models on infinite-dimensional random objects, such as functions, infinite graphs or infinite permutations. The problem has generated much interest in machine learning, where it is treated heuristically, but has not been studied in full generality in nonparametric Bayesian statistics, which tends to focus on models over probability distributions. Our approach applies a standard tool of stochastic process theory, the construction of stochastic processes from their finite-dimensional marginal distributions. The main contribution of the paper is a generalization of the classic Kolmogorov extension theorem to conditional probabilities. This extension allows a rigorous construction of nonparametric Bayesian models from systems of finite-dimensional, parametric Bayes equations. Using this approach, we show (i) how existence of a conjugate posterior for the nonparametric model can be guaranteed by choosing conjugate finite-dimensional models in the construction, (ii) how the mapping to the posterior parameters of the nonparametric model can be explicitly determined, and (iii) that the construction of conjugate models in essence requires the finite-dimensional models to be in the exponential family. As an application of our constructive framework, we derive a model on infinite permutations, the nonparametric Bayesian analogue of a model recently proposed for the analysis of rank data.
6 0.61123163 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
7 0.60960233 230 nips-2009-Statistical Consistency of Top-k Ranking
8 0.60947704 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
9 0.60937172 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
10 0.60811621 16 nips-2009-A Smoothed Approximate Linear Program
11 0.60738927 147 nips-2009-Matrix Completion from Noisy Entries
12 0.60709965 55 nips-2009-Compressed Least-Squares Regression
13 0.6068542 3 nips-2009-AUC optimization and the two-sample problem
14 0.6055696 260 nips-2009-Zero-shot Learning with Semantic Output Codes
15 0.60512149 97 nips-2009-Free energy score space
16 0.60468829 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
17 0.60459822 122 nips-2009-Label Selection on Graphs
18 0.60459232 94 nips-2009-Fast Learning from Non-i.i.d. Observations
19 0.60443747 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
20 0.60402817 163 nips-2009-Neurometric function analysis of population codes