nips nips2010 nips2010-73 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding
Abstract: Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. The ℓ2,1-norm based loss function is robust to outliers in data points, and the ℓ2,1-norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proved convergence. Our regression-based objective makes the feature selection process more efficient. Our method has been applied to both genomic and proteomic biomarker discovery. Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Feature selection is an important component of many machine learning applications. [sent-7, score-0.167]
2 Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. [sent-8, score-0.56]
3 In this paper, we propose a new robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. [sent-9, score-0.533]
4 The ℓ2,1-norm based loss function is robust to outliers in data points and the ℓ2,1-norm regularization selects features across all data points with joint sparsity. [sent-10, score-0.331]
5 Our regression based objective makes the feature selection process more efficient. [sent-12, score-0.293]
6 Our method has been applied to both genomic and proteomic biomarker discovery. [sent-13, score-0.229]
7 Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. [sent-14, score-0.358]
8 A large number of developments on feature selection have been made in the literature and there are many recent reviews and workshops devoted to this topic, e. [sent-17, score-0.293]
9 In the past ten years, feature selection has seen much activity, primarily due to the advances in bioinformatics, where a large amount of genomic and proteomic data is produced for biological and biomedical studies. [sent-20, score-0.617]
10 For example, in genomics, DNA microarray data measure the expression levels of thousands of genes in a single experiment. [sent-21, score-0.273]
11 Gene expression data usually contain a large number of genes, but a small number of samples. [sent-22, score-0.075]
12 A given disease or a biological function is usually associated with a few genes [19]. [sent-23, score-0.154]
13 Selecting a few relevant genes out of several thousand thus becomes a key problem in bioinformatics research [22]. [sent-24, score-0.365]
14 In proteomics, high-throughput mass spectrometry (MS) screening measures the molecular weights of individual biomolecules (such as proteins and nucleic acids) and has potential to discover putative proteomic biomarkers. [sent-25, score-0.267]
15 The identification of meaningful proteomic features from MS is crucial for disease diagnosis and protein-based biomarker profiling [22]. [sent-27, score-0.238]
16 In bioinformatics applications, many feature selection methods from these categories have been proposed and applied. [sent-29, score-0.43]
17 Widely used filter-type feature selection methods include F-statistic [4], reliefF [11, 13], mRMR [19], t-test, and information gain [21], which compute the sensitivity (correlation or relevance) of a feature with respect to (w. [sent-30, score-0.492]
18 Wrapper-type feature selection methods are tightly coupled with a specific classifier, such as correlation-based feature selection (CFS) [9] and support vector machine recursive feature elimination (SVM-RFE) [8]. [sent-34, score-0.712]
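As a concrete point of reference for the filter-type criteria listed above, the sketch below scores features with scikit-learn's univariate F-statistic and a mutual-information estimate (standing in for information gain). It is only an illustration of filter-style ranking; the synthetic data, shapes, and variable names are assumptions, not part of this paper.

```python
# Illustrative filter-type feature scoring (not the paper's protocol).
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

rng = np.random.RandomState(0)
X = rng.randn(72, 500)            # hypothetical: 72 samples, 500 gene-expression features
y = rng.randint(0, 2, size=72)    # hypothetical binary class labels

f_scores, _ = f_classif(X, y)                            # F-statistic per feature
mi_scores = mutual_info_classif(X, y, random_state=0)    # information-gain-like score

top20_by_f = np.argsort(f_scores)[::-1][:20]             # indices of the top-20 features
top20_by_mi = np.argsort(mi_scores)[::-1][:20]
```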
19 Recently, sparsity regularization in dimensionality reduction has been widely investigated and also applied to feature selection studies. [sent-36, score-0.329]
20 ℓ1-SVM was proposed to perform feature selection using the ℓ1-norm regularization, which tends to give sparse solutions [3]. [sent-37, score-0.329]
21 Because the number of selected features using ℓ1-SVM is upper bounded by the sample size, a Hybrid Huberized SVM (HHSVM) was proposed combining both the ℓ1-norm and the ℓ2-norm to form a more structured regularization [26]. [sent-38, score-0.095]
22 [1] have developed a similar model for ℓ2,1-norm regularization to couple feature selection across tasks. [sent-44, score-0.329]
23 In this paper, we propose a novel efficient and robust feature selection method that employs joint ℓ2,1-norm minimization on both the loss function and the regularization. [sent-46, score-0.532]
24 Motivated by previous research [1, 18], an ℓ2,1-norm regularization is performed to select features across all data points with joint sparsity, i. [sent-48, score-0.126]
25 each feature (gene expression or mass-to-charge value in MS) either has small scores for all data points or has large scores over all data points. [sent-50, score-0.201]
26 To solve this new robust feature selection objective, we propose an efficient algorithm to solve such a joint ℓ2,1-norm minimization problem. [sent-51, score-0.554]
27 Extensive experiments have been performed on six bioinformatics data sets and our method outperforms five other commonly used feature selection methods in statistical learning and bioinformatics. [sent-53, score-0.495]
28 The ℓ2,1-norm of the matrix M is defined as \|M\|_{2,1} = \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{m} m_{ij}^2} = \sum_{i=1}^{n} \|m^i\|_2. (1) The ℓ2,1-norm of a matrix was first introduced in [5] as the rotational invariant ℓ1-norm and has also been used in multi-task learning [1, 18] and tensor factorization [10]. [sent-60, score-0.205]
29 The Frobenius norm of the matrix M ∈ R^{n×m} is defined as \|M\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} m_{ij}^2}, (2) and the ℓ2,1-norm is rotational invariant for rows: \|MR\|_{2,1} = \|M\|_{2,1} for any rotational matrix R. The ℓ2,1-norm can be generalized to the ℓr,p-norm \|M\|_{r,p} = \Big(\sum_{i=1}^{n} \big(\sum_{j=1}^{m} |m_{ij}|^r\big)^{p/r}\Big)^{1/p}. [sent-62, score-0.245]
30 (3) Note that the ℓr,p-norm is a valid norm because it satisfies the three norm conditions, including the triangle inequality \|A\|_{r,p} + \|B\|_{r,p} ≥ \|A + B\|_{r,p}. [sent-64, score-0.176]
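The norm definitions in Eqs. (1)-(3) translate directly into a few lines of NumPy. The sketch below (rows indexing features, as in the paper) is only an illustration of the definitions; the function names are assumptions.

```python
import numpy as np

def l21_norm(M):
    """l2,1-norm (Eq. 1): sum over rows of the l2-norm of each row."""
    return np.sum(np.sqrt(np.sum(M ** 2, axis=1)))

def lrp_norm(M, r=2.0, p=1.0):
    """Generalized lr,p-norm (Eq. 3): lp-norm of the vector of row-wise lr-norms."""
    row_norms = np.sum(np.abs(M) ** r, axis=1) ** (1.0 / r)
    return np.sum(row_norms ** p) ** (1.0 / p)

M = np.random.randn(5, 3)
assert np.isclose(l21_norm(M), lrp_norm(M, r=2, p=1))
# Rotational invariance for rows: ||M R||_{2,1} == ||M||_{2,1} for any orthogonal R.
R, _ = np.linalg.qr(np.random.randn(3, 3))
assert np.isclose(l21_norm(M @ R), l21_norm(M))
```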
31 min_{W,b} \sum_{i=1}^{n} \|W^T x_i + b − y_i\|_2^2. (5) For simplicity, the bias b can be absorbed into W when the constant value 1 is added as an additional dimension for each data point x_i (1 ≤ i ≤ n). [sent-73, score-0.073]
32 (6) In this paper, we use the robust loss function min_W \sum_{i=1}^{n} \|W^T x_i − y_i\|_2, (7) where the residual \|W^T x_i − y_i\|_2 is not squared, so outliers have less influence than under the squared residual \|W^T x_i − y_i\|_2^2. [sent-75, score-0.343]
33 This loss function has a rotational invariance property, while the pure ℓ1-norm loss function does not have such a desirable property [5]. [sent-76, score-0.187]
34 The problem becomes min_W \sum_{i=1}^{n} \|W^T x_i − y_i\|_2 + γR(W). [sent-78, score-0.073]
35 R_3(W) and R_4(W) penalize all c regression coefficients corresponding to a single feature as a whole. [sent-82, score-0.126]
36 In this paper, we optimize min_W J(W) = \sum_{i=1}^{n} \|W^T x_i − y_i\|_2 + γR_4(W) = \|X^T W − Y\|_{2,1} + γ\|W\|_{2,1}. [sent-87, score-0.073]
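A direct transcription of this objective is useful mainly for monitoring the value of J(W) across iterations of the solver described next. The shapes follow the paper's notation (X is d×n, Y is n×c, W is d×c); the function name is an assumption.

```python
import numpy as np

def rfs_objective(W, X, Y, gamma):
    """J(W) = ||X^T W - Y||_{2,1} + gamma * ||W||_{2,1}."""
    l21 = lambda M: np.sum(np.sqrt(np.sum(M ** 2, axis=1)))
    return l21(X.T @ W - Y) + gamma * l21(W)
```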
37 It was generally felt that the ℓ2,1-norm minimization problem is much more difficult to solve than the ℓ1-norm minimization problem. [sent-104, score-0.165]
38 Moreover, the algorithm is derived to solve the specific problem, and cannot be applied directly to solve other general ℓ2,1-norm minimization problems. [sent-111, score-0.159]
39 Theoretical analysis guarantees that the proposed method will converge to the global optimum. [sent-113, score-0.076]
40 More importantly, this method is very easy to implement and can be readily used to solve other general ℓ2,1-norm minimization problems. [sent-114, score-0.108]
41 Taking the derivative of the Lagrangian L(U) with respect to U and setting it to zero, we have ∂L(U)/∂U = 2DU − A^T Λ = 0, (16) where D is a diagonal matrix with the i-th diagonal element d_{ii} = 1/(2\|u^i\|_2). (17) [sent-120, score-0.238]
42 Left multiplying the two sides of Eq. [sent-121, score-0.336]
43 (19) is satisfied, and prove in the next subsection that the proposed iterative algorithm will converge to the global optimum. [sent-130, score-0.107]
44 Calculate the diagonal matrix D_{t+1}, where the i-th diagonal element is 1/(2\|u^i_{t+1}\|_2). [sent-136, score-0.475]
45 until convergence. Algorithm 1: An efficient iterative algorithm to solve the optimization problem in Eq. (14). [sent-138, score-0.082]
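Read together with Eqs. (16)-(19), each pass of Algorithm 1 solves the stationarity and feasibility conditions 2DU = A^T Λ, AU = Y in closed form and then refreshes D from the new rows of U; the paper arrives at this constrained form by reformulating the regression objective above. The sketch below is one plausible NumPy rendering for min \|U\|_{2,1} s.t. AU = Y, assuming A D^{-1} A^T is invertible and using a small ε to guard zero rows (in the spirit of the regularized d_{ii} discussed below); it is not the authors' reference implementation.

```python
import numpy as np

def solve_l21_constrained(A, Y, n_iter=50, eps=1e-8):
    """Sketch of Algorithm 1:  min_U ||U||_{2,1}  s.t.  A U = Y.

    A: (m, d) constraint matrix, Y: (m, c) targets.
    Each iteration computes U = D^{-1} A^T (A D^{-1} A^T)^{-1} Y and then
    updates the diagonal of D as d_ii = 1 / (2 ||u^i||_2)."""
    m, d = A.shape
    inv_d = np.ones(d)                       # diagonal of D^{-1}; D initialized to the identity
    for _ in range(n_iter):
        AD = A * inv_d                       # A D^{-1}: scales column j of A by (D^{-1})_jj
        U = (inv_d[:, None] * A.T) @ np.linalg.solve(AD @ A.T, Y)
        row_norms = np.sqrt(np.sum(U ** 2, axis=1))
        inv_d = 2.0 * row_norms + eps        # (D^{-1})_ii = 2 ||u^i||_2; eps guards zero rows
    return U
```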
46 For any nonzero vectors u, u_t ∈ R^c, the following inequality holds: \|u\|_2 − \frac{\|u\|_2^2}{2\|u_t\|_2} ≤ \|u_t\|_2 − \frac{\|u_t\|_2^2}{2\|u_t\|_2}. [sent-144, score-0.939]
47 Beginning with the obvious inequality (\sqrt{v} − \sqrt{v_t})^2 ≥ 0, we have (\sqrt{v} − \sqrt{v_t})^2 ≥ 0 ⇒ v − 2\sqrt{v v_t} + v_t ≥ 0 ⇒ \sqrt{v} − \frac{v}{2\sqrt{v_t}} ≤ \sqrt{v_t} − \frac{v_t}{2\sqrt{v_t}}. (21) Substitute v and v_t in Eq. [sent-146, score-0.637]
48 (21) by \|u\|_2^2 and \|u_t\|_2^2 respectively, and we arrive at the inequality of Lemma 1. [sent-147, score-0.852]
49 Footnote 1: When u^i = 0, then d_{ii} = 0 is a subgradient of \|U\|_{2,1} with respect to u^i. [sent-149, score-0.435]
50 However, we cannot set d_{ii} = 0 when u^i = 0; otherwise the derived algorithm cannot be guaranteed to converge. [sent-153, score-0.099]
51 From Eq. (19), we only need to calculate D^{-1}, so we can set the i-th diagonal element of D^{-1} to 2\|u^i\|_2. [sent-156, score-0.365]
52 Second, we can regularize d_{ii} as d_{ii} = \frac{1}{2\sqrt{(u^i)^T u^i + ς}}, and the derived algorithm can be proved to minimize the regularized ℓ2,1-norm of U. [sent-157, score-0.227]
53 It is easy to see that the regularized ℓ2,1-norm of U (defined as \sum_{i=1}^{n} \sqrt{(u^i)^T u^i + ς}) approximates the ℓ2,1-norm of U when ς → 0. [sent-158, score-0.336]
54 The algorithm monotonically decreases the objective of problem (14) in each iteration, and converges to the global optimum of the problem. [sent-161, score-0.109]
55 which indicates that \sum_{i=1}^{m} \frac{\|u_{t+1}^i\|_2^2}{2\|u_t^i\|_2} ≤ \sum_{i=1}^{m} \frac{\|u_t^i\|_2^2}{2\|u_t^i\|_2}, (25) where u_t^i and u_{t+1}^i denote the i-th rows of the matrices U_t and U_{t+1}, respectively. [sent-167, score-2.016]
56 On the other hand, according to Lemma 1, for each i we have \|u_{t+1}^i\|_2 − \frac{\|u_{t+1}^i\|_2^2}{2\|u_t^i\|_2} ≤ \|u_t^i\|_2 − \frac{\|u_t^i\|_2^2}{2\|u_t^i\|_2}. (26) [sent-168, score-2.016]
57 Thus the following inequality holds: \sum_{i=1}^{m} \Big(\|u_{t+1}^i\|_2 − \frac{\|u_{t+1}^i\|_2^2}{2\|u_t^i\|_2}\Big) ≤ \sum_{i=1}^{m} \Big(\|u_t^i\|_2 − \frac{\|u_t^i\|_2^2}{2\|u_t^i\|_2}\Big). (27) [sent-169, score-2.063]
58 Combining Eq. (25) and Eq. (27), we arrive at \sum_{i=1}^{m} \|u_{t+1}^i\|_2 ≤ \sum_{i=1}^{m} \|u_t^i\|_2. [sent-172, score-0.711]
59 Therefore, Algorithm 1 will converge to the global optimum of problem (14). [sent-182, score-0.109]
60 First, D is a diagonal matrix, and thus D^{-1} is also diagonal with the i-th diagonal element d_{ii}^{-1} = 2\|u^i\|_2. [sent-185, score-0.53]
61 It is worth pointing out that the proposed method can be easily extended to solve other ℓ2,1-norm minimization problems. [sent-190, score-0.108]
62 For example, consider a general ℓ2,1-norm minimization problem: min_U f(U) + \sum_k \|A_k U + B_k\|_{2,1} s.t. U ∈ C. (31) [sent-191, score-0.093]
63 The problem can be solved by iteratively solving the following problem: min_{U ∈ C} f(U) + \sum_k Tr\big((A_k U + B_k)^T D_k (A_k U + B_k)\big), (32) [sent-193, score-0.087]
64 where D_k is a diagonal matrix with the i-th diagonal element 1/(2\|(A_k U + B_k)^i\|_2). [sent-195, score-0.139]
65 If f(U) is a convex function and C is a convex set, then the iterative method will converge to the global minimum. [sent-200, score-0.107]
66 5 Experimental Results In order to validate the performance of our feature selection method, we applied it to two bioinformatics applications: gene expression and mass spectrometry classification. [sent-204, score-0.724]
67 The ALLAML data set contains 72 samples in total in two classes, ALL and AML, which contain 47 and 25 samples, respectively. [sent-209, score-0.084]
68 The GLIOMA data set contains 50 samples in total in four classes: cancer glioblastomas (CG), non-cancer glioblastomas (NG), cancer oligodendrogliomas (CO), and non-cancer oligodendrogliomas (NO), which have 14, 14, 7, and 15 samples, respectively. [sent-211, score-0.728]
69 Genes with minimal variations across the samples were removed. [sent-213, score-0.084]
70 Genes whose expression levels varied < 100 units between samples, or varied < 3 fold between any two samples, were excluded. [sent-215, score-0.075]
71 After preprocessing, we obtained a data set with 50 samples and 4433 genes. [sent-216, score-0.084]
72 The LUNG data set contains 203 samples in total in five classes, which have 139, 21, 20, 6, and 17 samples, respectively. [sent-217, score-0.084]
73 The genes with standard deviations smaller than 50 expression units were removed and we obtained a data set with 203 samples and 3312 genes. [sent-219, score-0.273]
74 The Carcinomas data set is composed of 174 samples in total in eleven classes: prostate, bladder/ureter, breast, colorectal, gastroesophagus, kidney, liver, ovary, pancreas, lung adenocarcinomas, and lung squamous cell carcinoma, which have 26, 8, 26, 23, 12, 11, 7, 27, 6, 14, and 14 samples, respectively. [sent-220, score-0.506]
75 In the preprocessed data set [27], there are 174 samples and 9182 genes. [sent-222, score-0.084]
76 Average accuracy of top 20 features (%), with columns RF, F-s, T-test, IG, and mRMR for each data set. [sent-226, score-0.107]
77 Average accuracy of top 80 features (%), with columns RF, F-s, T-test, IG, and mRMR for each data set. [sent-263, score-0.107]
78 The Prostate-GE data set has 102 samples in total in two classes, tumor and normal, which have 52 and 50 samples, respectively. [sent-297, score-0.084]
79 Then we filtered out the genes with max/min ≤ 5 or (max-min) ≤ 50. [sent-300, score-0.114]
80 After preprocessing, we obtained a data set with 102 samples and 5966 genes. [sent-301, score-0.084]
81 This MS data set consists of 190 samples diagnosed as benign prostate hyperplasia, 63 samples considered as no evidence of disease, and 69 samples diagnosed as prostate cancer. [sent-303, score-0.862]
82 The samples diagnosed as benign prostate hyperplasia as well as samples having no evidence of prostate cancer were pooled into one set making 253 control samples, whereas the other 69 samples are the cancer samples. [sent-304, score-1.268]
83 We compare our feature selection method (called RFS) to several popular feature selection methods in bioinformatics, such as F-statistic [4], reliefF [11, 13], mRMR [19], t-test, and information gain [21]. [sent-309, score-0.659]
84 1 shows the classification accuracy comparisons of all five feature selection methods on six data sets. [sent-312, score-0.406]
85 We compute the average accuracy using the top 20 and top 80 features for all feature selection approaches. [sent-314, score-0.4]
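Given the learned W, RFS-style feature ranking reduces to sorting features by the ℓ2-norm of the corresponding rows of W, consistent with the joint-sparsity interpretation above (a feature is either small across all data points or large across all of them). The snippet below is a sketch under that convention; the helper name and the downstream usage are assumptions.

```python
import numpy as np

def top_k_features(W, k=20):
    """Rank features by the l2-norm of their rows of W and return the top-k indices."""
    scores = np.sqrt(np.sum(W ** 2, axis=1))     # one score per feature (row of W)
    return np.argsort(scores)[::-1][:k]

# Example usage: keep the top-20 rows of a d x n data matrix X before classification.
# idx = top_k_features(W, k=20)
# X_top20 = X[idx, :]
```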
86 6 Conclusions In this paper, we proposed a new efficient and robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. [sent-317, score-0.533]
87 The ℓ2,1-norm based regression loss function is robust to outliers in data points and also efficient in calculation. [sent-318, score-0.16]
88 Motivated by previous work, the ℓ2,1-norm regularization is used to select features across all data points with joint sparsity. [sent-319, score-0.126]
89 Our method has been applied to both genomic and proteomic biomarker discovery. [sent-321, score-0.229]
90 Extensive empirical studies have been performed on two bioinformatics tasks (six data sets in total) to demonstrate the performance of our method. [sent-322, score-0.202]
91 Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. [sent-335, score-0.47]
92 Feature selection via concave minimization and support vector machines. [sent-340, score-0.224]
93 Minimum redundancy feature selection from microarray gene expression data. [sent-345, score-0.573]
94 R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization. [sent-352, score-0.105]
95 Gene selection for cancer classification using support vector machines. [sent-372, score-0.383]
96 Feature selection for machine learning: Comparing a correlation-based filter approach to the wrapper. [sent-379, score-0.167]
97 Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. [sent-432, score-0.082]
98 Feature selection based on mutual information: Criteria of max-depe ndency, max-relevance, and min-redundancy. [sent-445, score-0.167]
99 Molecular classification of human carcinomas by use of gene expression signatures. [sent-479, score-0.38]
100 Model selection and estimation in regression with grouped variables. [sent-504, score-0.167]
wordName wordTfidf (topN-words)
[('ui', 0.336), ('mrmr', 0.263), ('relieff', 0.237), ('rfs', 0.233), ('ut', 0.223), ('cancer', 0.216), ('lung', 0.211), ('prostate', 0.208), ('carcinomas', 0.184), ('selection', 0.167), ('fscorerank', 0.158), ('proteomic', 0.139), ('bioinformatics', 0.137), ('glioma', 0.132), ('feature', 0.126), ('gene', 0.121), ('vt', 0.118), ('genes', 0.114), ('allaml', 0.105), ('dii', 0.099), ('ad', 0.099), ('au', 0.086), ('microarray', 0.084), ('samples', 0.084), ('rotational', 0.083), ('classification', 0.082), ('arlington', 0.08), ('diagnosed', 0.079), ('ding', 0.075), ('expression', 0.075), ('gain', 0.073), ('robust', 0.071), ('spectrometry', 0.069), ('wt', 0.067), ('ms', 0.066), ('six', 0.065), ('texas', 0.061), ('features', 0.059), ('classi', 0.058), ('minimization', 0.057), ('ig', 0.057), ('diagonal', 0.055), ('outliers', 0.054), ('glioblastomas', 0.053), ('heng', 0.053), ('hhsvm', 0.053), ('huberized', 0.053), ('hyperplasia', 0.053), ('malignant', 0.053), ('oligodendrogliomas', 0.053), ('proteomics', 0.053), ('dt', 0.052), ('solve', 0.051), ('bk', 0.05), ('accuracy', 0.048), ('rf', 0.048), ('genomic', 0.048), ('ak', 0.048), ('rc', 0.047), ('inequality', 0.047), ('emphasizing', 0.046), ('norm', 0.045), ('mi', 0.043), ('vi', 0.043), ('biomarkers', 0.042), ('rn', 0.042), ('disease', 0.04), ('converge', 0.04), ('socp', 0.04), ('boldface', 0.04), ('ling', 0.04), ('triangle', 0.039), ('arrive', 0.039), ('xt', 0.038), ('guyon', 0.038), ('yi', 0.037), ('regularization', 0.036), ('benign', 0.036), ('min', 0.036), ('global', 0.036), ('bi', 0.035), ('loss', 0.035), ('invariant', 0.034), ('mij', 0.033), ('svm', 0.033), ('optimum', 0.033), ('monotonically', 0.033), ('iterative', 0.031), ('argyriou', 0.031), ('substitute', 0.031), ('dna', 0.031), ('joint', 0.031), ('extensive', 0.031), ('molecular', 0.03), ('ai', 0.03), ('sdp', 0.03), ('mass', 0.029), ('element', 0.029), ('proved', 0.029), ('correlates', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
Author: Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding
Abstract: Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. The ℓ2,1-norm based loss function is robust to outliers in data points, and the ℓ2,1-norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proved convergence. Our regression-based objective makes the feature selection process more efficient. Our method has been applied to both genomic and proteomic biomarker discovery. Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. 1
2 0.24572971 217 nips-2010-Probabilistic Multi-Task Feature Selection
Author: Yu Zhang, Dit-Yan Yeung, Qian Xu
Abstract: Recently, some variants of the ℓ1 norm, particularly matrix norms such as the ℓ1,2 and ℓ1,∞ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the ℓ1,2 and ℓ1,∞ norms by considering a family of ℓ1,q norms for 1 < q ≤ ∞ and study the problem of determining the most appropriate sparsity enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the ℓ1,q norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms to learn all model parameters, including q, automatically. Experiments have been conducted on two cancer classification applications using microarray gene expression data. 1
3 0.12280247 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
Author: Jun Liu, Jieping Ye
Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
4 0.11601491 91 nips-2010-Fast detection of multiple change-points shared by many signals using group LARS
Author: Jean-philippe Vert, Kevin Bleakley
Abstract: We present a fast algorithm for the detection of multiple change-points when each is frequently shared by members of a set of co-occurring one-dimensional signals. We give conditions on consistency of the method when the number of signals increases, and provide empirical evidence to support the consistency results. 1
5 0.096655697 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
Author: Ziv Bar-joseph, Hai-son P. Le
Abstract: Recent studies compare gene expression data across species to identify core and species specific genes in biological systems. To perform such comparisons researchers need to match genes across species. This is a challenging task since the correct matches (orthologs) are not known for most genes. Previous work in this area used deterministic matchings or reduced multidimensional expression data to binary representation. Here we develop a new method that can utilize soft matches (given as priors) to infer both, unique and similar expression patterns across species and a matching for the genes in both species. Our method uses a Dirichlet process mixture model which includes a latent data matching variable. We present learning and inference algorithms based on variational methods for this model. Applying our method to immune response data we show that it can accurately identify common and unique response patterns by improving the matchings between human and mouse genes. 1
6 0.089147262 265 nips-2010-The LASSO risk: asymptotic results and real world examples
7 0.082516387 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
8 0.081371762 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
9 0.076485313 247 nips-2010-Sparse Instrumental Variables (SPIV) for Genome-Wide Studies
10 0.07572262 102 nips-2010-Generalized roof duality and bisubmodular functions
11 0.070562862 235 nips-2010-Self-Paced Learning for Latent Variable Models
12 0.06761191 94 nips-2010-Feature Set Embedding for Incomplete Data
13 0.06698031 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
14 0.064060725 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
15 0.062848397 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
16 0.06211577 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
17 0.061579984 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
18 0.057975572 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
19 0.057554469 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks
20 0.056594454 231 nips-2010-Robust PCA via Outlier Pursuit
topicId topicWeight
[(0, 0.206), (1, 0.022), (2, 0.053), (3, 0.021), (4, 0.05), (5, -0.073), (6, -0.031), (7, 0.059), (8, -0.076), (9, -0.03), (10, 0.013), (11, 0.095), (12, 0.035), (13, 0.074), (14, 0.073), (15, 0.022), (16, -0.037), (17, 0.044), (18, 0.047), (19, -0.041), (20, -0.069), (21, 0.039), (22, 0.023), (23, 0.075), (24, -0.034), (25, 0.079), (26, 0.076), (27, -0.034), (28, -0.002), (29, 0.073), (30, -0.095), (31, 0.024), (32, -0.037), (33, -0.118), (34, 0.148), (35, -0.084), (36, 0.006), (37, -0.001), (38, 0.068), (39, 0.083), (40, -0.066), (41, -0.024), (42, 0.081), (43, -0.2), (44, -0.162), (45, -0.047), (46, -0.03), (47, -0.178), (48, -0.109), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.92479146 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
Author: Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding
Abstract: Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. The ℓ2,1-norm based loss function is robust to outliers in data points, and the ℓ2,1-norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proved convergence. Our regression-based objective makes the feature selection process more efficient. Our method has been applied to both genomic and proteomic biomarker discovery. Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. 1
2 0.77047157 217 nips-2010-Probabilistic Multi-Task Feature Selection
Author: Yu Zhang, Dit-Yan Yeung, Qian Xu
Abstract: Recently, some variants of the ℓ1 norm, particularly matrix norms such as the ℓ1,2 and ℓ1,∞ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the ℓ1,2 and ℓ1,∞ norms by considering a family of ℓ1,q norms for 1 < q ≤ ∞ and study the problem of determining the most appropriate sparsity enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the ℓ1,q norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms to learn all model parameters, including q, automatically. Experiments have been conducted on two cancer classification applications using microarray gene expression data. 1
3 0.56305861 91 nips-2010-Fast detection of multiple change-points shared by many signals using group LARS
Author: Jean-philippe Vert, Kevin Bleakley
Abstract: We present a fast algorithm for the detection of multiple change-points when each is frequently shared by members of a set of co-occurring one-dimensional signals. We give conditions on consistency of the method when the number of signals increases, and provide empirical evidence to support the consistency results. 1
4 0.49188694 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
Author: Yi Zhang, Jeff G. Schneider
Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.
5 0.45320028 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression
Author: Ling Huang, Jinzhu Jia, Bin Yu, Byung-gon Chun, Petros Maniatis, Mayur Naik
Abstract: Predicting the execution time of computer programs is an important but challenging problem in the community of computer systems. Existing methods require experts to perform detailed analysis of program code in order to construct predictors or select important features. We recently developed a new system to automatically extract a large number of features from program execution on sample inputs, on which prediction models can be constructed without expert knowledge. In this paper we study the construction of predictive models for this problem. We propose the SPORE (Sparse POlynomial REgression) methodology to build accurate prediction models of program performance using feature data collected from program execution on sample inputs. Our two SPORE algorithms are able to build relationships between responses (e.g., the execution time of a computer program) and features, and select a few from hundreds of the retrieved features to construct an explicitly sparse and non-linear model to predict the response variable. The compact and explicitly polynomial form of the estimated model could reveal important insights into the computer program (e.g., features and their non-linear combinations that dominate the execution time), enabling a better understanding of the program’s behavior. Our evaluation on three widely used computer programs shows that SPORE methods can give accurate prediction with relative error less than 7% by using a moderate number of training data samples. In addition, we compare SPORE algorithms to state-of-the-art sparse regression algorithms, and show that SPORE methods, motivated by real applications, outperform the other methods in terms of both interpretability and prediction accuracy.
7 0.43934122 45 nips-2010-CUR from a Sparse Optimization Viewpoint
8 0.42465109 139 nips-2010-Latent Variable Models for Predicting File Dependencies in Large-Scale Software Development
9 0.41901949 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
10 0.41473496 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
11 0.41343158 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
12 0.41288549 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
13 0.40643534 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks
14 0.40576234 158 nips-2010-Learning via Gaussian Herding
15 0.40520135 5 nips-2010-A Dirty Model for Multi-task Learning
16 0.40485731 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection
17 0.39913061 94 nips-2010-Feature Set Embedding for Incomplete Data
18 0.39790538 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
19 0.3952046 172 nips-2010-Multi-Stage Dantzig Selector
20 0.39374366 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
topicId topicWeight
[(13, 0.038), (27, 0.051), (30, 0.04), (35, 0.443), (45, 0.183), (50, 0.042), (52, 0.026), (60, 0.033), (71, 0.021), (77, 0.026), (90, 0.027)]
simIndex simValue paperId paperTitle
1 0.84919739 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
Author: Jun Liu, Jieping Ye
Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
2 0.7678178 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas
Author: Georg Langs, Yanmei Tie, Laura Rigolo, Alexandra Golby, Polina Golland
Abstract: Matching functional brain regions across individuals is a challenging task, largely due to the variability in their location and extent. It is particularly difficult, but highly relevant, for patients with pathologies such as brain tumors, which can cause substantial reorganization of functional systems. In such cases spatial registration based on anatomical data is only of limited value if the goal is to establish correspondences of functional areas among different individuals, or to localize potentially displaced active regions. Rather than rely on spatial alignment, we propose to perform registration in an alternative space whose geometry is governed by the functional interaction patterns in the brain. We first embed each brain into a functional map that reflects connectivity patterns during a fMRI experiment. The resulting functional maps are then registered, and the obtained correspondences are propagated back to the two brains. In application to a language fMRI experiment, our preliminary results suggest that the proposed method yields improved functional correspondences across subjects. This advantage is pronounced for subjects with tumors that affect the language areas and thus cause spatial reorganization of the functional regions. 1
3 0.76074469 59 nips-2010-Deep Coding Network
Author: Yuanqing Lin, Zhang Tong, Shenghuo Zhu, Kai Yu
Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.
same-paper 4 0.74623531 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
Author: Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding
Abstract: Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. The ℓ2,1-norm based loss function is robust to outliers in data points, and the ℓ2,1-norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proved convergence. Our regression-based objective makes the feature selection process more efficient. Our method has been applied to both genomic and proteomic biomarker discovery. Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. 1
5 0.64988643 149 nips-2010-Learning To Count Objects in Images
Author: Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically-attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amount of visual data. 1
6 0.6050992 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
7 0.58003366 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
8 0.56554967 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
9 0.54944938 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
10 0.53785384 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
11 0.53319192 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
12 0.53213435 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles
13 0.53090876 181 nips-2010-Network Flow Algorithms for Structured Sparsity
14 0.53079689 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
15 0.53029084 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
16 0.52496016 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
17 0.52361417 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
18 0.52099711 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
19 0.51629114 5 nips-2010-A Dirty Model for Multi-task Learning
20 0.51600182 217 nips-2010-Probabilistic Multi-Task Feature Selection