nips nips2010 nips2010-41 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vikas Sindhwani, Aurelie C. Lozano
Abstract: We consider multivariate regression problems involving high-dimensional predictor and response spaces. To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique. This extension accounts for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between the multiple outputs. Within this framework, we then formulate the problem of inferring causal relationships over a collection of high-dimensional time series variables. When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to the classic PageRank algorithm traditionally applied to hyperlink graphs. Theoretical guarantees, extensive simulations and empirical studies confirm the generality and value of our framework.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We consider multivariate regression problems involving high-dimensional predictor and response spaces. [sent-6, score-0.198]
2 To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique. [sent-7, score-0.127]
3 This extension accounts for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between the multiple outputs. [sent-8, score-0.166]
4 Within this framework, we then formulate the problem of inferring causal relationships over a collection of high-dimensional time series variables. [sent-9, score-0.296]
5 When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to the classic PageRank algorithm traditionally applied to hyperlink graphs. [sent-10, score-0.158]
6 1 Introduction: The broad goal of supervised learning is to effectively learn unknown functional dependencies between a set of input variables and a set of output variables, given a finite collection of training examples. [sent-12, score-0.168]
7 The first topic is Multivariate Regression [4, 2, 24], which generalizes basic single-output regression to settings involving multiple output variables with potentially significant correlations between them. [sent-14, score-0.248]
8 Applications of multivariate regression models include chemometrics, econometrics and computational biology. [sent-15, score-0.198]
9 These techniques are output-centric in the sense that they attempt to exploit dependencies between output variables to learn joint models that generalize better than those that treat outputs independently. [sent-17, score-0.137]
10 The second topic is that of sparsity [3], variable selection and the broader notion of regularization [20]. [sent-18, score-0.23]
11 e.g., via dimensionality reduction, input variable selection, or regularized risk minimization. [sent-23, score-0.158]
12 e.g., Lasso [19], and matching pursuit techniques [13], which come with theoretical guarantees on the recovery of the exact support under certain conditions. [sent-27, score-0.161]
13 Particularly pertinent to this paper is the notion of group sparsity. [sent-28, score-0.152]
14 For instance, Group Lasso [23] extends Lasso, and Group-OMP [12, 9] extends matching pursuit techniques to this setting. [sent-30, score-0.161]
15 We address the problem of enforcing sparsity via variable selection in multivariate linear models, where regularization becomes crucial since the number of parameters grows not only with the data dimensionality but also with the number of outputs. [sent-32, score-0.286]
16 Our approach is guided by the following desiderata: (a) performing variable selection for each output in isolation may be highly suboptimal since the input variables which are relevant to (a subset of) the outputs may only exhibit weak correlation with each individual output. [sent-33, score-0.295]
17 It is also desirable to leverage information on the relatedness between outputs, so as to guide the decision on the relevance of a certain input variable to a certain output, using additional evidence based on the relevance to related outputs. [sent-34, score-0.119]
18 (b) It is desirable to take into account any grouping structure that may exist between input and output variables. [sent-35, score-0.156]
19 To efficiently satisfy the above desiderata, we propose Multivariate Group Orthogonal Matching Pursuit (MGOMP) for enforcing arbitrary block sparsity patterns in multivariate regression coefficients. [sent-37, score-0.296]
20 These patterns are specified by groups defined over both input and output variables. [sent-38, score-0.212]
21 In particular, MGOMP can handle cases where the set of relevant features may differ from one response (group) to another, and is thus more general than simultaneous variable selection procedures (e.g., the S-OMP of [21]), as simultaneity of the selection in MGOMP is enforced within groups of related output variables rather than across the entire set of outputs. [sent-39, score-0.127] [sent-41, score-0.306]
23 MGOMP also generalizes the GroupOMP algorithm of [12] to the multivariate regression case. [sent-42, score-0.198]
24 We provide theoretical guarantees on the quality of the model in terms of correctness of group variable selection and regression coefficient estimation. [sent-43, score-0.294]
25 We then focus on applying MGOMP to high-dimensional multivariate time series analysis problems. [sent-45, score-0.156]
26 Specifically, we propose a novel application of multivariate regression methods with variable selection, namely that of inferring key influencers in online social communities, a problem of increasing importance with the rise of planetary-scale web 2.0 platforms such as Facebook, Twitter, and innumerable discussion forums and blog sites. [sent-46, score-0.343] [sent-47, score-0.121]
28 We rigorously map this problem to that of inferring causal influence relationships. [sent-48, score-0.259]
29 Using special cases of MGOMP, we extend the classical notion of Granger Causality [7], which provides an operational notion of causality in time series analysis, to apply to a collection of multivariate time series variables representing the evolving textual content of a community of bloggers. [sent-49, score-0.481]
30 The sparsity structure of the resulting model induces a weighted causal graph that encodes influence relationships. [sent-50, score-0.323]
31 While we use blog communities to concretize the application of our models, our ideas apply more generally to a wider class of spatio-temporal causal modeling problems. [sent-51, score-0.375]
32 Among those, variable selection methods are most popular as they lead to parsimonious and interpretable models, which is desirable in many applications. [sent-62, score-0.127]
33 Clearly, however, variable selection in multiple-output regression is particularly challenging in the presence of high-dimensional feature vectors as well as a possibly large number of responses. [sent-63, score-0.301]
34 In many applications, including the high-dimensional time series analysis and causal modeling settings showcased later in this paper, it is possible to provide domain-specific guidance for variable selection by imposing a sparsity structure on A. [sent-64, score-0.431]
35 Let I = {I_1, …, I_L} denote the set formed by L (possibly overlapping) groups of input variables, where I_k ⊂ {1, …, p} and p is the number of input variables. [sent-68, score-0.224]
36 Let O = {O_1, …, O_M} denote the set formed by M (possibly overlapping) groups of output variables, where O_k ⊂ {1, …, K}. [sent-78, score-0.288]
37 Note that if certain variables do not belong to any group, they may be considered to be groups of size 1. [sent-85, score-0.128]
38 These group definitions specify a block sparsity/support pattern on A. [sent-86, score-0.146]
39 Multivariate Group Orthogonal Matching Pursuit: The MGOMP procedure performs greedy pursuit with respect to the loss function L_C(A) = tr[(Y − XA)^T (Y − XA) C]  (2), where C is an estimate of the precision matrix Σ^{−1}, given as input. [sent-90, score-0.228]
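To make the weighted objective concrete, here is a minimal NumPy sketch of the loss in (2); the function name mgomp_loss and the toy dimensions are ours, not from the paper.

```python
import numpy as np

def mgomp_loss(Y, X, A, C):
    """Correlation-weighted residual loss L_C(A) = tr[(Y - XA)^T (Y - XA) C]."""
    R = Y - X @ A                      # n x K residual matrix
    return np.trace(R.T @ R @ C)

# Toy example: n=50 samples, p=10 inputs, K=3 outputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
A_true = rng.standard_normal((10, 3))
Y = X @ A_true + 0.1 * rng.standard_normal((50, 3))
print(mgomp_loss(Y, X, np.zeros((10, 3)), np.eye(3)))  # C = I recovers the unweighted loss
```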
40 Existing variable selection methods often ignore this information and deal instead with (regularized versions of) the simplified objective tr[(Y − XA)^T (Y − XA)], thereby implicitly assuming that Σ = I. [sent-93, score-0.164]
41 For O ⊂ {1, …, K}, denote by C_O the restriction of the K × K precision matrix C to the columns corresponding to the output variables in O, and by C_{O,O} the similar restriction to both columns and rows. [sent-98, score-0.212]
42 Furthermore, to simplify the exposition, we assume in the remainder of the paper that for each group of input variables I_s ∈ I, X_{I_s} is orthonormalized, i.e., X_{I_s}^T X_{I_s} = I. [sent-103, score-0.161]
43 Denote by A^{(m)} the estimate of the regression coefficient matrix at iteration m, and by R^{(m)} the corresponding matrix of residuals, i.e., R^{(m)} = Y − X A^{(m)}. [sent-106, score-0.157]
44 We note that this setting and MGOMP could easily be generalized to the case where there is a different grouping structure for each output group, namely, for each O_k we could consider a different set I_{O_k} of input variable groups. [sent-113, score-0.233]
45 Using standard linear algebra, the block variable selection criterion simplifies to (r^{(m)}, s^{(m)}) = arg max_{r,s} tr[(X_{I_r}^T R^{(m−1)} C_{O_s})^T (X_{I_r}^T R^{(m−1)} C_{O_s}) (C_{O_s,O_s})^{−1}]  (3). [sent-115, score-0.222]
46 From the above equation, it is clear that the relatedness between output variables is taken into account in the block selection process. [sent-116, score-0.322]
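A direct (unoptimized) rendering of the selection rule (3) could look as follows; here input_groups and output_groups are lists of column-index lists, and the helper name select_block is our own.

```python
import numpy as np

def select_block(R_prev, X, input_groups, output_groups, C):
    """Return the block (r, s) maximizing the criterion in Eq. (3)."""
    best, best_score = None, -np.inf
    for r, I_r in enumerate(input_groups):
        for s, O_s in enumerate(output_groups):
            G = X[:, I_r].T @ R_prev @ C[:, O_s]          # X_Ir^T R^(m-1) C_Os
            score = np.trace(G.T @ G @ np.linalg.inv(C[np.ix_(O_s, O_s)]))
            if score > best_score:
                best, best_score = (r, s), score
    return best, best_score
```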
47 However, a closed-form solution for (4) can be derived by recalling the following matrix identities [8]: tr(M_1^T M_2 M_3 M_4^T) = vec(M_1)^T (M_4 ⊗ M_2) vec(M_3)  (5), and vec(M_1 M_2) = (I ⊗ M_1) vec(M_2)  (6), where vec denotes matrix vectorization, ⊗ the Kronecker product, and I the identity matrix. [sent-123, score-0.233]
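Both identities are standard and can be checked numerically; note that vec must stack columns (column-major order) for them to hold.

```python
import numpy as np

rng = np.random.default_rng(1)
vec = lambda M: M.flatten(order="F")          # column-stacking vectorization
M1 = rng.standard_normal((3, 4))              # a x b
M2 = rng.standard_normal((3, 5))              # a x c
M3 = rng.standard_normal((5, 2))              # c x d
M4 = rng.standard_normal((4, 2))              # b x d

# Identity (5): tr(M1^T M2 M3 M4^T) = vec(M1)^T (M4 kron M2) vec(M3)
assert np.isclose(np.trace(M1.T @ M2 @ M3 @ M4.T),
                  vec(M1) @ np.kron(M4, M2) @ vec(M3))

# Identity (6): vec(M1 M5) = (I kron M1) vec(M5)
M5 = rng.standard_normal((4, 5))
assert np.allclose(vec(M1 @ M5), np.kron(np.eye(5), M1) @ vec(M5))
```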
48 For a set of selected blocks, say M, denote by O(M) the union of the output groups in M. [sent-125, score-0.181]
49 For each output group O_s in M, let Ĩ(O_s) = ∪_{(I_r, O_s) ∈ M} I_r. [sent-127, score-0.128]
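Putting the pieces together, the following is one plausible shape of the overall greedy loop. It is a deliberately simplified sketch, not the paper's exact procedure: it assumes non-overlapping groups, reuses select_block from the sketch above, and refits each output group by plain OLS on the union Ĩ(O_s) of its selected input groups instead of minimizing the C-weighted loss.

```python
import numpy as np

def mgomp(Y, X, input_groups, output_groups, C, n_iter):
    """Simplified MGOMP-style greedy block pursuit (illustrative sketch only)."""
    p, K = X.shape[1], Y.shape[1]
    A = np.zeros((p, K))
    selected = set()                              # chosen (input, output) block pairs
    for _ in range(n_iter):
        R = Y - X @ A                             # residual matrix R^(m-1)
        (r, s), _ = select_block(R, X, input_groups, output_groups, C)
        selected.add((r, s))
        for s2, O_s in enumerate(output_groups):  # refit on union of selected groups
            cols = sorted({j for (r2, s3) in selected if s3 == s2
                           for j in input_groups[r2]})
            if cols:
                A[np.ix_(cols, O_s)] = np.linalg.lstsq(
                    X[:, cols], Y[:, O_s], rcond=None)[0]
    return A, selected
```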
50 Theoretical Performance Guarantees for MGOMP: In this section we show that under certain conditions MGOMP can identify the correct blocks of variables, and we provide an upper bound on the maximum absolute difference between the estimated and true regression coefficients. [sent-134, score-0.148]
51 We assume that the estimate of the error precision matrix, C, is in agreement with the specification of the output groups, namely that Ci,j = 0 if i and j belong to different output groups. [sent-135, score-0.259]
52 For each output variable group O_k, denote by G_good(k) the set formed by the input groups included in the true model for the regressions in O_k, and let G_bad(k) be the set formed by all the input groups that are not included. [sent-136, score-0.534]
53 Similarly, denote by M_good the set formed by the pairs of input and output variable groups included in the true model, and by M_bad the set formed by all the pairs that are not included. [sent-137, score-0.386]
54 Define ρ_X(M_good) = min_{k ∈ {1,…,M}} inf_α { ‖Xα‖_2^2 / ‖α‖_2^2 : supp(α) ⊆ G_good(k) }; namely, ρ_X(M_good) is the minimum over the output groups O_k of the smallest eigenvalue of X_{G_good(k)}^T X_{G_good(k)}. [sent-142, score-0.214]
55 For each output group O_k, one further defines, for any u = {u_1, …}, the quantity µ_X(M_good) used in the consistency conditions below. [sent-143, score-0.158]
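Given that eigenvalue characterization, ρ_X(M_good) is straightforward to compute; in this sketch good_groups holds, for each output group k, the column indices in G_good(k) (our notation):

```python
import numpy as np

def rho_X(X, good_groups):
    """min over output groups k of the smallest eigenvalue of
    X_{Ggood(k)}^T X_{Ggood(k)}."""
    return min(np.linalg.eigvalsh(X[:, cols].T @ X[:, cols]).min()
               for cols in good_groups)
```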
56 The multivariate regression model Y = XA + E can be rewritten in an equivalent univariate form with white noise: Ȳ = (I_K ⊗ X) ᾱ + η̄, where ᾱ = vec(A), Ȳ = (diag(C_{k,k}^{−1/2})_{k=1}^{K} ⊗ I_n) vec(Y C^{1/2}), and η̄ is formed by i.i.d. entries. [sent-163, score-0.359]
57 We can see that applying the MGOMP procedure is equivalent to applying the Group-OMP procedure [12] to the above vectorized regression model, using as grouping structure that naturally induced by the input-output groups originally considered for MGOMP. [sent-166, score-0.195]
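The heart of this rewriting is the standard vectorization of a multivariate linear model; the following quick check verifies the unwhitened version (the C-dependent whitening above is omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 4))
A = rng.standard_normal((4, 3))                 # K = 3 outputs
vec = lambda M: M.flatten(order="F")

# Y = XA  <=>  vec(Y) = (I_K kron X) vec(A)
assert np.allclose(vec(X @ A), np.kron(np.eye(3), X) @ vec(A))
```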
58 The theorem then follows from Theorem 3 in [12], by translating the univariate consistency conditions into their multivariate counterparts via µ_X(M_good) and ρ_X(M_good). [sent-167, score-0.215]
59 Since C is such that C_{i,j} = 0 for any i, j belonging to distinct groups, the entries of Ȳ do not mix components of Y from different output groups, and hence the error covariance matrix does not appear in the consistency conditions. [sent-168, score-0.22]
60 3 Simulation Results: We empirically evaluate the performance of our method against representative variable selection methods, in terms of accuracy of prediction and variable (group) selection. [sent-175, score-0.171]
61 As a measure of variable selection accuracy we use the F1 measure, defined as F1 = 2PR/(P + R), where P denotes the precision and R the recall. [sent-176, score-0.163]
62 To compute the variable group F1 of a variable selection method, we consider a group to be selected if any of the variables in the group is selected. [sent-177, score-0.477]
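For concreteness, a minimal implementation of this group-level F1 (the set representation of selected and true groups is our choice):

```python
def group_f1(selected, true):
    """F1 = 2PR/(P+R) over sets of selected vs. truly relevant group indices."""
    selected, true = set(selected), set(true)
    tp = len(selected & true)
    if tp == 0:
        return 0.0
    P, R = tp / len(selected), tp / len(true)   # precision, recall
    return 2 * P * R / (P + R)

print(group_f1({0, 1, 3}, {0, 1, 2}))           # -> 0.666...
```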
63 For univariate methods, we consider individual selection of the iteration number for each univariate regression (joint selection of a common iteration number across the univariate regressions led to worse results in the setting considered). [sent-181, score-0.593]
64 Hence we consider 60 variables grouped into 20 groups corresponding to the 3rd-degree polynomial expansion. [sent-213, score-0.128]
65 A dictionary of various matching pursuit methods and their corresponding parameters is provided in Table 1. [sent-307, score-0.161]
66 In the table, note that MGOMP(Parallel) consists of running MGOMP separately for each regression group with C set to the identity (using C estimated from univariate OMP fits has negligible impact on performance and is hence omitted for conciseness). [sent-308, score-0.292]
67 Overall, in all the settings considered, MGOMP is superior both in terms of prediction and variable selection accuracy, and more so when the correlation between responses increases. [sent-311, score-0.156]
68 In addition, model selection appears to be more robust for MGOMP, which has only one stopping point (MGOMP has one path interleaving input variables for various regressions, while GOMP and OMP have K paths, one path per univariate regression). [sent-314, score-0.252]
69 Social media platforms such as blogs, Twitter accounts and online discussion sites are large-scale forums where every individual can voice a potentially influential public opinion. [sent-319, score-0.145]
70 Which blogs or Twitter accounts should a consumer follow in order to get a gist of the community opinion as a whole? [sent-321, score-0.182]
71 How can a company identify bloggers whose commentary can change brand perceptions across this universe, so that marketing interventions can be effectively strategized? [sent-322, score-0.132]
72 The problem of finding key influencers and authorities in online communities is central to any viable information triage solution, and is therefore attracting increasing attention [14, 6]. [sent-323, score-0.127]
73 However, the mechanics of opinion exchange and adoption make the problem of inferring authority and influence in social media settings somewhat different from the problem of ranking generic web pages. [sent-326, score-0.209]
74 She initiates a web search for the laptop model and browses several discussion and blog sites where that model has been reviewed. [sent-329, score-0.154]
75 Moreover, the temporal order of events in this interaction is indicative of the direction of causal influence. [sent-334, score-0.227]
76 Introduced by the Nobel Prize-winning economist Clive Granger, this notion has proven useful as an operational notion of causality in time series analysis. [sent-337, score-0.184]
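In its simplest bivariate form, a Granger-style check asks whether the past of x improves prediction of y beyond y's own past; the following is a minimal illustration, not the grouped multivariate test developed in this paper:

```python
import numpy as np

def granger_rss_drop(x, y, d):
    """Drop in residual sum of squares when x's lags are added to y's autoregression.
    A positive drop suggests x Granger-causes y (no significance test here)."""
    T = len(y)
    own = np.column_stack([y[d - l:T - l] for l in range(1, d + 1)])
    full = np.column_stack([own] + [x[d - l:T - l] for l in range(1, d + 1)])
    target = y[d:]
    rss = lambda Z: np.sum((target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]) ** 2)
    return rss(own) - rss(full)
```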
77 Specifically, given a dictionary of K words and the time-stamp of each blog post, we record w_i^{k,t}, the frequency of the k-th word for blogger B_i at time t. [sent-344, score-0.255]
78 Then, the content of blogger B_i at time t can be represented as B_i^t = [w_i^{1,t}, …, w_i^{K,t}]. [sent-345, score-0.234]
79 The input to our model is a collection of multivariate time series, {B_i^t}_{t=1}^{T} (1 ≤ i ≤ G), where T is the timespan of our analysis. [sent-349, score-0.15]
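Concretely, stacking lagged word-frequency vectors yields the regression design; this sketch (the array layout and helper name are our assumptions) builds the G × K responses and the per-blogger lagged variable groups used below:

```python
import numpy as np

def var_design(B, d):
    """B: shape (G, T, K) word frequencies per blogger/time/word.
    Returns responses Y (T-d rows, G*K cols), lagged design X, and the
    column group of each blogger j, i.e. {B_j^{t-l}}_{l=1..d}."""
    G, T, K = B.shape
    Y = B[:, d:, :].transpose(1, 0, 2).reshape(T - d, G * K)
    X = np.hstack([B[:, d - l:T - l, :].transpose(1, 0, 2).reshape(T - d, G * K)
                   for l in range(1, d + 1)])
    groups = [[l * G * K + j * K + k for l in range(d) for k in range(K)]
              for j in range(G)]
    return Y, X, groups
```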
80 Our key intuition is that authorities and influencers are causal drivers of future discussions and opinions in the community. [sent-350, score-0.294]
81 The influence problem can thus be mapped to a variable group selection problem in a vector autoregressive model, i.e., multivariate regression with the G × K responses {B_j^t}, j = 1, 2, …, G, in terms of the variable groups {B_j^{t−l}}_{l=1}^{d}, j = 1, 2, …, G. [sent-352, score-0.215] [sent-354, score-0.227] [sent-357, score-0.13]
84 We can then conclude that a certain blogger B_i influences blogger B_j if the variable group {B_i^{t−l}}_{l∈{1,…,d}} is selected by the variable selection method for the responses concerning blogger B_j. [sent-373, score-0.466] [sent-376, score-0.323]
86 For each blogger B_j, this can be viewed as an application of a Granger test on B_j against bloggers B_1, B_2, …, B_G. [sent-377, score-0.299]
87 This induces a directed, weighted graph over bloggers, which we call the causal graph, where edge weights are derived from the underlying regression coefficients. [sent-381, score-0.362]
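One simple way to realize such a graph is to aggregate, for each ordered blogger pair, the absolute coefficients of the corresponding selected lagged group; the sum-of-magnitudes aggregation is our choice, as the paper does not fix a formula:

```python
import numpy as np

def causal_graph(A, groups, G, K):
    """Edge weight W[i, j]: total |coefficient| mass linking blogger i's lagged
    group (rows of A given by groups[i]) to blogger j's K responses."""
    W = np.zeros((G, G))
    for i in range(G):
        for j in range(G):
            cols = list(range(j * K, (j + 1) * K))
            W[i, j] = np.abs(A[np.ix_(groups[i], cols)]).sum()
    return W
```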
88 We refer to influence measures on causal graphs as GrangerRanks. [sent-382, score-0.227]
89 For example, GrangerPageRank refers to applying PageRank on the causal graph, while GrangerOutDegree refers to computing out-degrees of nodes as a measure of causal influence. [sent-383, score-0.598]
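Both measures then reduce to standard graph computations on the weight matrix W; a minimal power-iteration sketch (the damping value 0.85 is the usual PageRank default, not specified here):

```python
import numpy as np

def granger_pagerank(W, damping=0.85, iters=100):
    """PageRank by power iteration on the row-normalized causal graph."""
    n = W.shape[0]
    s = W.sum(axis=1, keepdims=True)
    P = np.where(s > 0, W / np.maximum(s, 1e-12), 1.0 / n)  # dangling -> uniform
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (P.T @ r)
    return r

def granger_outdegree(W):
    return (W > 0).sum(axis=1)   # number of outgoing causal edges per node
```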
90 Our model produces the causal graph shown in Figure 1, depicting influence relationships amongst high-energy physicists. [sent-398, score-0.283]
91 Figure 2: Causal graph (a) and hyperlink graph (b) for the Lotus blog dataset. [sent-448, score-0.26]
92 Figure 2 shows the causal graph learnt by our models on the left, and the hyperlink graph on the right. [sent-456, score-0.427]
93 We notice that the causal graph is sparser than the hyperlink graph. [sent-457, score-0.371]
94 By identifying the most significant causal relationships between bloggers, our causal graphs allow clearer inspection of the authorities and also appear to better expose striking sub-community structures in this blog community. [sent-458, score-0.609]
95 We also computed the correlation between PageRank and Outdegrees computed over our causal graph and the hyperlink graph (0. [sent-459, score-0.427]
96 4 Conclusion and Perspectives: We have provided a framework for learning sparse multivariate regression models, where the sparsity structure is induced by groupings defined over both input and output variables. [sent-464, score-0.364]
97 We have shown that extended notions of Granger Causality for causal inference over high-dimensional time series can naturally be cast in this framework. [sent-465, score-0.264]
98 Grouped orthogonal matching pursuit for variable selection and prediction. [sent-520, score-0.32]
99 The PageRank citation ranking: Bringing order to the web. [sent-549, score-0.136]
100 Model selection and estimation in regression with grouped variables. [sent-586, score-0.162]
wordName wordTfidf (topN-words)
[('mgomp', 0.635), ('mgood', 0.234), ('causal', 0.227), ('blogger', 0.167), ('bloggers', 0.132), ('vec', 0.126), ('os', 0.12), ('multivariate', 0.119), ('omp', 0.117), ('ggood', 0.117), ('xa', 0.116), ('pursuit', 0.116), ('ok', 0.108), ('granger', 0.103), ('univariate', 0.096), ('output', 0.095), ('bt', 0.094), ('blog', 0.088), ('hyperlink', 0.088), ('pagerank', 0.088), ('group', 0.088), ('groups', 0.086), ('causality', 0.085), ('lotus', 0.084), ('uencers', 0.084), ('selection', 0.083), ('regression', 0.079), ('ir', 0.077), ('content', 0.067), ('arkady', 0.067), ('authorities', 0.067), ('gbad', 0.067), ('grangeroutdegree', 0.067), ('klebanov', 0.067), ('tseytlin', 0.067), ('formed', 0.065), ('uence', 0.062), ('regressions', 0.06), ('communities', 0.06), ('blogs', 0.059), ('igor', 0.059), ('block', 0.058), ('graph', 0.056), ('coef', 0.054), ('lc', 0.051), ('grangerpagerank', 0.05), ('kehagias', 0.05), ('sonnenschein', 0.05), ('twitter', 0.05), ('citation', 0.048), ('matching', 0.045), ('mink', 0.044), ('ols', 0.044), ('xis', 0.044), ('variable', 0.044), ('relatedness', 0.044), ('variables', 0.042), ('opinion', 0.041), ('sparsity', 0.04), ('matrix', 0.039), ('ranking', 0.039), ('ibm', 0.038), ('laptop', 0.038), ('tr', 0.037), ('series', 0.037), ('precision', 0.036), ('social', 0.036), ('douglas', 0.034), ('media', 0.034), ('parallel', 0.034), ('forums', 0.033), ('kogan', 0.033), ('melville', 0.033), ('pertinent', 0.033), ('swirszcz', 0.033), ('xtr', 0.033), ('bi', 0.033), ('namely', 0.033), ('inferring', 0.032), ('bj', 0.032), ('community', 0.032), ('topic', 0.032), ('orthogonal', 0.032), ('input', 0.031), ('notion', 0.031), ('grouping', 0.03), ('gi', 0.03), ('responses', 0.029), ('identity', 0.029), ('desiderata', 0.029), ('hyperlinked', 0.029), ('ax', 0.029), ('sites', 0.028), ('blocks', 0.027), ('bg', 0.027), ('xq', 0.027), ('posts', 0.027), ('authority', 0.027), ('lozano', 0.027), ('jacob', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
Author: Vikas Sindhwani, Aurelie C. Lozano
Abstract: We consider multivariate regression problems involving high-dimensional predictor and response spaces. To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique. This extension accounts for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between the multiple outputs. Within this framework, we then formulate the problem of inferring causal relationships over a collection of high-dimensional time series variables. When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to the classic PageRank algorithm traditionally applied to hyperlink graphs. Theoretical guarantees, extensive simulations and empirical studies confirm the generality and value of our framework.
2 0.20318606 46 nips-2010-Causal discovery in multiple models from different experiments
Author: Tom Claassen, Tom Heskes
Abstract: A long-standing open research problem is how to use information from different experiments, including background knowledge, to infer causal relations. Recent developments have shown ways to use multiple data sets, provided they originate from identical experiments. We present the MCI-algorithm as the first method that can infer provably valid causal relations in the large sample limit from different experiments. It is fast, reliable and produces very clear and easily interpretable output. It is based on a result that shows that constraint-based causal discovery is decomposable into a candidate pair identification and subsequent elimination step that can be applied separately from different models. We test the algorithm on a variety of synthetic input model sets to assess its behavior and the quality of the output. The method shows promising signs that it can be adapted to suit causal discovery in real-world application areas as well, including large databases. 1
3 0.18816119 218 nips-2010-Probabilistic latent variable models for distinguishing between cause and effect
Author: Oliver Stegle, Dominik Janzing, Kun Zhang, Joris M. Mooij, Bernhard Schölkopf
Abstract: We propose a novel method for inferring whether X causes Y or vice versa from joint observations of X and Y . The basic idea is to model the observed data using probabilistic latent variable models, which incorporate the effects of unobserved noise. To this end, we consider the hypothetical effect variable to be a function of the hypothetical cause variable and an independent noise term (not necessarily additive). An important novel aspect of our work is that we do not restrict the model class, but instead put general non-parametric priors on this function and on the distribution of the cause. The causal direction can then be inferred by using standard Bayesian model selection. We evaluate our approach on synthetic data and real-world data and report encouraging results. 1
4 0.096166223 247 nips-2010-Sparse Instrumental Variables (SPIV) for Genome-Wide Studies
Author: Paul Mckeigue, Jon Krohn, Amos J. Storkey, Felix V. Agakov
Abstract: This paper describes a probabilistic framework for studying associations between multiple genotypes, biomarkers, and phenotypic traits in the presence of noise and unobserved confounders for large genetic studies. The framework builds on sparse linear methods developed for regression and modified here for inferring causal structures of richer networks with latent variables. The method is motivated by the use of genotypes as “instruments” to infer causal associations between phenotypic biomarkers and outcomes, without making the common restrictive assumptions of instrumental variable methods. The method may be used for an effective screening of potentially interesting genotype-phenotype and biomarker-phenotype associations in genome-wide studies, which may have important implications for validating biomarkers as possible proxy endpoints for early-stage clinical trials. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. The method is applied for examining effects of gene transcript levels in the liver on plasma HDL cholesterol levels for a sample of sequenced mice from a heterogeneous stock, with ∼ 105 genetic instruments and ∼ 47 × 103 gene transcripts. 1
5 0.077825457 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
Author: Sofia Mosci, Silvia Villa, Alessandro Verri, Lorenzo Rosasco
Abstract: We deal with the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular we propose a new optimization procedure for solving the regularized algorithm presented in [12], where the group lasso penalty is generalized to overlapping groups of variables. While in [12] the proposed implementation requires explicit replication of the variables belonging to more than one group, our iterative procedure is based on a combination of proximal methods in the primal space and projected Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative with no need for data duplication, and allows to deal with high dimensional problems without pre-processing for dimensionality reduction. The computational advantages of our scheme with respect to state-of-the-art algorithms using data duplication are shown empirically with numerical simulations. 1
6 0.075844176 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
7 0.074492514 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
8 0.073704235 217 nips-2010-Probabilistic Multi-Task Feature Selection
9 0.071800962 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
10 0.070371412 258 nips-2010-Structured sparsity-inducing norms through submodular functions
11 0.067335412 231 nips-2010-Robust PCA via Outlier Pursuit
12 0.065750912 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
13 0.065530419 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
14 0.063988924 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
15 0.060248364 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
16 0.059709746 181 nips-2010-Network Flow Algorithms for Structured Sparsity
17 0.055022355 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
18 0.054651566 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
19 0.054421641 162 nips-2010-Link Discovery using Graph Feature Tracking
20 0.054073215 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
topicId topicWeight
[(0, 0.166), (1, 0.049), (2, 0.033), (3, 0.092), (4, -0.033), (5, -0.115), (6, 0.008), (7, 0.06), (8, -0.111), (9, -0.084), (10, -0.104), (11, 0.161), (12, -0.069), (13, -0.091), (14, 0.065), (15, 0.24), (16, -0.03), (17, -0.048), (18, -0.059), (19, -0.025), (20, 0.081), (21, -0.038), (22, 0.003), (23, 0.008), (24, -0.025), (25, -0.051), (26, 0.026), (27, 0.002), (28, 0.016), (29, 0.011), (30, -0.095), (31, 0.024), (32, -0.023), (33, 0.068), (34, 0.009), (35, 0.063), (36, -0.012), (37, -0.078), (38, -0.087), (39, -0.012), (40, -0.044), (41, -0.002), (42, 0.007), (43, 0.019), (44, -0.065), (45, -0.053), (46, 0.004), (47, -0.031), (48, 0.04), (49, -0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.90112084 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
2 0.79971069 46 nips-2010-Causal discovery in multiple models from different experiments
3 0.79135203 247 nips-2010-Sparse Instrumental Variables (SPIV) for Genome-Wide Studies
4 0.75705737 218 nips-2010-Probabilistic latent variable models for distinguishing between cause and effect
5 0.42097428 45 nips-2010-CUR from a Sparse Optimization Viewpoint
Author: Jacob Bien, Ya Xu, Michael W. Mahoney
Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.
6 0.40648043 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
7 0.38591787 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
8 0.38384461 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
9 0.37523767 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
10 0.37399557 231 nips-2010-Robust PCA via Outlier Pursuit
11 0.3681595 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
12 0.36410898 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
13 0.36329406 217 nips-2010-Probabilistic Multi-Task Feature Selection
14 0.36180821 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
15 0.33688217 172 nips-2010-Multi-Stage Dantzig Selector
16 0.33525175 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
17 0.33397499 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
18 0.33005792 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
19 0.32758775 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
20 0.32340237 265 nips-2010-The LASSO risk: asymptotic results and real world examples
topicId topicWeight
[(13, 0.059), (17, 0.022), (26, 0.012), (27, 0.058), (30, 0.052), (35, 0.03), (45, 0.163), (50, 0.05), (52, 0.052), (60, 0.02), (77, 0.044), (78, 0.016), (87, 0.295), (90, 0.042)]
simIndex simValue paperId paperTitle
1 0.75625408 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection
Author: Javier R. Movellan, Paul L. Ruvolo
Abstract: Determining whether someone is talking has applications in many areas such as speech recognition, speaker diarization, social robotics, facial expression recognition, and human computer interaction. One popular approach to this problem is audio-visual synchrony detection [10, 21, 12]. A candidate speaker is deemed to be talking if the visual signal around that speaker correlates with the auditory signal. Here we show that with the proper visual features (in this case movements of various facial muscle groups), a very accurate detector of speech can be created that does not use the audio signal at all. Further we show that this person independent visual-only detector can be used to train very accurate audio-based person dependent voice models. The voice model has the advantage of being able to identify when a particular person is speaking even when they are not visible to the camera (e.g. in the case of a mobile robot). Moreover, we show that a simple sensory fusion scheme between the auditory and visual models improves performance on the task of talking detection. The work here provides dramatic evidence about the efficacy of two very different approaches to multimodal speech detection on a challenging database. 1
same-paper 2 0.73281759 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
3 0.69204432 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
Author: Amin Sayedi, Morteza Zadimoghaddam, Avrim Blum
Abstract: We discuss an online learning framework in which the agent is allowed to say “I don’t know” as well as making incorrect predictions on given examples. We analyze the trade off between saying “I don’t know” and making mistakes. If the number of don’t-know predictions is required to be zero, the model reduces to the well-known mistake-bound model introduced by Littlestone [Lit88]. On the other hand, if no mistakes are allowed, the model reduces to KWIK framework introduced by Li et. al. [LLW08]. We propose a general, though inefficient, algorithm for general finite concept classes that minimizes the number of don’t-know predictions subject to a given bound on the number of allowed mistakes. We then present specific polynomial-time algorithms for the concept classes of monotone disjunctions and linear separators with a margin.
4 0.57695436 117 nips-2010-Identifying graph-structured activation patterns in networks
Author: James Sharpnack, Aarti Singh
Abstract: We consider the problem of identifying an activation pattern in a complex, largescale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task specially if the network is large (pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, typically there are statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of highdimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size. 1
5 0.57387912 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1
6 0.56973195 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
7 0.56881267 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
8 0.56736892 265 nips-2010-The LASSO risk: asymptotic results and real world examples
9 0.56682551 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
10 0.56579101 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.56319708 148 nips-2010-Learning Networks of Stochastic Differential Equations
12 0.56186527 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
14 0.56073809 5 nips-2010-A Dirty Model for Multi-task Learning
15 0.55962276 258 nips-2010-Structured sparsity-inducing norms through submodular functions
16 0.55956852 63 nips-2010-Distributed Dual Averaging In Networks
17 0.55900103 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
18 0.55871028 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
19 0.55869389 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
20 0.55825692 280 nips-2010-Unsupervised Kernel Dimension Reduction