nips nips2012 nips2012-266 knowledge-graph by maker-knowledge-mining

266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Source: pdf

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

Abstract: A patient’s risk for adverse events is affected by temporal processes including the nature and timing of diagnostic and therapeutic activities, and the overall evolution of the patient’s pathophysiology over time. Yet many investigators ignore this temporal aspect when modeling patient outcomes, considering only the patient’s current or aggregate state. In this paper, we represent patient risk as a time series. In doing so, patient risk stratification becomes a time-series classification task. The task differs from most applications of time-series analysis, like speech processing, since the time series itself must first be extracted. Thus, we begin by defining and extracting approximate risk processes, the evolving approximate daily risk of a patient. Once obtained, we use these signals to explore different approaches to time-series classification with the goal of identifying high-risk patterns. We apply the classification to the specific task of identifying patients at risk of testing positive for hospital-acquired Clostridium difficile. We achieve an area under the receiver operating characteristic curve of 0.79 on a held-out set of several hundred patients. Our two-stage approach to risk stratification outperforms classifiers that consider only a patient’s current state (p<0.05).
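The two-stage idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper produces daily scores with an SVM, whereas `daily_risk` below is a hypothetical linear scorer with made-up weights, and all function names and the toy admission data are introduced here only for illustration.

```python
from typing import List, Sequence


def daily_risk(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Signed risk score for a single day (stand-in for an SVM decision value)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def risk_process(admission: Sequence[Sequence[float]],
                 weights: Sequence[float], bias: float) -> List[float]:
    """Stage 1: score each day independently and concatenate the scores."""
    return [daily_risk(day, weights, bias) for day in admission]


def current_state_score(process: Sequence[float]) -> float:
    """Baseline: look only at the most recent day's risk."""
    return process[-1]


def average_score(process: Sequence[float]) -> float:
    """Stage 2 (simplest variant): summarize the whole risk process by its mean."""
    return sum(process) / len(process)


# Hypothetical weights and a toy 4-day admission with two features per day.
WEIGHTS = [1.0, -0.5]
BIAS = -0.25
admission = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

process = risk_process(admission, WEIGHTS, BIAS)
print(process)                       # one score per day of the admission
print(current_state_score(process))  # uses the last day only
print(average_score(process))        # uses the whole time series
```

In this toy admission the last day's score is neutral while the average is positive, which is the kind of information a classifier restricted to the current state cannot see.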

Reference: text

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A patient’s risk for adverse events is affected by temporal processes including the nature and timing of diagnostic and therapeutic activities, and the overall evolution of the patient’s pathophysiology over time. [sent-6, score-0.538]

2 Yet many investigators ignore this temporal aspect when modeling patient outcomes, considering only the patient’s current or aggregate state. [sent-7, score-0.583]

3 In this paper, we represent patient risk as a time series. [sent-8, score-0.95]

4 In doing so, patient risk stratiﬁcation becomes a time-series classiﬁcation task. [sent-9, score-0.928]

5 Thus, we begin by deﬁning and extracting approximate risk processes, the evolving approximate daily risk of a patient. [sent-11, score-0.956]

6 We apply the classiﬁcation to the speciﬁc task of identifying patients at risk of testing positive for hospital acquired Clostridium difﬁcile. [sent-13, score-0.906]

7 Our two-stage approach to risk stratification outperforms classifiers that consider only a patient’s current state (p<0.05). [sent-16, score-0.457]

8 We consider a novel application of time-series analysis, patient risk. [sent-24, score-0.522]

9 Patient risk has an inherent temporal aspect; it evolves over time as it is inﬂuenced by intrinsic and extrinsic factors. [sent-25, score-0.464]

10 We hypothesize that, if one could measure risk over time, one could learn patterns of risk that are more likely to lead to adverse outcomes. [sent-27, score-0.882]

11 In this work, we frame the problem of identifying hospitalized patients for high-risk outcomes as a time-series classiﬁcation task. [sent-28, score-0.438]

12 We propose and motivate the study of patient risk processes to model the evolution of risk over the course of a hospital admission. [sent-29, score-1.512]

13 Speciﬁcally, we consider the problem of using time-series data to estimate the risk of an inpatient becoming colonized with Clostridium difﬁcile (C. [sent-30, score-0.482]

14 diff is a bacterial infection most often acquired in hospitals or nursing homes. [sent-33, score-0.443]

15 ) Despite the fact that many of the risk factors are well known (e. [sent-35, score-0.406]

16 diff rates for hospitalized patients aged ≥ 65 years increased by 200% [4]. [sent-41, score-0.719]

17 diff infection, and thus are not useful for predicting whether a patient will become infected. [sent-45, score-0.873]

18 In contrast, risk stratiﬁcation models aim to identify patients at high risk of becoming infected. [sent-46, score-1.148]

19 The use of these models could lead to a better understanding of the risk factors involved and ultimately provide information about how to reduce the incidence of C. [sent-47, score-0.424]

20 Reported results in the medical literature for the problem of risk stratiﬁcation for C. [sent-51, score-0.406]

21 We consider patients with at least a 7-day hospital admission who do not test positive for C. [sent-64, score-0.773]

22 This group of patients is already at an elevated risk for acquiring C. [sent-66, score-0.78]

23 diff because of the duration of the hospital stay. [sent-67, score-0.436]

24 To the best of our knowledge, representing and studying the risk of acquiring C. [sent-69, score-0.439]

25 We propose a risk stratiﬁcation method that aims to identify patterns of risk that are more likely to lead to adverse outcomes. [sent-71, score-0.882]

26 In [11] we proposed a method for extracting patient risk processes. [sent-72, score-0.956]

27 Once patient risk processes are extracted, the problem of risk stratiﬁcation becomes that of time-series classiﬁcation. [sent-73, score-1.377]

28 diff risk prediction is difﬁcult because of the differences in the studies mentioned above. [sent-76, score-0.744]

29 Thus, to measure the added value of considering the temporal dimension, we implemented the standard approach as represented in the related literature of classifying patients based on their current or average state and applied it to our data set. [sent-77, score-0.43]

30 diff during the current admission, we remove patients who tested positive for C. [sent-82, score-0.709]

31 diff in the 60 days preceding or, if negative, following the current admission [3]. [sent-83, score-0.746]

32 Positive cases are those patients who test positive on or after 7 days in the hospital. [sent-86, score-0.53]

33 We deﬁne the start of the risk period of a patient as the time of admission and deﬁne the end of the risk period, according to the following rule: if the patient tests positive, the ﬁrst positive test marks the end of the risk period, otherwise the patient is considered at risk until discharge. [sent-88, score-3.606]

34 3 Methods Patient risk is not a directly measurable time series. [sent-92, score-0.428]

35 Thus, we propose a two-stage approach to risk stratiﬁcation. [sent-93, score-0.406]

36 We ﬁrst extract approximate risk processes and then apply time-series classiﬁcation techniques to those processes. [sent-94, score-0.468]

37 1 Extracting Patient Risk Processes We extract approximate patient risk processes, i. [sent-97, score-0.947]

38 , a risk time series for each admission, by independently calculating the daily risk of a patient and then concatenating these predictions. [sent-99, score-1.507]

39 The remaining features are collected over the course of the admission and may change on a daily basis e. [sent-106, score-0.422]

40 We employ a support vector machine (SVM) to produce daily risk scores. [sent-109, score-0.522]

41 Each day of an admission is associated with its own feature vector. [sent-110, score-0.435]

42 We only know whether or not a patient eventually tests positive for C. [sent-113, score-0.599]

43 Thus we assign each day of an admission in which the patient eventually tests positive as positive, even though the patient may not have actually been at high risk on each of those days. [sent-115, score-1.927]

44 Since we do not expect a patient’s risk to remain constant during an entire admission, there is noise in the training labels. [sent-117, score-0.406]

45 We take the concatenated continuous outputs of the SVM for a hospital admission as a representation of the approximate risk process. [sent-124, score-0.786]

46 We give some examples of these approximate risk processes for both case and non-case patients in Figure 1. [sent-125, score-0.765]

47 Figure 1 (x-axis: time in days): Approximate daily risk represented as a time series results in a risk process for each patient. [sent-133, score-0.985]

48 One could risk stratify patients based solely on their current state, i. [sent-134, score-0.772]

49 , use the daily risk value from the risk process to classify patients as either high risk or low risk on that day. [sent-136, score-2.084]

50 We tested this intuition by classifying patients based on the average of their risk process. [sent-142, score-0.749]

51 2 Classifying Patient Risk Processes Given the risk processes of each patient, the risk stratiﬁcation task becomes a time-series classiﬁcation task. [sent-149, score-0.855]

52 , periods of time where the patient is consistently at high or low risk. [sent-170, score-0.544]

53 Finally, Features 14-17 summarize information regarding global maxima and minima in the approximate risk process. [sent-171, score-0.406]

54 Given these feature deﬁnitions, we map each patient admission risk process to a ﬁxed-length feature vector. [sent-172, score-1.262]

55 , when the maximum risk occurs relative to the time of prediction. [sent-175, score-0.428]

56 for each day and n is the number of days, predict daily risk xi based on the observ. [sent-185, score-0.682]

57 Figure 2: A two-step approach to risk stratiﬁcation where predeﬁned features are extracted from the time-series data. [sent-190, score-0.469]

58 3 Classiﬁcation using Hidden Markov Models We can make observations about a patient on a daily basis, but we cannot directly measure whether or not a patient is at high risk. [sent-216, score-1.16]

59 Classiﬁcation via Likelihood We hypothesize that there may exist patterns of risk over time that are more likely to lead to a positive test result. [sent-229, score-0.541]

60 Classiﬁcation via Posterior State Probabilities As we saw in Figure 1, the SVM output for a patient may ﬂuctuate greatly from day to day. [sent-235, score-0.658]

61 While large ﬂuctuations in risk are not impossible, they are not common. [sent-236, score-0.406]

62 Recall that in our initial calculation while the variables from time of admission are included in the prediction, the previous day’s risk is not. [sent-237, score-0.692]

63 A key decision was to use a left-to-right model where, once a patient reaches a “high-risk” state they remain there. [sent-243, score-0.548]

64 Figure 3 shows two examples of risk processes and their associated posterior state probabilities p(xt = s2 |y1 , . [sent-253, score-0.496]

65 diff on day 24 Figure 3: Given all of the observations from y1 , . [sent-269, score-0.456]

66 We classify each patient according to the probability of being in a high-risk state on the most recent day i. [sent-277, score-0.712]

67 4 Experiments & Results This section describes a set of experiments used to compare several methods for predicting a patient’s risk of acquiring C. [sent-283, score-0.47]

68 1 Experimental Setup In order to reduce the possibility of confusing the risk of becoming colonized with C. [sent-287, score-0.454]

69 diff with the existence of a current infection, for patients from the positive class we consider only data collected up to two days before a positive test result. [sent-288, score-0.923]

70 For patients who never test positive, researchers typically use the discharge day as the index event [3]. [sent-290, score-0.5]

71 However, this can lead to deceptively good results because patients nearing discharge are typically healthier than patients not nearing discharge. [sent-291, score-0.725]

72 We consider a minimum of 5 days for a negative patient since 5 days is the minimum amount of data we have for any positive patient (e. [sent-293, score-1.393]

73 Additionally, we remove outliers, those patients with admissions longer than 60 days. [sent-298, score-0.399]

74 The second classiﬁer RP+Average is an initial improvement on this approach, and classiﬁes patients based on the average value of their risk process. [sent-308, score-0.722]

75 5days classifies patients using a non-linear SVM based on the Euclidean distance between the most recent ... Table 2: Predicting a positive test result two days in advance using different classifiers. [sent-311, score-0.555]

76 Current State represents the traditional approach to risk stratiﬁcation, and is the only classiﬁer that is not based on patient Risk Processes (RP). [sent-312, score-0.928]

77 Figure 4: Results of predicting a patient’s risk of testing positive for C. [sent-382, score-0.485]

78 uses the entire risk process by interpolating between points. [sent-388, score-0.406]

79 This classiﬁer is based on a linear combination of statistics (listed in Table 1) computed from the patient risk processes. [sent-403, score-0.928]

80 The most important features are the length of the time series (Feature 1), the fraction of the time for which the patient is at positive risk (Feature 9), and the maximum risk attained (Feature 14). [sent-408, score-1.523]

81 The only two features with signiﬁcantly negative weights are Feature 10 and Feature 13, the overall fraction of time a patient has a negative risk, and the longest consecutive period of time that a patient has negative risk. [sent-409, score-1.29]

82 To further convey the ability of the classiﬁer to risk stratify patients, we split the test patients into quintiles (as is often done in clinical studies) based on the continuous output of the classiﬁer. [sent-412, score-0.856]

83 For each quintile we calculated the probability of a positive test result, based on those patients who eventually test positive for C. [sent-414, score-0.544]

84 The difference between the 1st and 5th quintiles is striking; relative to the 1st quintile, patients in the 5th quintile are at more than a 25-fold greater risk. [sent-417, score-0.415]

85 Figure 7: Test patients with RP+Features predictions in the 5th quintile are more than 25 times more likely to test positive for C. [sent-425, score-0.484]

86 Discussion & Conclusion To the best of our knowledge, we are the ﬁrst to consider risk of acquiring an infection as a time series. [sent-427, score-0.584]

87 We use a two-stage process, ﬁrst extracting approximate risk processes and then using the risk process as an input to a classiﬁer. [sent-428, score-0.883]

88 The majority of the methods based on time-series classiﬁcation performed as well if not better than the previous approach of classifying patients simply based on the average of their risk process. [sent-430, score-0.749]

89 Still, we are encouraged by these results, which suggest that posing the risk stratiﬁcation problem as a time-series classiﬁcation task can provide more accurate models. [sent-432, score-0.406]

90 Still, based on the mean performance, all classiﬁers that incorporate patient risk processes outperform the Current State classiﬁer, and the majority of those classiﬁers perform as well or better than the RP+Average. [sent-434, score-0.971]

91 5days classiﬁes patients based on a similarity metric using only the most recent 5 days of the patient risk processes. [sent-438, score-1.401]

92 Its relatively poor performance suggests that a patient’s risk may depend on the entire risk process. [sent-439, score-0.812]

93 For this reason, the medical literature on risk stratiﬁcation typically focuses on a combination of the AUC and the kind of odds ratios derivable from the data in Figure 7. [sent-448, score-0.425]

94 However, for the daunting task of risk stratifying patients already at an elevated risk for C. [sent-451, score-1.153]

95 Clostridium difﬁcile - associated disease in a setting of endemicity: Identiﬁcation of novel risk factors. [sent-479, score-0.427]

96 Clinical prediction rules to optimize cytotoxin testing for clostridium difﬁcile in hospitalized patients with diarrhea. [sent-490, score-0.538]

97 Waterlow score to predict patients at risk of developing clostridium difficile-associated disease. [sent-497, score-0.545]

98 Development and validation of a clostridium difﬁcile infection risk prediction model. [sent-511, score-0.668]

99 A clinical risk index for clostridium difﬁcile infection in hospitalized patients receiving broadspectrum antibiotics. [sent-525, score-1.122]

100 Predicting clostridium difﬁcile toxin in hospitalized patients with antibiotic-associated diarrhea. [sent-538, score-0.538]
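Several of the sentences above describe mapping each risk process to a fixed-length feature vector (sentences 54 and 79 to 81). The following Python sketch shows what such a mapping might look like for the five features the summary names explicitly: length of the series (Feature 1), fraction of time at positive risk (Feature 9), fraction of time at negative risk (Feature 10), longest consecutive negative period (Feature 13), and maximum risk (Feature 14). The exact definitions used in the paper are not given on this page, so the handling of zero-valued days, the lack of normalization, and all function and key names are assumptions.

```python
from typing import Dict, Iterable, Sequence


def longest_run(flags: Iterable[bool]) -> int:
    """Length of the longest consecutive run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def summary_features(process: Sequence[float]) -> Dict[str, float]:
    """Map a variable-length risk process to a fixed set of summary statistics.

    Assumption: a day counts as positive if its score is > 0 and as
    negative if its score is < 0; days with a score of exactly 0 count as neither.
    """
    n = len(process)
    if n == 0:
        raise ValueError("risk process must contain at least one day")
    return {
        "length": n,                                                   # Feature 1
        "frac_positive": sum(r > 0 for r in process) / n,              # Feature 9
        "frac_negative": sum(r < 0 for r in process) / n,              # Feature 10
        "longest_negative_run": longest_run(r < 0 for r in process),   # Feature 13
        "max_risk": max(process),                                      # Feature 14
    }


# Toy 8-day risk process (made-up values).
example = [-0.5, -0.25, 0.5, 1.0, -0.75, 0.25, 0.5, 0.75]
features = summary_features(example)
print(features)
```

Because every admission yields the same set of keys regardless of its length, the resulting vectors can be passed to any standard classifier, which is the point of the feature-extraction variant.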

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('patient', 0.522), ('risk', 0.406), ('diff', 0.32), ('patients', 0.316), ('admission', 0.264), ('rp', 0.157), ('clostridium', 0.139), ('days', 0.137), ('day', 0.136), ('strati', 0.136), ('infection', 0.123), ('hospital', 0.116), ('daily', 0.116), ('cile', 0.111), ('classi', 0.084), ('admissions', 0.083), ('hospitalized', 0.083), ('auc', 0.079), ('quintile', 0.074), ('hmmlikelihood', 0.056), ('clinical', 0.055), ('er', 0.05), ('dtw', 0.049), ('warping', 0.048), ('positive', 0.048), ('svm', 0.044), ('processes', 0.043), ('features', 0.042), ('ers', 0.04), ('hmms', 0.039), ('temporal', 0.036), ('longest', 0.035), ('feature', 0.035), ('cation', 0.035), ('series', 0.035), ('horvitz', 0.034), ('adverse', 0.034), ('acquiring', 0.033), ('predicting', 0.031), ('hmm', 0.03), ('tests', 0.029), ('test', 0.029), ('classify', 0.028), ('extracting', 0.028), ('colonized', 0.028), ('discharged', 0.028), ('dubberke', 0.028), ('epidemiol', 0.028), ('guttag', 0.028), ('hmmposterior', 0.028), ('hosp', 0.028), ('inpatient', 0.028), ('nearing', 0.028), ('reske', 0.028), ('wiens', 0.028), ('negative', 0.027), ('classifying', 0.027), ('state', 0.026), ('xn', 0.025), ('current', 0.025), ('distance', 0.025), ('stratify', 0.025), ('handwriting', 0.025), ('infect', 0.025), ('quintiles', 0.025), ('elevated', 0.025), ('rami', 0.025), ('xi', 0.024), ('period', 0.024), ('time', 0.022), ('medicine', 0.022), ('imbalance', 0.021), ('symptoms', 0.021), ('probabilities', 0.021), ('extracted', 0.021), ('reported', 0.021), ('disease', 0.021), ('similarity', 0.02), ('fraction', 0.02), ('ci', 0.02), ('dif', 0.02), ('becoming', 0.02), ('identifying', 0.02), ('discharge', 0.019), ('evolution', 0.019), ('outcomes', 0.019), ('hidden', 0.019), ('yn', 0.019), ('extract', 0.019), ('xt', 0.019), ('guidelines', 0.019), ('odds', 0.019), ('lead', 0.018), ('patterns', 0.018), ('interpolate', 0.018), ('differences', 0.018), ('visit', 0.017), ('receiver', 0.017), ('predictions', 0.017), ('extraction', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

2 0.22731018 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

Author: Jonathan Huang, Daniel Alexander

Abstract: Accurate and detailed models of neurodegenerative disease progression are crucially important for reliable early diagnosis and the determination of effective treatments. We introduce the ALPACA (Alzheimer’s disease Probabilistic Cascades) model, a generative model linking latent Alzheimer’s progression dynamics to observable biomarker data. In contrast with previous works which model disease progression as a fixed event ordering, we explicitly model the variability over such orderings among patients which is more realistic, particularly for highly detailed progression models. We describe efficient learning algorithms for ALPACA and discuss promising experimental results on a real cohort of Alzheimer’s patients from the Alzheimer’s Disease Neuroimaging Initiative.

3 0.17915261 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study

Author: Uri Maoz, Shengxuan Ye, Ian Ross, Adam Mamelak, Christof Koch

Abstract: The ability to predict action content from neural signals in real time before the action occurs has been long sought in the neuroscientific study of decision-making, agency and volition. On-line real-time (ORT) prediction is important for understanding the relation between neural correlates of decision-making and conscious, voluntary action as well as for brain-machine interfaces. Here, epilepsy patients, implanted with intracranial depth microelectrodes or subdural grid electrodes for clinical purposes, participated in a “matching-pennies” game against an opponent. In each trial, subjects were given a 5 s countdown, after which they had to raise their left or right hand immediately as the “go” signal appeared on a computer screen. They won a fixed amount of money if they raised a different hand than their opponent and lost that amount otherwise. The question we here studied was the extent to which neural precursors of the subjects’ decisions can be detected in intracranial local field potentials (LFP) prior to the onset of the action. We found that combined low-frequency (0.1–5 Hz) LFP signals from 10 electrodes were predictive of the intended left-/right-hand movements before the onset of the go signal. Our ORT system predicted which hand the patient would raise 0.5 s before the go signal with 68±3% accuracy in two patients. Based on these results, we constructed an ORT system that tracked up to 30 electrodes simultaneously, and tested it on retrospective data from 7 patients. On average, we could predict the correct hand choice in 83% of the trials, which rose to 92% if we let the system drop 3/10 of the trials on which it was less confident. Our system demonstrates, for the first time, the feasibility of accurately predicting a binary action on single trials in real time for patients with intracranial recordings, well before the action occurs.

4 0.12365769 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi-armed bandits solve the Exploration-Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be more difficult than the standard multi-armed bandit setting due in part to an exploration risk which introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results.

5 0.098645031 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

Author: Teodor M. Moldovan, Pieter Abbeel

Abstract: The expected return is a widely used objective in decision making under uncertainty. Many algorithms, such as value iteration, have been proposed to optimize it. In risk-aware settings, however, the expected return is often not an appropriate objective to optimize. We propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties. We also draw connections to previously proposed objectives for risk-aware planning: minmax, exponential utility, percentile and mean minus variance. Our method applies to an extended class of Markov decision processes: we allow costs to be stochastic as long as they are bounded. Additionally, we present an efficient algorithm for optimizing the proposed objective. Synthetic and real-world experiments illustrate the effectiveness of our method, at scale.

6 0.081922412 200 nips-2012-Local Supervised Learning through Space Partitioning

7 0.081508353 297 nips-2012-Robustness and risk-sensitivity in Markov decision processes

8 0.079279862 32 nips-2012-Active Comparison of Prediction Models

9 0.075625323 227 nips-2012-Multiclass Learning with Simplex Coding

10 0.070427418 197 nips-2012-Learning with Recursive Perceptual Representations

11 0.068616465 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

12 0.066442706 271 nips-2012-Pointwise Tracking the Optimal Regression Function

13 0.05957086 212 nips-2012-Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL

14 0.058613509 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

15 0.057637151 327 nips-2012-Structured Learning of Gaussian Graphical Models

16 0.057504922 80 nips-2012-Confusion-Based Online Learning and a Passive-Aggressive Scheme

17 0.05473987 247 nips-2012-Nonparametric Reduced Rank Regression

18 0.050825421 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

19 0.049401432 348 nips-2012-Tractable Objectives for Robust Policy Optimization

20 0.047538631 188 nips-2012-Learning from Distributions via Support Measure Machines

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.131), (1, -0.021), (2, -0.012), (3, 0.003), (4, 0.052), (5, -0.018), (6, 0.011), (7, 0.117), (8, -0.044), (9, -0.026), (10, 0.02), (11, 0.045), (12, -0.008), (13, -0.058), (14, -0.027), (15, -0.112), (16, -0.05), (17, 0.031), (18, -0.024), (19, -0.029), (20, 0.01), (21, 0.035), (22, -0.14), (23, -0.038), (24, 0.134), (25, -0.23), (26, 0.084), (27, 0.082), (28, -0.069), (29, 0.101), (30, 0.003), (31, 0.012), (32, -0.21), (33, 0.109), (34, -0.033), (35, 0.027), (36, 0.019), (37, -0.003), (38, 0.024), (39, -0.05), (40, -0.082), (41, 0.136), (42, -0.151), (43, 0.08), (44, 0.115), (45, -0.013), (46, -0.113), (47, 0.007), (48, 0.096), (49, -0.082)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92531019 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

2 0.68547845 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

Author: Jonathan Huang, Daniel Alexander

3 0.54867607 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study

Author: Uri Maoz, Shengxuan Ye, Ian Ross, Adam Mamelak, Christof Koch

4 0.45511031 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

Author: Teodor M. Moldovan, Pieter Abbeel

5 0.44691986 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

Author: Bonnie Kirkpatrick, Alexandre Bouchard-Côté

Abstract: Pedigrees, or family trees, are directed graphs used to identify sites of the genome that are correlated with the presence or absence of a disease. With the advent of genotyping and sequencing technologies, there has been an explosion in the amount of data available, both in the number of individuals and in the number of sites. Some pedigrees number in the thousands of individuals. Meanwhile, analysis methods have remained limited to pedigrees of < 100 individuals, which limits analyses to many small independent pedigrees. Disease models, such as those used for the linkage analysis log-odds (LOD) estimator, have similarly been limited. This is because linkage analysis was originally designed with a different task in mind, that of ordering the sites in the genome, before there were technologies that could reveal the order. LODs are difficult to interpret and nontrivial to extend to consider interactions among sites. These developments and difficulties call for the creation of modern methods of pedigree analysis. Drawing from recent advances in graphical model inference and transducer theory, we introduce a simple yet powerful formalism for expressing genetic disease models. We show that these disease models can be turned into accurate and computationally efficient estimators. The technique we use for constructing the variational approximation has potential applications to inference in other large-scale graphical models. This method allows inference on larger pedigrees than previously analyzed in the literature, which improves disease site prediction.

6 0.43536481 212 nips-2012-Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL

7 0.42335841 271 nips-2012-Pointwise Tracking the Optimal Regression Function

8 0.42320701 32 nips-2012-Active Comparison of Prediction Models

9 0.38397768 295 nips-2012-Risk-Aversion in Multi-armed Bandits

10 0.38097897 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

11 0.37126943 297 nips-2012-Robustness and risk-sensitivity in Markov decision processes

12 0.35821474 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

13 0.35753018 46 nips-2012-Assessing Blinding in Clinical Trials

14 0.35734212 200 nips-2012-Local Supervised Learning through Space Partitioning

15 0.34778425 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

16 0.34535888 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

17 0.34423709 72 nips-2012-Cocktail Party Processing via Structured Prediction

18 0.33448696 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

19 0.33402231 151 nips-2012-High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

20 0.31053516 222 nips-2012-Multi-Task Averaging

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.031), (1, 0.38), (17, 0.024), (21, 0.021), (38, 0.065), (42, 0.027), (54, 0.022), (55, 0.03), (74, 0.047), (76, 0.124), (80, 0.087), (92, 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69009703 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

2 0.67236561 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.

3 0.57055217 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Author: Ulugbek Kamilov, Sundeep Rangan, Michael Unser, Alyson K. Fletcher

Abstract: We consider the estimation of an i.i.d. vector x ∈ R^n from measurements y ∈ R^m obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees.

4 0.53632849 188 nips-2012-Learning from Distributions via Support Measure Machines

Author: Krikamol Muandet, Kenji Fukumizu, Francesco Dinuzzo, Bernhard Schölkopf

Abstract: This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in a straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provide several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (FlexSVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.
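As a rough illustration of the mean embeddings mentioned in the abstract above, the inner product of two empirical mean embeddings in an RKHS equals the average of the kernel over all cross pairs of samples. The sketch below computes that quantity for two sample sets. The Gaussian RBF kernel, the `gamma` parameter, and the function names `rbf_gram` and `mean_embedding_kernel` are assumptions chosen for illustration, and the code is not the SMM itself.

```python
import numpy as np

def rbf_gram(a, b, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of `a` and the rows of `b`."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def mean_embedding_kernel(x, y, gamma=1.0):
    """Inner product of the empirical mean embeddings of samples `x` and `y`.

    Equals the mean of k(x_i, y_j) over all cross pairs. Illustrative
    sketch only; the kernel choice is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return rbf_gram(x, y, gamma).mean()
```

A Gram matrix built from such values over a collection of sample sets could in principle be passed to any kernel method that accepts precomputed kernels.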

5 0.52144253 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms.

6 0.49257252 34 nips-2012-Active Learning of Multi-Index Function Models

7 0.48525375 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

8 0.43087384 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

9 0.43085346 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study

10 0.42975628 197 nips-2012-Learning with Recursive Perceptual Representations

11 0.42698279 279 nips-2012-Projection Retrieval for Classification

12 0.424532 168 nips-2012-Kernel Latent SVM for Visual Recognition

13 0.42384031 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

14 0.41964298 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

15 0.41953138 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

16 0.41911265 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

17 0.41875881 193 nips-2012-Learning to Align from Scratch

18 0.4180772 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

19 0.41793048 198 nips-2012-Learning with Target Prior

20 0.41774508 200 nips-2012-Local Supervised Learning through Space Partitioning