nips nips2002 nips2002-146 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kenneth J. Malmberg, René Zeelenberg, Richard M. Shiffrin
Abstract: The benzodiazepine Midazolam causes dense but temporary anterograde amnesia, similar to that produced by hippocampal damage. Does the action of Midazolam on the hippocampus cause less storage, or less accurate storage, of information in episodic long-term memory? We used a simple variant of the REM model [18] to fit data collected by Hirshman, Fisher, Henthorn, Arndt, and Passannante [9] on the effects of Midazolam, study time, and normative word frequency on both yes-no and remember-know recognition memory. That a simple strength model fit well was contrary to the expectations of Hirshman et al. More important, within the Bayesian-based REM modeling framework, the data were consistent with the view that Midazolam causes less accurate storage, rather than less storage, of information in episodic memory.
Reference: text
sentIndex sentText sentNum sentScore
1 odiazepine Midazolam causes dense but temporary anterograde amnesia, similar to that produced by hippocampal damage. Does the action of Midazolam on the hippocampus cause less storage, or less accurate storage, . [sent-11, score-0.215]
2 rary to the expectations of Hirshman et al. More important, within the Bayesian-based R. [sent-28, score-0.043]
3 EM modeling framework, the data were consistent with the view that Midazolam causes less accurate storage, rather than less storage, of information in episodic mem. [sent-29, score-0.261]
4 1 Introduction Damage to the hippocampus (and nearby regions), often caused by lesions, leaves normal cognitive function intact in the short term. [sent-32, score-0.049]
5 ming deficit: Does damage cause less storage, or less accurate storage, of information in long-term episodic memory? [sent-37, score-0.191]
6 2 Empirical findings The participants in Hirshman et al. [9] studied lists of words that varied in normative. [sent-40, score-0.2]
7 -frequency) and the amount of time allocated for study (either not studied, or studied for 500, 1. [sent-44, score-0.154]
8 200, or 2500 ms per word). These variables are known to have a robust effect on rec. [sent-45, score-0.071]
9 ognition memory in normal populations; Low-frequency (LF) words are better recognized than high-frequency (HF) words, an. [sent-46, score-0.079]
10 ding 'old' to studied words (termed hit rate, or HR) is higher for LF words than for HF words, and the probability of responding 'old' to unstu. [sent-53, score-0.127]
11 e or Midazolam and then studied a list of words. [sent-64, score-0.076]
12 fter a delay of about an hour they were shown studied words ('old') and unstudied words ('new') a. [sent-66, score-0.178]
13 n the saline condition, given in the left panel, replicate the standard effects in the literature: In the figure, the points labeled with zero study time give FAR. [sent-77, score-0.509]
14 Thus we see that the saline group exhibits better performance for LF words and a mirror effect: For LF words, FARs are low. [sent-80, score-0.486]
15 More critically, the pattern of results differs from that for the saline group: The mirror effect was lost; LF words produced both. [sent-83, score-0.138]
16 odel. Zero ms study time refers to 'new' items, so the data give the false-alarm rate (FAR). [sent-135, score-0.078]
17 Data shown for non-zero study times give hit rates (HR). [sent-136, score-0.078]
18 Only the REM parameter c varies between the saline and M. [sent-137, score-0.393]
19 The participants also indicated whether their "old" judgments were made on the basis of "remembering" the study event or on the basis of "knowing" the w. [sent-154, score-0.124]
20 they could not explicitly remember the study event [5]. [sent-157, score-0.078]
21 Of greatest interest for present purposes, "know" and "remember" responses were differently affected by the word frequen. [sent-160, score-0.044]
22 e conditional probability of a "know" judgment (given an "old" response) was consistently higher th. [sent-164, score-0.066]
23 Moreover, these probabilities were hardly affected by study time. A different pattern was obtained in the saline condition. [sent-166, score-0.122]
24 Finally, for LF words, the conditional probability of a "kn. [sent-171, score-0.121]
25 ow" judgment was higher than that of a "remember" judgment for nonstudied foi. [sent-172, score-0.165]
26 ls, but for studied targets the conditional probability of a. [sent-173, score-0.197]
27 d remember/know results were interpreted by Hirshman et al. [9] to require a dual-process account; in particular, the authors argued against "memory strength" accounts [4, 6, 11]. [sent-177, score-0.196]
28 ow data from Hirshman et al. and predictions of a REM model. The parameter values are those listed in t. [sent-266, score-0.043]
29 lam group, critR/K = 1.30. 3 A REM model for recognition and remember/know judgments A common way to conceive of recognition. [sent-277, score-0.11]
30 is probed with the test item, and the recognition decision is based on a. [sent-279, score-0.066]
31 A subclass of this type of model accounts for the word-frequency mirror effect by assuming that there exist four underlying d. [sent-287, score-0.138]
32 are arranged along a familiarity scale in the following manner: μ(LF-new) < μ(HF-new) < μ(HF-old) < μ(LF-old). [sent-292, score-0.095]
33 model of this type can predict the recognition findings of Hirshma. [sent-298, score-0.066]
34 EM model of the word-frequency effect described by Shiffrin and Steyvers [13, 18, 19] is a member of this class of models, as we describe next. [sent-307, score-0.071]
35 8] assumes that memory traces consist of vectors V, of length w, of nonnegative integer feature values. Zero represents no information about a feature. Otherwise the values for a given feature are assum. [sent-310, score-0.337]
36 probability distribution given as Equation 1: P(V = j) = (1-g)^(j-1) g, for j = 1 and higher. Thus higher integer values represent feature values less likely to be encountered in the environment. R. [sent-312, score-0.108]
37 cy" assumption [13]: the lexical/semantic traces of lower frequency words are generated with a lower value of g (i.e. [sent-314, score-0.174]
38 These lexical/semantic traces represent general knowledge (e.g., the orthographic, phonological, semantic, and contextual characteristics of a word) and have very many non-zero feature values, most of which. [sent-316, score-0.216]
39 Episodic traces represent the occurrence of stimuli in a certain environmental context; they are built of the same feature types as lexical/semantic traces, but tend to be incomplete (have many zero values) and inaccurate (the values do not nec. [sent-319, score-0.369]
40 When a word is studied, an incomplete and error-prone representation of the trace is stored in a. [sent-321, score-0.121]
41 The probability that a feature will be stored in the episodic image after t. [sent-323, score-0.338]
42 time units of study is given as Equation 2: 1 - (1 - u*)^t, where u. [sent-324, score-0.078]
43 * is the probability of storing a feature in an arbitrary unit of time. The number of attempts, tj, at storin. [sent-326, score-0.078]
44 tent feature for an item studied for j units of time is co. [sent-328, score-0.076]
45 Thus, increased study time increases the storage of features, but the gain in the amount of information stored diminishes as the item is studied longer. [sent-347, score-0.419]
46 Features that are not copied from the lexical/semantic trace are represented by a valu. [sent-348, score-0.255]
47 If storage of a feature does occur, the feature value is correctly copied from the word's lexical/semantic trace with probability c. [sent-350, score-0.345]
48 the value is incorrectly copied and sampled randomly from the long-run base-rate geometric distribution, a distribution defin. [sent-353, score-0.057]
49 At test, a probe made with context features only is assumed to activate the episodic traces, Ij, of the. [sent-356, score-0.266]
50 Then the content features of the probe cue are matched in parallel to the activated traces. [sent-358, score-0.162]
51 For each episodic trace, Ij, the system notes the values. [sent-360, score-0.191]
52 of features of Ij that match the corresponding feature of the cue (nijm stands for the number of matching values in the j-th image that have value i), and the number of mismatching featu. [sent-361, score-0.248]
53 res (njq stands for the number of mismatching values in the j-th image). [sent-362, score-0.121]
54 Equation 4: λj = (1-c)^njq Πi [(c + (1-c) g(1-g)^(i-1)) / (g(1-g)^(i-1))]^nijm is the likelihood ratio for the j-th image. It can be thought of as a match strength between the retrieval cue and. [sent-374, score-0.206]
55 The recognition decision is based on the odds, Φ, giving the probability that the test item is old divided by the probability the test item is new [18]. [sent-377, score-0.221]
56 Equation 5: Φ = (1/n) Σj λj, the mean of the likelihood ratios over the n activated images. If the odds exceed a criterion, then an "old" response is made. The default criterion is 1. (An illustrative code sketch of Equations 1-5 follows this sentence list.) [sent-379, score-0.088]
57 Matching features contribute evidence that an item is old (contribute factors to the product in Eq. [sent-383, score-0.305]
58 4 greater than 1.0) and mismatching features contribute evidence that an item is new (contribute factors less than 1. [sent-384, score-0.23]
59 M predicts an effect of study time because storage of more non-zero features increases the number of matching target-trace features; this factor outweighs the general increase in variance produced by greater numbers of non-zero features in vectors. [sent-386, score-0.591]
60 REM predicts a LF HR advantage because the matching of the more uncommon features associated with LF words produces greater evidence that the item is old than the matching of the more common features associated with H. [sent-387, score-0.477]
61 For foils, however, every feature match is due to chance; such matching occurs more frequently for HF than LF words because HF features are more common [12]. [sent-390, score-0.119]
62 This factor outweighs the higher diagnosticity of matches for the LF words, and HF words are predicted to have higher FARs than LF words. Much evidence points to the critical role of the hippocampal region in storing episodic memory traces [1, 14, 15, 16, 20]. [sent-391, score-0.619]
63 Midazolam has been shown to affect the storage, but not the retrieval, of memory traces [22]. [sent-393, score-0.372]
64 EM that affect the storage of features in memory: u* determines the number of features that get stored, and c. [sent-395, score-0.279]
65 However, Hirshman et al.'s data constrain which of these possibilities is viable. [sent-398, score-0.043]
66 Let us assume that Midazolam only causes the hippocampal region to store fewer features, relative to the saline condition (i. [sent-399, score-0.638]
67 In REM, this causes fewer terms in the product given by. [sent-402, score-0.07]
68 4, and a lower value for the result, on average. [sent-404, score-0.153]
69 Hence, if Midazolam causes fewer features to be stored, subjects should approach chance-level performance for both HF and. [sent-405, score-0.145]
70 EM, the main effect of Midazolam on the functioning of the hippocampal region is not to reduce the n. [sent-423, score-0.212]
71 Alternatively, let us assume that Midazolam causes the hippocampal region to store "noisier" episodic traces, as o. [sent-425, score-0.373]
72 pposed to traces with fewer non-zero features, instantiated in REM by d. [sent-426, score-0.174]
73 urs because the HF retrieval cues used to probe memory have more common features (on average) than the LF retrieval cues, a factor that comes to dominate when the true 'signal' (matching features in the target trace) begins to disintegrate into noise (due to l. [sent-442, score-0.388]
74 shows predictions of a REM model incorporating the assumption that only c varies between the saline a. [sent-445, score-0.393]
75 For retrieval the same c value was used in both the saline and Midazolam conditions to calculate the likelihoods in Equation 4 (an assumption consistent with retrieval tuned to the partici. [sent-448, score-0.631]
76 ts the storage of traces and not their retrieval [17]. [sent-450, score-0.497]
77 the Midazolam and saline groups, and therefore is not of consequence for the present article. Within the REM framework, then, the main effect of Midazolam is to cause the hippocampal region to store more noisy episodic traces. [sent-457, score-0.951]
78 As described above, an 'old' decision is given when the familiarity (i.e., activation, or in REM terms the odds) a. [sent-465, score-0.095]
79 Words whose familiarity exceeds the higher remember/know criterion a. [sent-472, score-0.148]
80 re given the "remember" response, and a "know" response is given when the remember/know criterion is not exceeded. [sent-473, score-0.053]
81 Figure 2 shows that this model predicts the effects of Midazolam and saline both qualitatively and quantitatively. [sent-474, score-0.431]
82 This fit was obtained by using slightly different remember/know criteria in the saline and Midazolam conditions (1. [sent-475, score-0.498]
83 26 in the saline and Midazolam conditions, respectively), but all the qualitative effects are predicted correctly even when the same criterion is adopted for remember/know. [sent-477, score-0.484]
84 M framework this result suggests the main effect of Midazolam (possibly all the effect) is on c (accuracy of storage) rather than on u* (quantity of storage). [sent-507, score-0.115]
85 t is possible to conceive of a much more complex REM model that assumes that the effect of Midazolam is to reduce the amount of storage. [sent-510, score-0.115]
86 inst traces stored prior to the experiment. Such a model might predict Hirshman et al.'s fin. [sent-523, score-0.278]
87 nule neurons and some are pyramidal neurons. The granule cells are associated with. [sent-545, score-0.044]
88 benzodiazepine, and benzodiazepines inhibit the firing of GABAergic interneurons in the hippocampus [3]. [sent-550, score-0.093]
89 Hence, if Midazolam inhibits the firing of those cells that regulate the orderly firing of the vast majority of hippocampal cells. [sent-551, score-0.151]
90 then it is reasonable to speculate that the result is a "noisier" episodic memory trace. The a. [sent-552, score-0.27]
91 Midazolam causes noisier storage rather than less storage raises th. [sent-554, score-0.516]
92 e similar effects caused by hippocampal lesions or other sorts of damage (e. [sent-556, score-0.145]
93 stage model of memory trace formation: A role for "noisy" brain states. [sent-563, score-0.139]
94 Activity of dentate granule cells during learning: differentiation of perforant path input. [sent-585, score-0.044]
95 A retrieval model for both recognition and recall. [sent-599, score-0.185]
96 (in press) Midazolam amnesia and dual-process models of the word frequency mirror effect. [sent-625, score-0.124]
97 & Henzler, A. (1998) The role of decision processes in conscious memory. Psych. [sent-630, score-0.044]
98 (1997) Modeling the conscious correlates of recognition memory: Reflections on the Remember-Know paradigm. [sent-633, score-0.044]
99 (2002) List composition and the word-frequency effect for recognition memory. [sent-639, score-0.066]
100 Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. [sent-668, score-0.066]
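The REM storage assumptions described in the extracted sentences above (Equations 1 and 2) can be summarized in a short simulation sketch. The code below is illustrative only and is not taken from the paper: it assumes the geometric feature distribution P(V = j) = (1-g)^(j-1) g, the storage probability 1 - (1 - u*)^t, and the copy-accuracy parameter c; the function names and the numeric defaults in the example are placeholders chosen for illustration.

```python
import numpy as np

def generate_lexical_trace(w, g, rng):
    """Draw a lexical/semantic trace of w nonnegative integer features.
    Values follow P(V = j) = (1 - g)**(j - 1) * g for j = 1, 2, ... (Eq. 1);
    lower g (low-frequency words) yields rarer, more diagnostic values."""
    return rng.geometric(g, size=w)

def store_episodic_image(lexical_trace, t, u_star, c, g, rng):
    """Store an incomplete, error-prone episodic image of a studied word.
    Each feature is stored with probability 1 - (1 - u_star)**t (Eq. 2).
    A stored feature is copied correctly with probability c; otherwise it is
    resampled from the geometric base rate. Unstored features remain 0."""
    image = np.zeros(len(lexical_trace), dtype=int)
    p_store = 1.0 - (1.0 - u_star) ** t
    for i, value in enumerate(lexical_trace):
        if rng.random() < p_store:
            image[i] = value if rng.random() < c else rng.geometric(g)
    return image

# Example: a "Midazolam-like" image stores features less accurately (lower c),
# whereas a "fewer-features" account would instead lower u_star.
rng = np.random.default_rng(0)
trace = generate_lexical_trace(w=20, g=0.40, rng=rng)   # HF-like word
saline_image = store_episodic_image(trace, t=3, u_star=0.3, c=0.8, g=0.40, rng=rng)
midazolam_image = store_episodic_image(trace, t=3, u_star=0.3, c=0.5, g=0.40, rng=rng)
```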
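A companion sketch of the retrieval side (Equations 4 and 5): each activated episodic image contributes a likelihood ratio built from its matching and mismatching features, the odds are the mean of these ratios, an 'old' response is made when the odds exceed the default criterion of 1, and a 'remember' rather than 'know' response is made when the odds also exceed a higher remember/know criterion. The likelihood-ratio form below follows the standard REM formulation rather than the paper's exact notation, and the criterion value of 1.30 echoes the figure quoted above for the Midazolam group; both should be read as illustrative assumptions.

```python
import numpy as np

def likelihood_ratio(probe, image, c, g):
    """Match strength (Eq. 4) between a retrieval cue and one episodic image.
    Each mismatching stored feature contributes a factor (1 - c); a match on
    value v contributes (c + (1 - c) * base(v)) / base(v), where base(v) is the
    geometric base-rate probability of v. Unstored (zero) features are skipped."""
    lam = 1.0
    for p, q in zip(probe, image):
        if q == 0:
            continue                              # no information stored
        if p == q:
            base = g * (1.0 - g) ** (q - 1)
            lam *= (c + (1.0 - c) * base) / base  # factor > 1: evidence for 'old'
        else:
            lam *= (1.0 - c)                      # factor < 1: evidence for 'new'
    return lam

def recognition_decision(probe, images, c, g, crit_old=1.0, crit_rk=1.30):
    """Odds (Eq. 5) are the mean likelihood ratio over the activated images.
    Respond 'old' if the odds exceed crit_old; within 'old' responses, respond
    'remember' if the odds also exceed the higher criterion crit_rk, else 'know'."""
    odds = float(np.mean([likelihood_ratio(probe, img, c, g) for img in images]))
    if odds <= crit_old:
        return "new", odds
    return ("remember" if odds > crit_rk else "know"), odds
```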
wordName wordTfidf (topN-words)
[('saline', 0.393), ('lf', 0.347), ('midazolam', 0.306), ('storage', 0.204), ('episodic', 0.191), ('traces', 0.174), ('hf', 0.174), ('tlle', 0.153), ('vords', 0.153), ('rem', 0.133), ('hirshman', 0.131), ('tor', 0.121), ('retrieval', 0.119), ('shiffrin', 0.109), ('hippocampal', 0.107), ('fit', 0.105), ('familiarity', 0.095), ('kno', 0.087), ('sho', 0.087), ('tile', 0.087), ('ver', 0.087), ('vith', 0.087), ('cue', 0.087), ('item', 0.08), ('memory', 0.079), ('study', 0.078), ('studied', 0.076), ('vas', 0.076), ('features', 0.075), ('old', 0.075), ('effect', 0.071), ('causes', 0.07), ('arc', 0.069), ('hr', 0.069), ('mirror', 0.067), ('recognition', 0.066), ('bloomington', 0.066), ('higber', 0.066), ('indiana', 0.066), ('midazolanl', 0.066), ('vord', 0.066), ('stored', 0.061), ('trace', 0.06), ('oj', 0.058), ('copied', 0.057), ('amnesia', 0.057), ('lo', 0.053), ('criterion', 0.053), ('vn', 0.052), ('words', 0.051), ('hippocampus', 0.049), ('participants', 0.046), ('affected', 0.044), ('matching', 0.044), ('abaergic', 0.044), ('assunle', 0.044), ('benzodiazepine', 0.044), ('conceive', 0.044), ('conscious', 0.044), ('donaldson', 0.044), ('fars', 0.044), ('forl', 0.044), ('fratnework', 0.044), ('fronl', 0.044), ('granule', 0.044), ('hirshnlan', 0.044), ('hirshulan', 0.044), ('ho', 0.044), ('hrsandfars', 0.044), ('inlage', 0.044), ('judgrnent', 0.044), ('olore', 0.044), ('ords', 0.044), ('outweighs', 0.044), ('tills', 0.044), ('tiring', 0.044), ('tvl', 0.044), ('vere', 0.044), ('vever', 0.044), ('vhat', 0.044), ('vhen', 0.044), ('ving', 0.044), ('vithin', 0.044), ('vvas', 0.044), ('judgment', 0.044), ('et', 0.043), ('contribute', 0.042), ('group', 0.042), ('feature', 0.042), ('ct', 0.039), ('ter', 0.038), ('anterograde', 0.038), ('noisier', 0.038), ('effects', 0.038), ('storing', 0.036), ('findings', 0.035), ('odds', 0.035), ('store', 0.034), ('region', 0.034), ('evidence', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
Author: Kenneth J. Malmberg, René Zeelenberg, Richard M. Shiffrin
Abstract: The benz.odiaze:pine '~1idazolam causes dense,but temporary ~ anterograde amnesia, similar to that produced by- hippocampal damage~Does the action of M'idazola:m on the hippocanlpus cause less storage, or less accurate storage, .of information in episodic. long-term menlory?- \rVe used a sinlple variant of theREJv1. JD.odel [18] to fit data collected. by IIirsbnla.n~Fisher, .IIenthorn,Arndt} and Passa.nnante [9] on the effects of Midazola.m, study time~ and normative \vQrd.. frequenc:y on both yes-no and remember-k.novv recognition m.emory. That a: simple strength. 'model fit well \\tas cont.rary to the expectations of 'flirshman et aLMore important,within the Bayesian based R.EM modeling frame\vork, the data were consistentw'ith the view that Midazolam causes less accurate storage~ rather than less storage, of infornlation in episodic mcm.ory..
2 0.16096632 176 nips-2002-Replay, Repair and Consolidation
Author: Szabolcs Káli, Peter Dayan
Abstract: A standard view of memory consolidation is that episodes are stored temporarily in the hippocampus, and are transferred to the neocortex through replay. Various recent experimental challenges to the idea of transfer, particularly for human memory, are forcing its re-evaluation. However, although there is independent neurophysiological evidence for replay, short of transfer, there are few theoretical ideas for what it might be doing. We suggest and demonstrate two important computational roles associated with neocortical indices.
3 0.071471691 112 nips-2002-Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis
Author: Alexei Vinokourov, Nello Cristianini, John Shawe-Taylor
Abstract: The problem of learning a semantic representation of a text document from data is addressed, in the situation where a corpus of unlabeled paired documents is available, each pair being formed by a short English document and its French translation. This representation can then be used for any retrieval, categorization or clustering task, both in a standard and in a cross-lingual setting. By using kernel functions, in this case simple bag-of-words inner products, each part of the corpus is mapped to a high-dimensional space. The correlations between the two spaces are then learnt by using kernel Canonical Correlation Analysis. A set of directions is found in the first and in the second space that are maximally correlated. Since we assume the two representations are completely independent apart from the semantic content, any correlation between them should reflect some semantic similarity. Certain patterns of English words that relate to a specific meaning should correlate with certain patterns of French words corresponding to the same meaning, across the corpus. Using the semantic representation obtained in this way we first demonstrate that the correlations detected between the two versions of the corpus are significantly higher than random, and hence that a representation based on such features does capture statistical patterns that should reflect semantic information. Then we use such representation both in cross-language and in single-language retrieval tasks, observing performance that is consistently and significantly superior to LSI on the same data.
4 0.052492727 90 nips-2002-Feature Selection in Mixture-Based Clustering
Author: Martin H. Law, Anil K. Jain, Mário Figueiredo
Abstract: There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami’s mutual-informationbased feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a “wrapper”, since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
5 0.051680207 163 nips-2002-Prediction and Semantic Association
Author: Thomas L. Griffiths, Mark Steyvers
Abstract: We explore the consequences of viewing semantic association as the result of attempting to predict the concepts likely to arise in a particular context. We argue that the success of existing accounts of semantic representation comes as a result of indirectly addressing this problem, and show that a closer correspondence to human data can be obtained by taking a probabilistic approach that explicitly models the generative structure of language. 1
6 0.047011916 75 nips-2002-Dynamical Causal Learning
7 0.045773678 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
8 0.04031134 143 nips-2002-Mean Field Approach to a Probabilistic Model in Information Retrieval
9 0.039196216 18 nips-2002-Adaptation and Unsupervised Learning
10 0.03694009 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
11 0.036366645 184 nips-2002-Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex
12 0.035001054 40 nips-2002-Bayesian Models of Inductive Generalization
13 0.034983616 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
14 0.034931082 154 nips-2002-Neuromorphic Bisable VLSI Synapses with Spike-Timing-Dependent Plasticity
15 0.034596492 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
16 0.034517158 115 nips-2002-Informed Projections
17 0.034419458 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons
18 0.033207323 43 nips-2002-Binary Coding in Auditory Cortex
19 0.032171346 111 nips-2002-Independent Components Analysis through Product Density Estimation
20 0.031746805 187 nips-2002-Spikernels: Embedding Spiking Neurons in Inner-Product Spaces
topicId topicWeight
[(0, -0.113), (1, 0.03), (2, -0.0), (3, -0.004), (4, -0.059), (5, 0.035), (6, -0.005), (7, -0.061), (8, 0.027), (9, -0.061), (10, -0.08), (11, -0.011), (12, 0.046), (13, 0.01), (14, -0.009), (15, -0.01), (16, -0.014), (17, -0.004), (18, 0.05), (19, -0.044), (20, -0.039), (21, -0.035), (22, 0.092), (23, -0.003), (24, -0.049), (25, -0.019), (26, 0.057), (27, -0.002), (28, 0.096), (29, 0.051), (30, 0.032), (31, -0.027), (32, -0.008), (33, -0.152), (34, 0.02), (35, -0.064), (36, 0.169), (37, -0.129), (38, -0.001), (39, -0.073), (40, 0.122), (41, -0.057), (42, 0.069), (43, 0.029), (44, 0.199), (45, 0.21), (46, 0.043), (47, -0.036), (48, -0.055), (49, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.93389034 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
Author: Kenneth J. Malmberg, René Zeelenberg, Richard M. Shiffrin
Abstract: The benz.odiaze:pine '~1idazolam causes dense,but temporary ~ anterograde amnesia, similar to that produced by- hippocampal damage~Does the action of M'idazola:m on the hippocanlpus cause less storage, or less accurate storage, .of information in episodic. long-term menlory?- \rVe used a sinlple variant of theREJv1. JD.odel [18] to fit data collected. by IIirsbnla.n~Fisher, .IIenthorn,Arndt} and Passa.nnante [9] on the effects of Midazola.m, study time~ and normative \vQrd.. frequenc:y on both yes-no and remember-k.novv recognition m.emory. That a: simple strength. 'model fit well \\tas cont.rary to the expectations of 'flirshman et aLMore important,within the Bayesian based R.EM modeling frame\vork, the data were consistentw'ith the view that Midazolam causes less accurate storage~ rather than less storage, of infornlation in episodic mcm.ory..
2 0.81332582 176 nips-2002-Replay, Repair and Consolidation
Author: Szabolcs Káli, Peter Dayan
Abstract: A standard view of memory consolidation is that episodes are stored temporarily in the hippocampus, and are transferred to the neocortex through replay. Various recent experimental challenges to the idea of transfer, particularly for human memory, are forcing its re-evaluation. However, although there is independent neurophysiological evidence for replay, short of transfer, there are few theoretical ideas for what it might be doing. We suggest and demonstrate two important computational roles associated with neocortical indices.
3 0.46924153 163 nips-2002-Prediction and Semantic Association
Author: Thomas L. Griffiths, Mark Steyvers
Abstract: We explore the consequences of viewing semantic association as the result of attempting to predict the concepts likely to arise in a particular context. We argue that the success of existing accounts of semantic representation comes as a result of indirectly addressing this problem, and show that a closer correspondence to human data can be obtained by taking a probabilistic approach that explicitly models the generative structure of language. 1
4 0.44278556 15 nips-2002-A Probabilistic Model for Learning Concatenative Morphology
Author: Matthew G. Snover, Michael R. Brent
Abstract: This paper describes a system for the unsupervised learning of morphological suffixes and stems from word lists. The system is composed of a generative probability model and hill-climbing and directed search algorithms. By extracting and examining morphologically rich subsets of an input lexicon, the directed search identifies highly productive paradigms. The hill-climbing algorithm then further maximizes the probability of the hypothesis. Quantitative results are shown by measuring the accuracy of the morphological relations identified. Experiments in English and Polish, as well as comparisons with another recent unsupervised morphology learning algorithm demonstrate the effectiveness of this technique.
5 0.42315257 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex
Author: Peter Dayan, Angela J. Yu
Abstract: Inference and adaptation in noisy and changing, rich sensory environments are rife with a variety of specific sorts of variability. Experimental and theoretical studies suggest that these different forms of variability play different behavioral, neural and computational roles, and may be reported by different (notably neuromodulatory) systems. Here, we refine our previous theory of acetylcholine’s role in cortical inference in the (oxymoronic) terms of expected uncertainty, and advocate a theory for norepinephrine in terms of unexpected uncertainty. We suggest that norepinephrine reports the radical divergence of bottom-up inputs from prevailing top-down interpretations, to influence inference and plasticity. We illustrate this proposal using an adaptive factor analysis model.
6 0.40000844 18 nips-2002-Adaptation and Unsupervised Learning
7 0.34326923 112 nips-2002-Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis
8 0.32970798 111 nips-2002-Independent Components Analysis through Product Density Estimation
9 0.32262754 178 nips-2002-Robust Novelty Detection with Single-Class MPM
10 0.3000069 188 nips-2002-Stability-Based Model Selection
11 0.29517928 125 nips-2002-Learning Semantic Similarity
12 0.27071062 89 nips-2002-Feature Selection by Maximum Marginal Diversity
13 0.24332029 179 nips-2002-Scaling of Probability-Based Optimization Algorithms
14 0.2420011 177 nips-2002-Retinal Processing Emulation in a Programmable 2-Layer Analog Array Processor CMOS Chip
15 0.23602185 115 nips-2002-Informed Projections
16 0.23114184 99 nips-2002-Graph-Driven Feature Extraction From Microarray Data Using Diffusion Kernels and Kernel CCA
17 0.22697976 35 nips-2002-Automatic Acquisition and Efficient Representation of Syntactic Structures
18 0.2198696 90 nips-2002-Feature Selection in Mixture-Based Clustering
19 0.21136731 138 nips-2002-Manifold Parzen Windows
20 0.20141654 42 nips-2002-Bias-Optimal Incremental Problem Solving
topicId topicWeight
[(11, 0.015), (23, 0.023), (42, 0.053), (54, 0.067), (55, 0.025), (57, 0.015), (64, 0.01), (67, 0.027), (68, 0.02), (74, 0.077), (79, 0.487), (92, 0.015), (98, 0.085)]
simIndex simValue paperId paperTitle
same-paper 1 0.80262321 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
Author: Kenneth J. Malmberg, René Zeelenberg, Richard M. Shiffrin
Abstract: The benzodiazepine Midazolam causes dense but temporary anterograde amnesia, similar to that produced by hippocampal damage. Does the action of Midazolam on the hippocampus cause less storage, or less accurate storage, of information in episodic long-term memory? We used a simple variant of the REM model [18] to fit data collected by Hirshman, Fisher, Henthorn, Arndt, and Passannante [9] on the effects of Midazolam, study time, and normative word frequency on both yes-no and remember-know recognition memory. That a simple strength model fit well was contrary to the expectations of Hirshman et al. More important, within the Bayesian-based REM modeling framework, the data were consistent with the view that Midazolam causes less accurate storage, rather than less storage, of information in episodic memory.
2 0.67803389 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs
Author: Gianluca Pollastri, Pierre Baldi, Alessandro Vullo, Paolo Frasconi
Abstract: We develop and test new machine learning methods for the prediction of topological representations of protein structures in the form of coarse- or fine-grained contact or distance maps that are translation and rotation invariant. The methods are based on generalized input-output hidden Markov models (GIOHMMs) and generalized recursive neural networks (GRNNs). The methods are used to predict topology directly in the fine-grained case and, in the coarsegrained case, indirectly by first learning how to score candidate graphs and then using the scoring function to search the space of possible configurations. Computer simulations show that the predictors achieve state-of-the-art performance. 1 Introduction: Protein Topology Prediction Predicting the 3D structure of protein chains from the linear sequence of amino acids is a fundamental open problem in computational molecular biology [1]. Any approach to the problem must deal with the basic fact that protein structures are translation and rotation invariant. To address this invariance, we have proposed a machine learning approach to protein structure prediction [4] based on the prediction of topological representations of proteins, in the form of contact or distance maps. The contact or distance map is a 2D representation of neighborhood relationships consisting of an adjacency matrix at some distance cutoff (typically in the range of 6 to 12 ˚), or a matrix of pairwise Euclidean distances. Fine-grained maps A are derived at the amino acid or even atomic level. Coarse maps are obtained by looking at secondary structure elements, such as helices, and the distance between their centers of gravity or, as in the simulations below, the minimal distances between their Cα atoms. Reasonable methods for reconstructing 3D coordinates from contact/distance maps have been developed in the NMR literature and elsewhere Oi B Hi F Hi Ii Figure 1: Bayesian network for bidirectional IOHMMs consisting of input units, output units, and both forward and backward Markov chains of hidden states. [14] using distance geometry and stochastic optimization techniques. Thus the main focus here is on the more difficult task of contact map prediction. Various algorithms for the prediction of contact maps have been developed, in particular using feedforward neural networks [6]. The best contact map predictor in the literature and at the last CASP prediction experiment reports an average precision [True Positives/(True Positives + False Positives)] of 21% for distant contacts, i.e. with a linear distance of 8 amino acid or more [6] for fine-grained amino acid maps. While this result is encouraging and well above chance level by a factor greater than 6, it is still far from providing sufficient accuracy for reliable 3D structure prediction. A key issue in this area is the amount of noise that can be tolerated in a contact map prediction without compromising the 3D-reconstruction step. While systematic tests in this area have not yet been published, preliminary results appear to indicate that recovery of as little as half of the distant contacts may suffice for proper reconstruction, at least for proteins up to 150 amino acid long (Rita Casadio and Piero Fariselli, private communication and oral presentation during CASP4 [10]). 
It is important to realize that the input to a fine-grained contact map predictor need not be confined to the sequence of amino acids only, but may also include evolutionary information in the form of profiles derived by multiple alignment of homologue proteins, or structural feature information, such as secondary structure (alpha helices, beta strands, and coils), or solvent accessibility (surface/buried), derived by specialized predictors [12, 13]. In our approach, we use different GIOHMM and GRNN strategies to predict both structural features and contact maps. 2 GIOHMM Architectures Loosely speaking, GIOHMMs are Bayesian networks with input, hidden, and output units that can be used to process complex data structures such as sequences, images, trees, chemical compounds and so forth, built on work in, for instance, [5, 3, 7, 2, 11]. In general, the connectivity of the graphs associated with the hidden units matches the structure of the data being processed. Often multiple copies of the same hidden graph, but with different edge orientations, are used in the hidden layers to allow direct propagation of information in all relevant directions. Output Plane NE NW 4 Hidden Planes SW SE Input Plane Figure 2: 2D GIOHMM Bayesian network for processing two-dimensional objects such as contact maps, with nodes regularly arranged in one input plane, one output plane, and four hidden planes. In each hidden plane, nodes are arranged on a square lattice, and all edges are oriented towards the corresponding cardinal corner. Additional directed edges run vertically in column from the input plane to each hidden plane, and from each hidden plane to the output plane. To illustrate the general idea, a first example of GIOHMM is provided by the bidirectional IOHMMs (Figure 1) introduced in [2] to process sequences and predict protein structural features, such as secondary structure. Unlike standard HMMs or IOHMMS used, for instance in speech recognition, this architecture is based on two hidden markov chains running in opposite directions to leverage the fact that biological sequences are spatial objects rather than temporal sequences. Bidirectional IOHMMs have been used to derive a suite of structural feature predictors [12, 13, 4] available through http://promoter.ics.uci.edu/BRNN-PRED/. These predictors have accuracy rates in the 75-80% range on a per amino acid basis. 2.1 Direct Prediction of Topology To predict contact maps, we use a 2D generalization of the previous 1D Bayesian network. The basic version of this architecture (Figures 2) contains 6 layers of units: input, output, and four hidden layers, one for each cardinal corner. Within each column indexed by i and j, connections run from the input to the four hidden units, and from the four hidden units to the output unit. In addition, the hidden units in each hidden layer are arranged on a square or triangular lattice, with all the edges oriented towards the corresponding cardinal corner. Thus the parameters of this two-dimensional GIOHMMs, in the square lattice case, are the conditional probability distributions: NE NW SW SE P (Oi |Ii,j , Hi,j , Hi,j , Hi,j , Hi,j, ) NE NE NE P (Hi,j |Ii,j , Hi−1,j , Hi,j−1 ) N NW NW P (Hi,jW |Ii,j , Hi+1,j , Hi,j−1 ) SW SW SW P (Hi,j |Ii,j , Hi+1,j , Hi,j+1 ) SE SE SE P (Hi,j |Ii,j , Hi−1,j , Hi,j+1 ) (1) In a contact map prediction at the amino acid level, for instance, the (i, j) output represents the probability of whether amino acids i and j are in contact or not. 
This prediction depends directly on the (i, j) input and the four-hidden units in the same column, associated with omni-directional contextual propagation in the hidden planes. In the simulations reported below, we use a more elaborated input consisting of a 20 × 20 probability matrix over amino acid pairs derived from a multiple alignment of the given protein sequence and its homologues, as well as the structural features of the corresponding amino acids, including their secondary structure classification and their relative exposure to the solvent, derived from our corresponding predictors. It should be clear how GIOHMM ideas can be generalized to other data structures and problems in many ways. In the case of 3D data, for instance, a standard GIOHMM would have an input cube, an output cube, and up to 8 cubes of hidden units, one for each corner with connections inside each hidden cube oriented towards the corresponding corner. In the case of data with an underlying tree structure, the hidden layers would correspond to copies of the same tree with different orientations and so forth. Thus a fundamental advantage of GIOHMMs is that they can process a wide range of data structures of variable sizes and dimensions. 2.2 Indirect Prediction of Topology Although GIOHMMs allow flexible integration of contextual information over ranges that often exceed what can be achieved, for instance, with fixed-input neural networks, the models described above still suffer from the fact that the connections remain local and therefore long-ranged propagation of information during learning remains difficult. Introduction of large numbers of long-ranged connections is computationally intractable but in principle not necessary since the number of contacts in proteins is known to grow linearly with the length of the protein, and hence connectivity is inherently sparse. The difficulty of course is that the location of the long-ranged contacts is not known. To address this problem, we have developed also a complementary GIOHMM approach described in Figure 3 where a candidate graph structure is proposed in the hidden layers of the GIOHMM, with the two different orientations naturally associated with a protein sequence. Thus the hidden graphs change with each protein. In principle the output ought to be a single unit (Figure 3b) which directly computes a global score for the candidate structure presented in the hidden layer. In order to cope with long-ranged dependencies, however, it is preferable to compute a set of local scores (Figure 3c), one for each vertex, and combine the local scores into a global score by averaging. More specifically, consider a true topology represented by the undirected contact graph G∗ = (V, E ∗ ), and a candidate undirected prediction graph G = (V, E). A global measure of how well E approximates E ∗ is provided by the informationretrieval F1 score defined by the normalized edge-overlap F1 = 2|E ∩ E ∗ |/(|E| + |E ∗ |) = 2P R/(P + R), where P = |E ∩ E ∗ |/|E| is the precision (or specificity) and R = |E ∩ E ∗ |/|E ∗ | is the recall (or sensitivity) measure. Obviously, 0 ≤ F1 ≤ 1 and F1 = 1 if and only if E = E ∗ . The scoring function F1 has the property of being monotone in the sense that if |E| = |E | then F1 (E) < F1 (E ) if and only if |E ∩ E ∗ | < |E ∩ E ∗ |. Furthermore, if E = E ∪ {e} where e is an edge in E ∗ but not in E, then F1 (E ) > F1 (E). Monotonicity is important to guide the search in the space of possible topologies. 
It is easy to check that a simple search algorithm based on F1 takes on the order of O(|V |3 ) steps to find E ∗ , basically by trying all possible edges one after the other. The problem then is to learn F1 , or rather a good approximation to F1 . To approximate F1 , we first consider a similar local measure Fv by considering the O I(v) I(v) F B H (v) H (v) (a) I(v) F B H (v) H (v) (b) O(v) (c) Figure 3: Indirect prediction of contact maps. (a) target contact graph to be predicted. (b) GIOHMM with two hidden layers: the two hidden layers correspond to two copies of the same candidate graph oriented in opposite directions from one end of the protein to the other end. The single output O is the global score of how well the candidate graph approximates the true contact map. (c) Similar to (b) but with a local score O(v) at each vertex. The local scores can be averaged to produce a global score. In (b) and (c) I(v) represents the input for vertex v, and H F (v) and H B (v) are the corresponding hidden variables. ∗ ∗ set Ev of edges adjacent to vertex v and Fv = 2|Ev ∩ Ev |/(|Ev | + |Ev |) with the ¯ global average F = v Fv /|V |. If n and n∗ are the average degrees of G and G∗ , it can be shown that: F1 = 1 |V | v 2|Ev ∩ E ∗ | n + n∗ and 1 ¯ F = |V | v 2|Ev ∩ E ∗ | n + v + n∗ + ∗ v (2) where n + v (resp. n∗ + ∗ ) is the degree of v in G (resp. in G∗ ). In particular, if G v ¯ ¯ and G∗ are regular graphs, then F1 (E) = F (E) so that F is a good approximation to F1 . In the contact map regime where the number of contacts grows linearly with the length of the sequence, we should have in general |E| ≈ |E ∗ | ≈ (1 + α)|V | so that each node on average has n = n∗ = 2(1 + α) edges. The value of α depends of course on the neighborhood cutoff. As in reinforcement learning, to learn the scoring function one is faced with the problem of generating good training sets in a high dimensional space, where the states are the topologies (graphs), and the policies are algorithms for adding a single edge to a given graph. In the simulations we adopt several different strategies including static and dynamic generation. Within dynamic generation we use three exploration strategies: random exploration (successor graph chosen at random), pure exploitation (successor graph maximizes the current scoring function), and semi-uniform exploitation to find a balance between exploration and exploitation [with probability (resp. 1 − ) we choose random exploration (resp. pure exploitation)]. 3 GRNN Architectures Inference and learning in the protein GIOHMMs we have described is computationally intensive due to the large number of undirected loops they contain. This problem can be addressed using a neural network reparameterization assuming that: (a) all the nodes in the graphs are associated with a deterministic vector (note that in the case of the output nodes this vector can represent a probability distribution so that the overall model remains probabilistic); (b) each vector is a deterministic function of its parents; (c) each function is parameterized using a neural network (or some other class of approximators); and (d) weight-sharing or stationarity is used between similar neural networks in the model. 
For example, in the 2D GIOHMM contact map predictor, we can use a total of 5 neural networks to recursively compute the four hidden states and the output in each column in the form: NW NE SW SE Oij = NO (Iij , Hi,j , Hi,j , Hi,j , Hi,j ) NE NE NE Hi,j = NN E (Ii,j , Hi−1,j , Hi,j−1 ) N NW NW Hi,jW = NN W (Ii,j , Hi+1,j , Hi,j−1 ) SW SW SW Hi,j = NSW (Ii,j , Hi+1,j , Hi,j+1 ) SE SE SE Hi,j = NSE (Ii,j , Hi−1,j , Hi,j+1 ) (3) N In the NE plane, for instance, the boundary conditions are set to Hij E = 0 for i = 0 N or j = 0. The activity vector associated with the hidden unit Hij E depends on the NE NE local input Iij , and the activity vectors of the units Hi−1,j and Hi,j−1 . Activity in NE plane can be propagated row by row, West to East, and from the first row to the last (from South to North), or column by column South to North, and from the first column to the last. These GRNN architectures can be trained by gradient descent by unfolding the structures in space, leveraging the acyclic nature of the underlying GIOHMMs. 4 Data Many data sets are available or can be constructed for training and testing purposes, as described in the references. The data sets used in the present simulations are extracted from the publicly available Protein Data Bank (PDB) and then redundancy reduced, or from the non-homologous subset of PDB Select (ftp://ftp.emblheidelberg.de/pub/databases/). In addition, we typically exclude structures with poor resolution (less than 2.5-3 ˚), sequences containing less than 30 amino acids, A and structures containing multiple sequences or sequences with chain breaks. For coarse contact maps, we use the DSSP program [9] (CMBI version) to assign secondary structures and we remove also sequences for which DSSP crashes. The results we report for fine-grained contact maps are derived using 424 proteins with lengths in the 30-200 range for training and an additional non-homologous set of 48 proteins in the same length range for testing. For the coarse contact map, we use a set of 587 proteins of length less than 300. Because the average length of a secondary structure element is slightly above 7, the size of a coarse map is roughly 2% the size of the corresponding amino acid map. 5 Simulation Results and Conclusions We have trained several 2D GIOHMM/GRNN models on the direct prediction of fine-grained contact maps. Training of a single model typically takes on the order of a week on a fast workstation. A sample of validation results is reported in Table 1 for four different distance cutoffs. Overall percentages of correctly predicted contacts Table 1: Direct prediction of amino acid contact maps. Column 1: four distance cutoffs. Column 2, 3, and 4: overall percentages of amino acids correctly classified as contacts, non-contacts, and in total. Column 5: Precision percentage for distant contacts (|i − j| ≥ 8) with a threshold of 0.5. Single model results except for last line corresponding to an ensemble of 5 models. Cutoff 6˚ A 8˚ A 10 ˚ A 12 ˚ A 12 ˚ A Contact .714 .638 .512 .433 .445 Non-Contact .998 .998 .993 .987 .990 Total .985 .970 .931 .878 .883 Precision (P) .594 .670 .557 .549 .717 and non-contacts at all linear distances, as well as precision results for distant contacts (|i − j| ≥ 8) are reported for a single GIOHMM/GRNN model. The model has k = 14 hidden units in the hidden and output layers of the four hidden networks, as well as in the hidden layer of the output network. 
In the last row, we also report as an example the results obtained at 12˚ by an ensemble of 5 networks A with k = 11, 12, 13, 14 and 15. Note that precision for distant contacts exceeds all previously reported results and is well above 50%. For the prediction of coarse-grained contact maps, we use the indirect GIOHMM/GRNN strategy and compare different exploration/exploitation strategies: random exploration, pure exploitation, and their convex combination (semiuniform exploitation). In the semi-uniform case we set the probability of random uniform exploration to = 0.4. In addition, we also try a fourth hybrid strategy in which the search proceeds greedily (i.e. the best successor is chosen at each step, as in pure exploitation), but the network is trained by randomly sub-sampling the successors of the current state. Eight numerical features encode the input label of each node: one-hot encoding of secondary structure classes; normalized linear distances from the N to C terminus; average, maximum and minimum hydrophobic character of the segment (based on the Kyte-Doolittle scale with a moving window of length 7). A sample of results obtained with 5-fold cross-validation is shown in Table 2. Hidden state vectors have dimension k = 5 with no hidden layers. For each strategy we measure performances by means of several indices: micro and macroaveraged precision (mP , M P ), recall (mR, M R) and F1 measure (mF1 , M F1 ). Micro-averages are derived based on each pair of secondary structure elements in each protein, whereas macro-averages are obtained on a per-protein basis, by first computing precision and recall for each protein, and then averaging over the set of all proteins. In addition, we also measure the micro and macro averages for specificity in the sense of percentage of correct prediction for non-contacts (mP (nc), M P (nc)). Note the tradeoffs between precision and recall across the training methods, the hybrid method achieving the best F 1 results. Table 2: Indirect prediction of coarse contact maps with dynamic sampling. Strategy Random exploration Semi-uniform Pure exploitation Hybrid mP .715 .454 .431 .417 mP (nc) .769 .787 .806 .834 mR .418 .631 .726 .790 mF1 .518 .526 .539 .546 MP .767 .507 .481 .474 M P (nc) .709 .767 .793 .821 MR .469 .702 .787 .843 M F1 .574 .588 .596 .607 We have presented two approaches, based on a very general IOHMM/RNN framework, that achieve state-of-the-art performance in the prediction of proteins contact maps at fine and coarse-grained levels of resolution. In principle both methods can be applied to both resolution levels, although the indirect prediction is computationally too demanding for fine-grained prediction of large proteins. Several extensions are currently under development, including the integration of these methods into complete 3D structure predictors. While these systems require long training periods, once trained they can rapidly sift through large proteomic data sets. Acknowledgments The work of PB and GP is supported by a Laurel Wilkening Faculty Innovation award and awards from NIH, BREP, Sun Microsystems, and the California Institute for Telecommunications and Information Technology. The work of PF and AV is partially supported by a MURST grant. References [1] D. Baker and A. Sali. Protein structure prediction and structural genomics. Science, 294:93–96, 2001. [2] P. Baldi and S. Brunak and P. Frasconi and G. Soda and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction. 
Bioinformatics, 15(11):937–946, 1999. [3] P. Baldi and Y. Chauvin. Hybrid modeling, HMM/NN architectures, and protein applications. Neural Computation, 8(7):1541–1565, 1996. [4] P. Baldi and G. Pollastri. Machine learning structural and functional proteomics. IEEE Intelligent Systems. Special Issue on Intelligent Systems in Biology, 17(2), 2002. [5] Y. Bengio and P. Frasconi. Input-output HMM’s for sequence processing. IEEE Trans. on Neural Networks, 7:1231–1249, 1996. [6] P. Fariselli, O. Olmea, A. Valencia, and R. Casadio. Prediction of contact maps with neural networks and correlated mutations. Protein Engineering, 14:835–843, 2001. [7] P. Frasconi, M. Gori, and A. Sperduti. A general framework for adaptive processing of data structures. IEEE Trans. on Neural Networks, 9:768–786, 1998. [8] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models Machine Learning, 29:245–273, 1997. [9] W. Kabsch and C. Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22:2577–2637, 1983. [10] A. M. Lesk, L. Lo Conte, and T. J. P. Hubbard. Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins, 45, S5:98–118, 2001. [11] G. Pollastri and P. Baldi. Predition of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Proceedings of 2002 ISMB (Intelligent Systems for Molecular Biology) Conference. Bioinformatics, 18, S1:62–70, 2002. [12] G. Pollastri, D. Przybylski, B. Rost, and P. Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47:228–235, 2002. [13] G. Pollastri, P. Baldi, P. Fariselli, and R. Casadio. Prediction of coordination number and relative solvent accessibility in proteins. Proteins, 47:142–153, 2002. [14] M. Vendruscolo, E. Kussell, and E. Domany. Recovery of protein structure from contact maps. Folding and Design, 2:295–306, 1997.
3 0.55454355 69 nips-2002-Discriminative Learning for Label Sequences via Boosting
Author: Yasemin Altun, Thomas Hofmann, Mark Johnson
Abstract: This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function. The proposed method combines many of the advantages of boosting schemes with the efficiency of dynamic programming methods and is attractive both, conceptually and computationally. In addition, we also discuss alternative approaches based on the Hamming loss for label sequences. The sequence boosting algorithm offers an interesting alternative to methods based on HMMs and the more recently proposed Conditional Random Fields. Applications areas for the presented technique range from natural language processing and information extraction to computational biology. We include experiments on named entity recognition and part-of-speech tagging which demonstrate the validity and competitiveness of our approach. 1
4 0.44368446 45 nips-2002-Boosted Dyadic Kernel Discriminants
Author: Baback Moghaddam, Gregory Shakhnarovich
Abstract: We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an on-line and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy). 1
5 0.33463934 32 nips-2002-Approximate Inference and Protein-Folding
Author: Chen Yanover, Yair Weiss
Abstract: Side-chain prediction is an important subtask in the protein-folding problem. We show that finding a minimal energy side-chain configuration is equivalent to performing inference in an undirected graphical model. The graphical model is relatively sparse yet has many cycles. We used this equivalence to assess the performance of approximate inference algorithms in a real-world setting. Specifically we compared belief propagation (BP), generalized BP (GBP) and naive mean field (MF). In cases where exact inference was possible, max-product BP always found the global minimum of the energy (except in few cases where it failed to converge), while other approximation algorithms of similar complexity did not. In the full protein data set, maxproduct BP always found a lower energy configuration than the other algorithms, including a widely used protein-folding software (SCWRL). 1
6 0.3048799 53 nips-2002-Clustering with the Fisher Score
7 0.29471081 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
8 0.28591052 135 nips-2002-Learning with Multiple Labels
9 0.28580397 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
10 0.28366807 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
11 0.28237811 3 nips-2002-A Convergent Form of Approximate Policy Iteration
12 0.28231761 10 nips-2002-A Model for Learning Variance Components of Natural Images
13 0.28193671 2 nips-2002-A Bilinear Model for Sparse Coding
14 0.28118315 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
15 0.28112045 163 nips-2002-Prediction and Semantic Association
16 0.28091151 41 nips-2002-Bayesian Monte Carlo
17 0.28059709 124 nips-2002-Learning Graphical Models with Mercer Kernels
18 0.28045461 74 nips-2002-Dynamic Structure Super-Resolution
19 0.28013819 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals
20 0.27928731 176 nips-2002-Replay, Repair and Consolidation