nips nips2004 nips2004-6 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann
Abstract: De novo sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the high-throughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack of precision in detecting mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
Reference: text
sentIndex sentText sentNum sentScore
1 of Theoretical Computer Science ETH Zurich CH-8092 Zurich, Switzerland Abstract De novo Sequencing of peptides is a challenging task in proteome research. [sent-6, score-0.361]
2 While there exist reliable DNA-sequencing methods, the high-throughput de novo sequencing of proteins by mass spectrometry is still an open problem. [sent-7, score-0.946]
3 Current approaches suffer from a lack of precision in detecting mass peaks in the spectrograms. [sent-8, score-0.492]
4 In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. [sent-9, score-0.847]
5 1 Introduction The goal of de novo peptide sequencing is to reconstruct an amino acid sequence from a given mass spectrum. [sent-11, score-1.886]
6 De novo sequencing by means of mass spectrometry is a very challenging task, since many practical problems like measurement errors or peak suppression have to be overcome. [sent-12, score-0.986]
7 It is, thus, not surprising that current approaches to reconstruct the sequence from mass spectra are usually limited to those species for which genome information is available. [sent-13, score-0.541]
8 This case is a simplified problem of the de novo sequencing problem, since the hypothesis space of possible sequences is restricted to the known ones contained in a sequence database. [sent-14, score-0.532]
9 In this paper we present a Hidden Markov Model (HMM) for de novo sequencing. [sent-15, score-0.271]
10 Our trained HMM defines a generative model for mass spectra which, for instance, is used for scoring observed spectra according to their likelihood given a peptide sequence. [sent-17, score-0.944]
11 2 Tandem Mass Spectrometry In a typical sequencing experiment by mass spectrometry a protein is digested with the help of an enzyme. [sent-19, score-0.701]
12 This digestion reaction breaks the protein into several peptides, each of which consists of a short sequence of typically 10 to 20 amino acid residues, with an additional H-atom at the N-terminus and an OH-group at the C-terminus. [sent-20, score-0.728]
13 Figure 1: In the first mass measurement the parent mass is selected. [sent-23, score-1.007]
14 In the second measurement the peptide is dissociated and the mass of the ion fragments is measured. [sent-24, score-1.108]
15 There are two measurement steps in a tandem mass spectrometer. [sent-25, score-0.509]
16 The first step is responsible for filtering peptides of a certain total mass (also called the parent mass). [sent-26, score-0.678]
17 The difficulty in measuring the parent mass arises from different 12C/13C isotope proportions of the approximately 30-80 C-atoms contained in a peptide. [sent-27, score-0.633]
18 Fluctuations of the 13C fraction result in a binomial distribution of parent masses in the measurement. [sent-28, score-0.273]
19 Given such an “ion count distribution” one can roughly estimate the mono-isotopic parent mass of the peptide, where the term mono-isotopic here refers to a peptide that contains exclusively 12C atoms. [sent-29, score-0.986]
20 In practice, all isotope configurations of a peptide with parent masses that do not exceed the estimated mono-isotopic mass by more than a predefined offset are separated from the other peptides and passed to the second spectrometer. [sent-30, score-1.149]
21 Figure 2: Top: The ideal peaks of a peptide sequence are drawn. [sent-31, score-0.529]
22 In the second mass measurement, a peptide is split into two fragments by means of collision induced dissociation with a noble gas. [sent-33, score-0.933]
23 In almost all cases the peptide is broken between two amino acids. [sent-34, score-0.733]
24 caused by problems in determining the exact mono-isotopic mass of the fragments due to isotope shifts. [sent-38, score-0.576]
25 3 The Hidden Markov Model for de Novo Peptide Sequencing A peptide can formally be described as a sequence of symbols from a fixed alphabet A of 20 amino acids. [sent-42, score-0.779]
26 We will denote amino acids with α ∈ A and the mass of an amino acid with M (α). [sent-43, score-1.406]
27 The input data is a spectrum of ion counts over all mass units. [sent-44, score-0.658]
28 The spectra are discretized to mass units of approximately one Dalton and normalized such that the mean ion count per Dalton is constant. [sent-46, score-0.728]
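To make this preprocessing concrete, here is a minimal sketch (our own, not the authors' code; the function name, peak values and target mean are made up) of binning a peak list into 1 Dalton units and normalizing the mean ion count:

```python
import numpy as np

def discretize_spectrum(mz_values, intensities, max_mass, target_mean=1.0):
    """Bin a peak list into ~1 Dalton mass units and rescale so that the mean
    ion count per Dalton equals target_mean (illustrative sketch only)."""
    spectrum = np.zeros(int(max_mass) + 1)
    for mz, intensity in zip(mz_values, intensities):
        m = int(round(mz))          # nearest 1 Dalton bin
        if 0 <= m <= int(max_mass):
            spectrum[m] += intensity
    mean_count = spectrum.mean()
    if mean_count > 0:
        spectrum *= target_mean / mean_count
    return spectrum

# usage with made-up peaks
x = discretize_spectrum([175.12, 288.20, 401.30], [120.0, 340.0, 95.0], max_mass=500)
```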
29 The mono-isotopic parent mass $m_p$ of the peptide $P = (\alpha_1, \ldots, \alpha_n)$ with $\alpha_i \in A$ is the sum of all amino acid masses plus a constant mass for the N- and C-termini. [sent-47, score-1.232] [sent-50, score-1.089]
31 For the sake of simplicity it is assumed that the N- and C-termini are not present and thus the parent mass considered in the sequel is $m_p = \sum_{i=1}^{n} M(\alpha_i)$. (1) [sent-52, score-0.878]
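As a small worked example of Equation (1), the parent mass of a peptide can be summed from a residue mass table; the sketch below uses standard integer-rounded residue masses, which are approximations and not values taken from the paper:

```python
# Integer-rounded monoisotopic residue masses in Dalton (approximate, for illustration);
# N- and C-termini are ignored, as in Equation (1).
RESIDUE_MASS = {
    'G': 57, 'A': 71, 'S': 87, 'P': 97, 'V': 99, 'T': 101, 'C': 103,
    'L': 113, 'I': 113, 'N': 114, 'D': 115, 'Q': 128, 'K': 128,
    'E': 129, 'M': 131, 'H': 137, 'F': 147, 'R': 156, 'Y': 163, 'W': 186,
}

def parent_mass(peptide):
    """m_p = sum_i M(alpha_i), Equation (1)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)

print(parent_mass("AYSARDGFSHEK"))  # example peptide (this sequence appears in the paper's vocabulary)
```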
32 The physical process that generates spectra is based on the fact that a peptide is randomly broken into two parts by interaction with a noble gas. [sent-54, score-0.488]
33 Each of these parts is detected in the mass spectrometer and increases the ion-count in the corresponding mass interval. [sent-55, score-0.819]
34 In order to derive a model of the generation process, we make the simplifying assumptions that (i) breaks occur only at amino acid boundaries, and (ii) the probability of observing a break after a certain amino acid depends only on the amino acid itself. [sent-57, score-1.907]
35 In such a model, the process of generating a spectrum for a peptide of known parent mass is formalized as a path through the automaton in 1 Dalton steps until the constraint on the parent mass is satisfied. [sent-59, score-1.718]
36 For each amino acid α there is a list of M (α) states. [sent-62, score-0.668]
37 For each amino acid α ∈ A there exists a list of M(α) states $s^\alpha_1, \ldots, s^\alpha_{M(\alpha)}$. (2) [sent-65, score-0.715]
38 The bold edges in the graph correspond to state transition probabilities a(s, t) from state s to state t. [sent-70, score-0.328]
39 Once the automaton is in the first state $s^\alpha_1$ of the state list of one amino acid α, it has to pass through all other states within the specific list. [sent-71, score-0.873]
40 Thus for the next M (α) steps the list corresponding to amino acid α is linearly traversed. [sent-72, score-0.668]
41 If the automaton is in the last state $s^\alpha_{M(\alpha)}$ of a list, it can reach the start state $s^{\alpha'}_1$ of any other amino acid α'. [sent-73, score-0.877]
42 $a(s, t) = \begin{cases} 1 & \forall\, \alpha \in A,\ 1 \le i < M(\alpha) : s = s^\alpha_i,\ t = s^\alpha_{i+1} \\ r_{\alpha'} & \forall\, \alpha, \alpha' \in A : s = s^\alpha_{M(\alpha)},\ t = s^{\alpha'}_1 \\ 0 & \text{else} \end{cases}$ (3) The first row (a(s, t) = 1) describes the case where the automaton is in a non-terminating state of the list of amino acid α ($1 \le i < M(\alpha) : s = s^\alpha_i$), where the following state is accepted with probability 1. [sent-79, score-0.947]
43 In such a case, the starting state of any other amino acid α' is selected with probability $r_{\alpha'}$. [sent-81, score-0.709]
44 The probabilities rα are the probabilities of occurrence of amino acid α. [sent-82, score-0.753]
45 The transition probabilities a(s0 , t) from the start state s0 are the occurrence probabilities of the amino acids. [sent-83, score-0.603]
46 $a(s_0, t) = \begin{cases} r_\alpha & \forall\, \alpha \in A : t = s^\alpha_1 \\ 0 & \text{else} \end{cases}$ (4) Finally one has to ensure that the parent mass constraint is fulfilled. [sent-84, score-0.632]
47 In order to satisfy the constraint we devise a time-dependent hidden Markov model in which the transition probability changes with a Heaviside function at time $m_p$ from a(s, t) to a'(s, t). [sent-85, score-0.391]
48 $a'(s, t) = \begin{cases} 1 & \forall\, \alpha \in A : s = s^\alpha_{M(\alpha)},\ t = s^+ \\ 1 & \forall\, \alpha \in A,\ 1 \le i < M(\alpha) : s = s^\alpha_i,\ t = s^- \\ 0 & \text{else} \end{cases}$ (5) If the automaton is in the last state $s^\alpha_{M(\alpha)}$ of an amino acid state list, it changes to the positive end state $s^+$ with probability 1 since the parent mass constraint is satisfied. [sent-87, score-1.62]
49 If the automaton is in one of the other states, it changes to the negative end state s− since the parent mass constraint is violated. [sent-88, score-0.805]
50 It is important to realize that all amino acid sequences that fulfill the parent mass constraint can be transformed into state sequences that end in the positive state s+ and vice versa. [sent-89, score-1.461]
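To make the automaton structure concrete, the sketch below (our own illustration; the truncated alphabet, integer masses and occurrence probabilities are made up) enumerates the states of Equation (2) and the transition rules of Equations (3) and (5):

```python
# Each amino acid alpha contributes a chain of M(alpha) states s^alpha_1..s^alpha_M(alpha).
RESIDUE_MASS = {'G': 57, 'A': 71, 'S': 87}      # truncated alphabet, illustrative masses
OCCURRENCE = {'G': 0.40, 'A': 0.35, 'S': 0.25}  # occurrence probabilities r_alpha

def build_states():
    """Enumerate the (alpha, i) states of Equation (2), i = 1..M(alpha)."""
    return [(aa, i) for aa, m in RESIDUE_MASS.items() for i in range(1, m + 1)]

def transition_prob(s, t, past_parent_mass=False):
    """a(s, t) before the parent mass is reached (Eq. 3), a'(s, t) afterwards (Eq. 5)."""
    aa, i = s
    if past_parent_mass:                          # Equation (5)
        if i == RESIDUE_MASS[aa]:
            return 1.0 if t == '+' else 0.0       # constraint satisfied -> end state s+
        return 1.0 if t == '-' else 0.0           # constraint violated  -> end state s-
    if i < RESIDUE_MASS[aa]:                      # Equation (3), first row
        return 1.0 if t == (aa, i + 1) else 0.0   # traverse the chain deterministically
    if isinstance(t, tuple) and t[1] == 1:        # chain end -> start of any other chain
        return OCCURRENCE[t[0]]
    return 0.0
```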
51 Figure 4: Mean height of ion counts for different shifts with respect to the ideal prefix fragments (a) and suffix fragments (b). [sent-118, score-0.531]
52 At each state of the finite state automaton an ion count value is emitted. [sent-119, score-0.514]
53 Figure 4 shows the mean ion count for different positions relative to the amino acid boundary, averaged over all amino acids. [sent-120, score-1.225]
54 It happens quite frequently that an amino acid loses water (H2O) or ammonia (NH3). [sent-122, score-0.682]
55 The ion count patterns for the prefix fragments (fig. [sent-123, score-0.379]
56 Suffix fragments are more stable than prefix fragments: the central peak at position 0 (amino acid boundary) is three times higher for the suffix fragments than for the prefix fragments. [sent-127, score-0.603]
57 The Markov chain models a sequence with three amino acids. [sent-130, score-0.429]
58 The filled circles correspond to the amino acid boundaries. [sent-131, score-0.63]
59 Each amino acid boundary generates an ion count pattern for the prefix fragment and one for the suffix fragment. [sent-132, score-0.865]
60 Breaking a peptide in the second mass spectrometer produces both a prefix and a suffix fragment. [sent-133, score-0.786]
61 5) it is sufficient to limit the length of both models to mp /2. [sent-136, score-0.29]
62 The forward and the backward Markov chains are extended to hidden Markov models to describe the ion counts in the mass spectra. [sent-139, score-0.746]
63 The emission probabilities depend on the two states of the prefix and suffix sequence, since these states give rise to ion counts in the measurements. [sent-140, score-0.465]
64 We define $b_{s,s'}(\bar{x}_m) = P\{\bar{X}_m = \bar{x}_m = (x(m),\, x(m_p - m)) \mid Y_m = (s, s')\}$ (6) as the emission probabilities of ion counts. [sent-141, score-0.399]
65 This notion of coupled variables Xm describes the transition from two independent Markov chains to one coupled hidden Markov model with a squared number of states (2-tuple states). [sent-144, score-0.246]
66 In each term of the product, two peaks are observed on both sides of the spectrum: one at position m and the other at the mirror position mp − m. [sent-148, score-0.461]
67 The two chains are connected by the transition probability $a(y_{(m_p-1)/2},\, y_{(m_p-1)/2+1})$ of traversing from the last state of the forward Markov chain to the first state of the backward chain. [sent-151, score-0.315]
68 3 Most Probable Sequence The input spectrum usually comes with an estimate of the parent mass with a tolerance of about 1 Dalton. [sent-153, score-0.638]
69 Using a maximum likelihood approach the parent mass estimate is $\hat{m}_p = \operatorname{argmax}_{m_p} P\{X = x \mid s^+, m_p\} = \operatorname{argmax}_{m_p} \sum_y P\{X = x, Y = y \mid s^+, m_p\}$. (8) [sent-154, score-0.878]
70 The sum over all sequences can be computed efficiently by dynamic programming using the forward algorithm. [sent-155, score-1.302]
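As a sketch of how the search in Equation (8) could be organized, assume a forward-algorithm likelihood routine is available; forward_log_likelihood below is a hypothetical placeholder, not the authors' function:

```python
def estimate_parent_mass(spectrum, coarse_estimate, forward_log_likelihood, tolerance=1):
    """Scan candidate parent masses around the coarse (first-spectrometer) estimate
    and keep the one maximizing the forward-algorithm likelihood (Equation 8).
    forward_log_likelihood(spectrum, mp) is assumed to sum over all state sequences
    ending in s+ by dynamic programming."""
    candidates = range(coarse_estimate - tolerance, coarse_estimate + tolerance + 1)
    return max(candidates, key=lambda mp: forward_log_likelihood(spectrum, mp))
```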
71 One result of de novo peptide sequencing is the computation of the best sequence generating a given spectrum. [sent-156, score-0.852]
72 Given the estimated parent mass $\hat{m}_p$ the maximum posterior estimate of the sequence is $y^* = \operatorname{argmax}_y P\{Y = y \mid X = x, s^+, \hat{m}_p\} = \operatorname{argmax}_y P\{X = x, Y = y \mid s^+, \hat{m}_p\}$. [sent-157, score-1.58]
73 To compute the posterior probability one has to normalize the joint probability $P\{X = x, Y = y \mid s^+, \hat{m}_p\}$ by the evidence $P\{X = x \mid s^+, \hat{m}_p\}$ using the forward-backward algorithm. [sent-159, score-0.597]
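The maximum posterior sequence can be obtained with a standard Viterbi recursion; the generic sketch below is our own (not the authors' implementation) and assumes that states, start_prob, trans and emit describe the time-dependent model for a fixed parent mass:

```python
import math

def viterbi(observations, states, start_prob, trans, emit):
    """Generic Viterbi decoding in log space; returns the most probable state path."""
    def safe_log(p):
        return math.log(max(p, 1e-300))          # avoid log(0)
    V = [{s: safe_log(start_prob(s)) + safe_log(emit(s, observations[0])) for s in states}]
    back = [{}]
    for x in observations[1:]:
        scores, ptr = {}, {}
        for t in states:
            best_s = max(states, key=lambda s: V[-1][s] + safe_log(trans(s, t)))
            scores[t] = V[-1][best_s] + safe_log(trans(best_s, t)) + safe_log(emit(t, x))
            ptr[t] = best_s
        V.append(scores)
        back.append(ptr)
    path = [max(states, key=lambda s: V[-1][s])]  # best final state
    for ptr in reversed(back[1:]):                # trace back through the pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))
```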
74 In the mass spectra, ions with very low mass or with a mass close to the parent mass are less frequently observed than ions with a medium mass. [sent-160, score-1.511]
75 It is also possible to give a score for each subsequence of the peptide, especially a score for each amino acid. [sent-162, score-0.381]
76 $P\{\ldots, y_{m_p}, x \mid s^+, m_p\}\ /\ P\{x \mid s^+, m_p\}$ (10) (11) This can again be computed by some variation of the forward and backward algorithm. [sent-182, score-0.652]
77 4 Simplification of the Model The coupled hidden Markov model has 2 3752 = 5 640 625 states that leads to a runtime of 20 minutes per peptide which for practical applications is problematic. [sent-184, score-0.49]
78 Thus the emission of mirror peaks x(mp −m) is deterministically coupled to the emission of the peak xm . [sent-188, score-0.52]
79 The mass spectrometer gave an output of 7056 different candidate spectra. [sent-191, score-0.432]
80 It was shown that the PeptideProphet score is a very reliable scoring method for peptide identification by database search. [sent-193, score-0.433]
81 The emission of symbols is coupled with the amino acid boundaries of the prefix sequence. [sent-196, score-0.793]
82 The quality of the HMM inference is measured by the ratio of common amino acid boundaries to the number of amino acids in the database sequence. [sent-197, score-1.054]
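A small sketch of this quality measure (our own helper; it assumes the predicted and the database peptide are given as residue strings and that a residue mass table is available):

```python
def boundary_recall(predicted, true, residue_mass):
    """Number of amino acid boundaries (cumulative prefix masses) shared by the
    predicted and the database sequence, divided by the number of amino acids in
    the database sequence -- a sketch of the quality measure described above."""
    def boundaries(seq):
        positions, total = set(), 0
        for aa in seq:
            total += residue_mass[aa]
            positions.add(total)
        return positions
    common = boundaries(predicted) & boundaries(true)
    return len(common) / len(true)
```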
83 The performance of the HMM was tested by leave-one-out cross validation: in each training step the emission probabilities and the amino acid occurrence probabilities are re-estimated, with one sequence excluded from the training set. [sent-198, score-0.901]
84 To estimate the emission probabilities, the ion count is discretized to a fixed number of bins, in such a way that all bins contain an equal number of counts. [sent-199, score-0.396]
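A minimal numpy sketch of this equal-frequency discretization (our own; the number of bins and the simulated ion counts are made up):

```python
import numpy as np

def equal_frequency_bins(ion_counts, n_bins=10):
    """Choose bin edges so that every bin receives roughly the same number of
    observed ion counts, then return each count's bin index and the edges."""
    edges = np.quantile(ion_counts, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.digitize(ion_counts, edges[1:-1])   # indices 0 .. n_bins-1
    return bin_idx, edges

counts = np.random.gamma(shape=1.0, scale=5.0, size=1000)  # simulated ion counts
bin_idx, edges = equal_frequency_bins(counts, n_bins=10)
```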
85 We have chosen the prominent de novo sequencing programs LUTEFISK [6] and PEAKS [5] as competitors for the simplified HMM. [sent-214, score-0.476]
86 In figure 8a) the estimated parent masses are compared to the database parent mass. [sent-216, score-0.896]
87 The plot demonstrates that all de novo sequencing methods tend to overestimate the parent mass. [sent-217, score-0.651]
88 5 Conclusion and Further Work A novel method for the analysis of mass spectra in de novo peptide sequencing is presented in this paper. [sent-229, score-1.28]
89 The proposed hidden Markov model is a fully probabilistic model for the generation process of mass spectra. [sent-230, score-0.43]
90 The model was tested on mass spectra from vacuola ... [sent-231, score-0.476]
91 3 0 −3 −2 −1 0 1 2 3 HMM Lutefisk Peaks HMM (b) Figure 8: a) Histogram on difference of estimated parent mass and database output. [sent-267, score-0.623]
92 The HMM clearly outperforms its competitors in recognition of the parent mass and peak localization. [sent-270, score-0.659]
93 In further work additional model parameters will be introduced to represent and to detect amino acids with post-translational modifications. [sent-271, score-0.389]
94 Our method shows a large potential for high-throughput de novo sequencing of proteins, which is unmatched by competing techniques. [sent-273, score-0.468]
95 Audens: A tool for automatic de novo peptide sequencing. [sent-276, score-0.625]
96 A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. [sent-281, score-1.281]
97 An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. [sent-287, score-1.264]
98 Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. [sent-291, score-0.389]
99 Peaks: Powerful software for peptide de novo sequencing by tandem mass spectrometry. [sent-294, score-1.281]
100 Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry. [sent-299, score-1.281]
wordName wordTfidf (topN-words)
[('mass', 0.387), ('amino', 0.36), ('peptide', 0.354), ('mp', 0.29), ('novo', 0.271), ('acid', 0.27), ('parent', 0.201), ('ion', 0.191), ('sequencing', 0.179), ('fragments', 0.144), ('automaton', 0.121), ('pre', 0.115), ('hmm', 0.111), ('peaks', 0.105), ('emission', 0.1), ('peptides', 0.09), ('tandem', 0.09), ('ymp', 0.09), ('spectra', 0.089), ('state', 0.079), ('zurich', 0.078), ('masses', 0.072), ('spectrometry', 0.072), ('markov', 0.067), ('mirror', 0.066), ('dalton', 0.06), ('lutefisk', 0.06), ('quartile', 0.06), ('xm', 0.058), ('nh', 0.052), ('probabilities', 0.05), ('spectrum', 0.05), ('sequence', 0.048), ('states', 0.047), ('coupled', 0.046), ('isotope', 0.045), ('spectrometer', 0.045), ('eth', 0.045), ('peak', 0.045), ('bins', 0.044), ('count', 0.044), ('hidden', 0.043), ('transition', 0.041), ('ym', 0.04), ('backward', 0.038), ('list', 0.038), ('argmax', 0.037), ('yq', 0.036), ('sa', 0.035), ('database', 0.035), ('sv', 0.034), ('forward', 0.034), ('sequences', 0.034), ('protein', 0.033), ('measurement', 0.032), ('switzerland', 0.032), ('suf', 0.03), ('recall', 0.03), ('ammonia', 0.03), ('aysardgfshek', 0.03), ('baginsky', 0.03), ('bym', 0.03), ('digested', 0.03), ('gruissem', 0.03), ('ions', 0.03), ('peptideprophet', 0.03), ('sacha', 0.03), ('wilhelm', 0.03), ('yp', 0.03), ('simpli', 0.03), ('counts', 0.03), ('acids', 0.029), ('else', 0.027), ('competitors', 0.026), ('noble', 0.026), ('plant', 0.026), ('median', 0.025), ('scoring', 0.025), ('chemistry', 0.024), ('occurrence', 0.023), ('chains', 0.023), ('collision', 0.022), ('water', 0.022), ('ideal', 0.022), ('chain', 0.021), ('subsequence', 0.021), ('gure', 0.019), ('broken', 0.019), ('reliable', 0.019), ('proteins', 0.018), ('joint', 0.017), ('ful', 0.017), ('co', 0.017), ('discretization', 0.017), ('symmetry', 0.017), ('constraint', 0.017), ('breaks', 0.017), ('discretized', 0.017), ('reconstruct', 0.017), ('symbols', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing
Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann
Abstract: De novo Sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the highthroughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack in precision to detect mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
2 0.094317809 52 nips-2004-Discrete profile alignment via constrained information bottleneck
Author: Sean O'rourke, Gal Chechik, Robin Friedman, Eleazar Eskin
Abstract: Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also alignments between these sequences, while nearly as accurate as the full profileprofile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied. 1
3 0.064923033 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
4 0.062906057 124 nips-2004-Multiple Alignment of Continuous Time Series
Author: Jennifer Listgarten, Radford M. Neal, Sam T. Roweis, Andrew Emili
Abstract: Multiple realizations of continuous-valued time series from a stochastic process often contain systematic variations in rate and amplitude. To leverage the information contained in such noisy replicate sets, we need to align them in an appropriate way (for example, to allow the data to be properly combined by adaptive averaging). We present the Continuous Profile Model (CPM), a generative model in which each observed time series is a non-uniformly subsampled version of a single latent trace, to which local rescaling and additive noise are applied. After unsupervised training, the learned trace represents a canonical, high resolution fusion of all the replicates. As well, an alignment in time and scale of each observation to this trace can be found by inference in the model. We apply CPM to successfully align speech signals from multiple speakers and sets of Liquid Chromatography-Mass Spectrometry proteomic data. 1 A Profile Model for Continuous Data When observing multiple time series generated by a noisy, stochastic process, large systematic sources of variability are often present. For example, within a set of nominally replicate time series, the time axes can be variously shifted, compressed and expanded, in complex, non-linear ways. Additionally, in some circumstances, the scale of the measured data can vary systematically from one replicate to the next, and even within a given replicate. We propose a Continuous Profile Model (CPM) for simultaneously analyzing a set of such time series. In this model, each time series is generated as a noisy transformation of a single latent trace. The latent trace is an underlying, noiseless representation of the set of replicated, observable time series. Output time series are generated from this model by moving through a sequence of hidden states in a Markovian manner and emitting an observable value at each step, as in an HMM. Each hidden state corresponds to a particular location in the latent trace, and the emitted value from the state depends on the value of the latent trace at that position. To account for changes in the amplitude of the signals across and within replicates, the latent time states are augmented by a set of scale states, which control how the emission signal will be scaled relative to the value of the latent trace. During training, the latent trace is learned, as well as the transition probabilities controlling the Markovian evolution of the scale and time states and the overall noise level of the observed data. After training, the latent trace learned by the model represents a higher resolution ’fusion’ of the experimental replicates. Figure 1 illustrate the model in action. Unaligned, Linear Warp Alignment and CPM Alignment Amplitude 40 30 20 10 0 50 Amplitude 40 30 20 10 Amplitude 0 30 20 10 0 Time a) b) Figure 1: a) Top: ten replicated speech energy signals as described in Section 4), Middle: same signals, aligned using a linear warp with an offset, Bottom: aligned with CPM (the learned latent trace is also shown in cyan). b) Speech waveforms corresponding to energy signals in a), Top: unaligned originals, Bottom: aligned using CPM. 2 Defining the Continuous Profile Model (CPM) The CPM is generative model for a set of K time series, xk = (xk , xk , ..., xk k ). The 1 2 N temporal sampling rate within each xk need not be uniform, nor must it be the same across the different xk . Constraints on the variability of the sampling rate are discussed at the end of this section. 
For notational convenience, we henceforth assume N k = N for all k, but this is not a requirement of the model. The CPM is set up as follows: We assume that there is a latent trace, z = (z1 , z2 , ..., zM ), a canonical representation of the set of noisy input replicate time series. Any given observed time series in the set is modeled as a non-uniformly subsampled version of the latent trace to which local scale transformations have been applied. Ideally, M would be infinite, or at least very large relative to N so that any experimental data could be mapped precisely to the correct underlying trace point. Aside from the computational impracticalities this would pose, great care to avoid overfitting would have to be taken. Thus in practice, we have used M = (2 + )N (double the resolution, plus some slack on each end) in our experiments and found this to be sufficient with < 0.2. Because the resolution of the latent trace is higher than that of the observed time series, experimental time can be made effectively to speed up or slow down by advancing along the latent trace in larger or smaller jumps. The subsampling and local scaling used during the generation of each observed time series are determined by a sequence of hidden state variables. Let the state sequence for observation k be π k . Each state in the state sequence maps to a time state/scale state pair: k πi → {τik , φk }. Time states belong to the integer set (1..M ); scale states belong to an i ordered set (φ1 ..φQ ). (In our experiments we have used Q=7, evenly spaced scales in k logarithmic space). States, πi , and observation values, xk , are related by the emission i k probability distribution: Aπi (xk |z) ≡ p(xk |πi , z, σ, uk ) ≡ N (xk ; zτik φk uk , σ), where σ k i i i i is the noise level of the observed data, N (a; b, c) denotes a Gaussian probability density for a with mean b and standard deviation c. The uk are real-valued scale parameters, one per observed time series, that correct for any overall scale difference between time series k and the latent trace. To fully specify our model we also need to define the state transition probabilities. We define the transitions between time states and between scale states separately, so that k Tπi−1 ,πi ≡ p(πi |πi−1 ) = p(φi |φi−1 )pk (τi |τi−1 ). The constraint that time must move forward, cannot stand still, and that it can jump ahead no more than Jτ time states is enforced. (In our experiments we used Jτ = 3.) As well, we only allow scale state transitions between neighbouring scale states so that the local scale cannot jump arbitrarily. These constraints keep the number of legal transitions to a tractable computational size and work well in practice. Each observed time series has its own time transition probability distribution to account for experiment-specific patterns. Both the time and scale transition probability distributions are given by multinomials: dk , if a − b = 1 1 k d2 , if a − b = 2 k . p (τi = a|τi−1 = b) = . . k d , if a − b = J τ Jτ 0, otherwise p(φi = a|φi−1 s0 , if D(a, b) = 0 s1 , if D(a, b) = 1 = b) = s1 , if D(a, b) = −1 0, otherwise where D(a, b) = 1 means that a is one scale state larger than b, and D(a, b) = −1 means that a is one scale state smaller than b, and D(a, b) = 0 means that a = b. The distributions Jτ are constrained by: i=1 dk = 1 and 2s1 + s0 = 1. i Jτ determines the maximum allowable instantaneous speedup of one portion of a time series relative to another portion, within the same series or across different series. 
However, the length of time for which any series can move so rapidly is constrained by the length of the latent trace; thus the maximum overall ratio in speeds achievable by the model between any two entire time series is given by min(Jτ , M ). N After training, one may examine either the latent trace or the alignment of each observable time series to the latent trace. Such alignments can be achieved by several methods, including use of the Viterbi algorithm to find the highest likelihood path through the hidden states [1], or sampling from the posterior over hidden state sequences. We found Viterbi alignments to work well in the experiments below; samples from the posterior looked quite similar. 3 Training with the Expectation-Maximization (EM) Algorithm As with HMMs, training with the EM algorithm (often referred to as Baum-Welch in the context of HMMs [1]), is a natural choice. In our model the E-Step is computed exactly using the Forward-Backward algorithm [1], which provides the posterior probability over k states for each time point of every observed time series, γs (i) ≡ p(πi = s|x) and also the pairwise state posteriors, ξs,t (i) ≡ p(πi−1 = s, πi = t|xk ). The algorithm is modified only in that the emission probabilities depend on the latent trace as described in Section 2. The M-Step consists of a series of analytical updates to the various parameters as detailed below. Given the latent trace (and the emission and state transition probabilities), the complete log likelihood of K observed time series, xk , is given by Lp ≡ L + P. L is the likelihood term arising in a (conditional) HMM model, and can be obtained from the Forward-Backward algorithm. It is composed of the emission and state transition terms. P is the log prior (or penalty term), regularizing various aspects of the model parameters as explained below. These two terms are: K N N L≡ log Aπi (xk |z) + i log p(π1 ) + τ −1 K (zj+1 − zj )2 + P ≡ −λ (1) i=2 i=1 k=1 k log Tπi−1 ,πi j=1 k log D(dk |{ηv }) + log D(sv |{ηv }), v (2) k=1 where p(π1 ) are priors over the initial states. The first term in Equation 2 is a smoothing k penalty on the latent trace, with λ controlling the amount of smoothing. ηv and ηv are Dirichlet hyperprior parameters for the time and scale state transition probability distributions respectively. These ensure that all non-zero transition probabilities remain non-zero. k For the time state transitions, v ∈ {1, Jτ } and ηv corresponds to the pseudo-count data for k the parameters d1 , d2 . . . dJτ . For the scale state transitions, v ∈ {0, 1} and ηv corresponds to the pseudo-count data for the parameters s0 and s1 . Letting S be the total number of possible states, that is, the number of elements in the cross-product of possible time states and possible scale states, the expected complete log likelihood is: K S K p k k γs (1) log T0,s
5 0.054012127 23 nips-2004-Analysis of a greedy active learning strategy
Author: Sanjoy Dasgupta
Abstract: We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels. 1
6 0.049436647 49 nips-2004-Density Level Detection is Classification
7 0.046293646 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data
8 0.043210715 87 nips-2004-Integrating Topics and Syntax
9 0.039639626 56 nips-2004-Dynamic Bayesian Networks for Brain-Computer Interfaces
10 0.03754596 107 nips-2004-Making Latin Manuscripts Searchable using gHMM's
11 0.034593057 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
12 0.033069771 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
13 0.032032561 28 nips-2004-Bayesian inference in spiking neurons
14 0.030950049 166 nips-2004-Semi-supervised Learning via Gaussian Processes
15 0.030820342 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
16 0.029987935 92 nips-2004-Kernel Methods for Implicit Surface Modeling
17 0.029868191 74 nips-2004-Harmonising Chorales by Probabilistic Inference
18 0.029363537 102 nips-2004-Learning first-order Markov models for control
19 0.029300045 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
20 0.02913205 176 nips-2004-Sub-Microwatt Analog VLSI Support Vector Machine for Pattern Classification and Sequence Estimation
topicId topicWeight
[(0, -0.085), (1, -0.012), (2, 0.021), (3, -0.038), (4, -0.001), (5, 0.048), (6, 0.007), (7, 0.053), (8, 0.026), (9, -0.064), (10, -0.015), (11, -0.006), (12, -0.006), (13, 0.04), (14, 0.032), (15, -0.07), (16, 0.009), (17, -0.063), (18, -0.019), (19, -0.027), (20, -0.093), (21, 0.052), (22, -0.119), (23, 0.035), (24, 0.029), (25, -0.033), (26, -0.049), (27, -0.088), (28, 0.033), (29, 0.02), (30, -0.143), (31, 0.098), (32, -0.069), (33, -0.102), (34, 0.062), (35, -0.049), (36, 0.064), (37, 0.133), (38, -0.076), (39, 0.201), (40, -0.049), (41, 0.093), (42, -0.055), (43, -0.071), (44, 0.072), (45, -0.12), (46, -0.109), (47, -0.032), (48, 0.212), (49, -0.12)]
simIndex simValue paperId paperTitle
same-paper 1 0.9716484 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing
Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann
Abstract: De novo Sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the highthroughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack in precision to detect mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
2 0.63057363 52 nips-2004-Discrete profile alignment via constrained information bottleneck
Author: Sean O'rourke, Gal Chechik, Robin Friedman, Eleazar Eskin
Abstract: Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also alignments between these sequences, while nearly as accurate as the full profileprofile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied. 1
3 0.61208206 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
4 0.54352182 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
5 0.47683561 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag
Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites. 1
6 0.39564773 74 nips-2004-Harmonising Chorales by Probabilistic Inference
7 0.38845733 124 nips-2004-Multiple Alignment of Continuous Time Series
8 0.33356476 107 nips-2004-Making Latin Manuscripts Searchable using gHMM's
9 0.31046671 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
10 0.28789231 41 nips-2004-Comparing Beliefs, Surveys, and Random Walks
11 0.27730322 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
12 0.26823273 166 nips-2004-Semi-supervised Learning via Gaussian Processes
13 0.26451063 56 nips-2004-Dynamic Bayesian Networks for Brain-Computer Interfaces
14 0.26312679 49 nips-2004-Density Level Detection is Classification
15 0.23673639 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis
16 0.23653547 109 nips-2004-Mass Meta-analysis in Talairach Space
17 0.23501603 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data
18 0.23411933 29 nips-2004-Beat Tracking the Graphical Model Way
19 0.21912126 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
20 0.21534809 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem
topicId topicWeight
[(9, 0.01), (13, 0.066), (15, 0.082), (25, 0.016), (26, 0.036), (31, 0.018), (33, 0.129), (35, 0.012), (36, 0.475), (39, 0.015), (50, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.8304413 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing
Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann
Abstract: De novo Sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the highthroughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack in precision to detect mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
2 0.54374665 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
Author: Máté Lengyel, Peter Dayan
Abstract: Areas of the brain involved in various forms of memory exhibit patterns of neural activity quite unlike those in canonical computational models. We show how to use well-founded Bayesian probabilistic autoassociative recall to derive biologically reasonable neuronal dynamics in recurrently coupled models, together with appropriate values for parameters such as the membrane time constant and inhibition. We explicitly treat two cases. One arises from a standard Hebbian learning rule, and involves activity patterns that are coded by graded firing rates. The other arises from a spike timing dependent learning rule, and involves patterns coded by the phase of spike times relative to a coherent local field potential oscillation. Our model offers a new and more complete understanding of how neural dynamics may support autoassociation. 1
3 0.51460874 53 nips-2004-Discriminant Saliency for Visual Recognition from Cluttered Scenes
Author: Dashan Gao, Nuno Vasconcelos
Abstract: Saliency mechanisms play an important role when visual recognition must be performed in cluttered scenes. We propose a computational definition of saliency that deviates from existing models by equating saliency to discrimination. In particular, the salient attributes of a given visual class are defined as the features that enable best discrimination between that class and all other classes of recognition interest. It is shown that this definition leads to saliency algorithms of low complexity, that are scalable to large recognition problems, and is compatible with existing models of early biological vision. Experimental results demonstrating success in the context of challenging recognition problems are also presented. 1
4 0.35118327 21 nips-2004-An Information Maximization Model of Eye Movements
Author: Laura W. Renninger, James M. Coughlan, Preeti Verghese, Jitendra Malik
Abstract: We propose a sequential information maximization model as a general strategy for programming eye movements. The model reconstructs high-resolution visual information from a sequence of fixations, taking into account the fall-off in resolution from the fovea to the periphery. From this framework we get a simple rule for predicting fixation sequences: after each fixation, fixate next at the location that minimizes uncertainty (maximizes information) about the stimulus. By comparing our model performance to human eye movement data and to predictions from a saliency and random model, we demonstrate that our model is best at predicting fixation locations. Modeling additional biological constraints will improve the prediction of fixation sequences. Our results suggest that information maximization is a useful principle for programming eye movements. 1 In trod u ction Since the earliest recordings [1, 2], vision researchers have sought to understand the non-random yet idiosyncratic behavior of volitional eye movements. To do so, we must not only unravel the bottom-up visual processing involved in selecting a fixation location, but we must also disentangle the effects of top-down cognitive factors such as task and prior knowledge. Our ability to predict volitional eye movements provides a clear measure of our understanding of biological vision. One approach to predicting fixation locations is to propose that the eyes move to points that are “salient”. Salient regions can be found by looking for centersurround contrast in visual channels such as color, contrast and orientation, among others [3, 4]. Saliency has been shown to correlate with human fixation locations when observers “look around” an image [5, 6] but it is not clear if saliency alone can explain why some locations are chosen over others and in what order. Task as well as scene or object knowledge will play a role in constraining the fixation locations chosen [7]. Observations such as this led to the scanpath theory, which proposed that eye movement sequences are tightly linked to both the encoding and retrieval of specific object memories [8]. 1.1 Our Approach We propose that during natural, active vision, we center our fixation on the most informative points in an image in order to reduce our overall uncertainty about what we are looking at. This approach is intuitive and may be biologically plausible, as outlined by Lee & Yu [9]. The most informative point will depend on both the observer’s current knowledge of the stimulus and the task. The quality of the information gathered with each fixation will depend greatly on human visual resolution limits. This is the reason we must move our eyes in the first place, yet it is often ignored. A sequence of eye movements may then be understood within a framework of sequential information maximization. 2 Human eye movements We investigated how observers examine a novel shape when they must rely heavily on bottom-up stimulus information. Because eye movements will be affected by the task of the observer, we constructed a learn-discriminate paradigm. Observers are asked to carefully study a shape and then discriminate it from a highly similar one. 2.1 Stimuli and Design We use novel silhouettes to reduce the influence of object familiarity on the pattern of eye movements and to facilitate our computations of information in the model. Each silhouette subtends 12.5º to ensure that its entire shape cannot be characterized with a single fixation. 
During the learning phase, subjects first fixated a marker and then pressed a button to cue the appearance of the shape which appeared 10º to the left or right of fixation. Subjects maintained fixation for 300ms, allowing for a peripheral preview of the object. When the fixation marker disappeared, subjects were allowed to study the object for 1.2 seconds while their eye movements were recorded. During the discrimination phase, subjects were asked to select the shape they had just studied from a highly similar shape pair (Figure 1). Performance was near 75% correct, indicating that the task was challenging yet feasible. Subjects saw 140 shapes and given auditory feedback. release fixation, view object freely (1200ms) maintain fixation (300ms) Which shape is a match? fixate, initiate trial Figure 1. Temporal layout of a trial during the learning phase (left). Discrimination of learned shape from a highly similar one (right). 2.2 Apparatus Right eye position was measured with an SRI Dual Purkinje Image eye tracker while subjects viewed the stimulus binocularly. Head position was fixed with a bitebar. A 25 dot grid that covered the extent of the presentation field was used for calibration. The points were measured one at a time with each dot being displayed for 500ms. The stimuli were presented using the Psychtoolbox software [10]. 3 Model We wish to create a model that builds a representation of a shape silhouette given imperfect visual information, and which updates its representation as new visual information is acquired. The model will be defined statistically so as to explicitly encode uncertainty about the current knowledge of the shape silhouette. We will use this model to generate a simple rule for predicting fixation sequences: after each fixation, fixate next at the location that will decrease the model’s uncertainty as much as possible. Similar approaches have been described in an ideal observer model for reading [11], an information maximization algorithm for tracking contours in cluttered images [12] and predicting fixation locations during object learning [13]. 3.1 Representing information The information in silhouettes clearly resides at its contour, which we represent with a collection of points and associated tangent orientations. These points and their associated orientations are called edgelets, denoted e1, e2, ... eN, where N is the total number of edgelets along the boundary. Each edgelet ei is defined as a triple ei=(xi, yi, zi) where (xi, yi) is the 2D location of the edgelet and zi is the orientation of the tangent to the boundary contour at that point. zi can assume any of Q possible values 1, 2, …, Q, representing a discretization of Q possible orientations ranging from 0 to π , and we have chosen Q=8 in our experiments. The goal of the model is to infer the most likely orientation values given the visual information provided by one or more fixations. 3.2 Updating knowledge The visual information is based on indirect measurements of the true edgelet values e1, e2, ... eN. Although our model assumes complete knowledge of the number N and locations (xi, yi) of the edgelets, it does not have direct access to the orientations zi.1 Orientation information is instead derived from measurements that summarize the local frequency of occurrence of edgelet orientations, averaged locally over a coarse scale (corresponding to the spatial scale at which resolution is limited by the human visual system). 
These coarse measurements provide indirect information about individual edgelet orientations, which may not uniquely determine the orientations. We will use a simple statistical model to estimate the distribution of individual orientation values conditioned on this information. Our measurements are defined to model the resolution limitations of the human visual system, with highest resolution at the fovea and lower resolution in the 1 Although the visual system does not have precise knowledge of location coordinates, the model is greatly simplified by assuming this knowledge. It is reasonable to expect that location uncertainty will be highly correlated with orientation uncertainty, so that the inclusion of location should not greatly affect the model's decisions of where to fixate next. periphery. Distance to the fovea is r measured as eccentricity E, the visual angle between any point and the fovea. If x = ( x, y ) is the location of a point in an image r and f = ( f x , f y ) is the fixation (i.e. foveal) location in the image then the r r eccentricity is E = x − f , measured in units of visual degrees. The effective resolution of orientation discrimination falls with increasing eccentricity as r (E ) = FPH ( E + E 2 ) where r(E) is an effective radius over which the visual system spatially pools information and FPH =0.1 and E2=0.8 [14]. Our model represents pooled information as a histogram of edge orientations within the effective radius. For each edgelet ei we define the histogram of all edgelet r orientations ej within radius ri = r(E) of ei , where E is the eccentricity of xi = ( xi , yi ) r r r relative to the current fixation f , i.e. E = xi − f . To define the histogram more precisely we will introduce the neighborhood set Ni of all indices j corresponding to r r edgelets within radius ri of ei : N i = all j s.t. xi − x j ≤ ri , with number of { } neighborhood edgelets |Ni|. The (normalized) histogram centered at edgelet ei is then defined as hiz = 1 Ni ∑δ j∈N i z,z j , which is the proportion of edgelet orientations that assume value z in the (eccentricity-dependent) neighborhood of edgelet ei.2 Figure 2. Relation between eccentricity E and radius r(E) of the neighborhood (disk) which defines the local orientation histogram (hiz ). Left and right panels show two fixations for the same object. Up to this point we have restricted ourselves to the case of a single fixation. To designate a sequence of multiple fixations we will index them byrk=1, 2, …, K (for K total fixations). The k th fixation location is denoted by f ( k ) = ( f xk , f yk ) . The quantities ri , Ni and hiz depend on fixation location and so to make this dependence (k explicit we will augment them with superscripts as ri(k ) , N i(k ) , and hiz ) . 2 δ x, y is the Kronecker delta function, defined to equal 1 if x = y and 0 if x ≠ y . Now we describe the statistical model of edgelet orientations given information obtained from multiple fixations. Ideally we would like to model the exact distribution of orientations conditioned on the histogram data: (1) ( 2) (K ) ( , where {hizk ) } represents all histogram P(zi , z 2 , ... z N | {hiz }, {hiz },K, {hiz }) r components z at every edgelet ei for fixation f (k ) . This exact distribution is intractable, so we will use a simple approximation. We assume the distribution factors over individual edgelets: N ( ( ( P(zi , z 2 , ... z N | {hiz1) }, {hiz2 ) },K, {hizK ) }) = ∏ g i(zi ) i =1 where gi(zi) is the marginal distribution of orientation zi. 
Determining these marginal distributions is still difficult even with the factorization assumption, so we K will make an additional approximation: g (z ) = 1 ∏ hiz( k ) , where Zi is a suitable i i Z i k =1 (k ) normalization factor. This approximation corresponds to treating hiz as a likelihood function over z, with independent likelihoods for each fixation k. While the approximation has some undesirable properties (such as making the marginal distribution gi(zi) more peaked if the same fixation is made repeatedly), it provides a simple mechanism for combining histogram evidence from multiple, distinct fixations. 3.3 Selecting the next fixation r ( K +1) Given the past K fixations, the next fixation f is chosen to minimize the model r ( K +1) entropy of the edgelet orientations. In other words, f is chosen to minimize r ( K +1) ( ( ( H( f ) = entropy[ P(zi , z2 , ... z N | {hiz1) }, {hiz2 ) },K, {hizK +1) })] , where the entropy of a distribution P(x) is defined as − ∑ P( x) log P ( x) . In practice, we minimize the x r entropy by evaluating it across a set of candidate locations f ( K +1) which forms a regularly sampled grid across the image.3 We note that this selection rule makes decisions that depend, in general, on the full history of previous K fixations. 4 Results Figure 3 shows an example of one observer’s eye movements superimposed over the shape (top row), the prediction from a saliency model (middle row) [3] and the prediction from the information maximization model (bottom row). The information maximization model updates its prediction after each fixation. An ideal sequence of fixations can be generated by both models. The saliency model selects fixations in order of decreasing salience. The information maximization model selects the maximally informative point after incorporating information from the previous fixations. To provide an additional benchmark, we also implemented a 3 This rule evaluates the entropy resulting from every possible next fixation before making a decision. Although this rule is suitable for our modeling purposes, it would be inefficient to implement in a biological or machine vision system. A practical decision rule would use current knowledge to estimate the expected (rather than actual) entropy. Figure 3. Example eye movement pattern, superimposed over the stimulus (top row), saliency map (middle row) and information maximization map (bottom row). model that selects fixations at random. One way to quantify the performance is to map a subject’s fixations onto the closest model predicted fixation locations, ignoring the sequence in which they were made. In this analysis, both the saliency and information maximization models are significantly better than random at predicting candidate locations (p < 0.05; t-test) for three observers (Figure 4, left). The information maximization model performs slightly but significantly better than the saliency model for two observers (lm, kr). If we match fixation locations while retaining the sequence, errors become quite large, indicating that the models cannot account for the observed behavior (Figure 4, right). Sequence Error Visual Angle (deg) Location Error R S I R S I R S I R S I R S I R S I Figure 4. Prediction error of three models: random (R), saliency (S) and information maximization (I) for three observers (pv, lm, kr). The left panel shows the error in predicting fixation locations, ignoring sequence. The right panel shows the error when sequence is retained before mapping. 
The information maximization model incorporates resolution limitations, but further biological constraints must be considered if we are to build a model that can fully explain human eye movement patterns. First, saccade amplitudes are typically around 2-4º and rarely exceed 15º [15]. When we move our eyes, the image of the visual world is smeared across the retina and our perception of it is actively suppressed [16]; shorter saccades may be a mechanism to reduce this cost. This constraint would cause a fixation to fall short of the predicted location when that location is distant from the current fixation (Figure 5).

Figure 5. Cost of moving the eyes. Successive fixations may fall short of the maximally salient or informative point if it is very distant from the current fixation.

Second, the biological system may increase its sampling efficiency by planning a series of saccades concurrently [17, 18]. Several fixations may therefore be made before sampled information begins to influence target selection, whereas the information maximization model currently updates after each fixation. This would create a discrepancy in the prediction of the eye movement sequence (Figure 6).

Figure 6. Three fixations are made to a location that is initially highly informative according to the information maximization model. By the fourth fixation, the subject finally moves to the next most informative point.

5 Discussion

Our model and the saliency model use the same image information to determine fixation locations, so it is not surprising that they perform roughly similarly in predicting human fixation locations. The main difference lies in how the sequence of eye movements to these locations is programmed, i.e. how we decide to "shift attention". The saliency model uses a winner-take-all and inhibition-of-return mechanism to shift among the salient regions. We take a completely different approach: observers adopt a strategy of sequential information maximization. In effect, the history of where we have been matters because our model is continually collecting information from the stimulus, and an implicit "inhibition of return" arises because there is little to be gained by revisiting a point. In addition, we take biological resolution limits into account when determining the quality of the information gained with each fixation. By including further biological constraints, such as the cost of making large saccades and the natural time course of information update, we may be able to improve our prediction of eye movement sequences.

We have shown that the programming of eye movements can be understood within a framework of sequential information maximization. This framework is portable to any image or task. A remaining challenge is to understand how different tasks constrain the representation of information and to what degree observers are able to utilize that information.

Acknowledgments
Smith-Kettlewell Eye Research Institute, NIH Ruth L. Kirschstein NRSA, ONR #N0001401-1-0890, NSF #IIS0415310, NIDRR #H133G030080, NASA #NAG 9-1461.

References
[1] Buswell (1935). How people look at pictures. Chicago: The University of Chicago Press.
[2] Yarbus (1967). Eye movements and vision. New York: Plenum Press.
[3] Itti & Koch (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489-1506.
[4] Kadir & Brady (2001). Scale, saliency and image description. International Journal of Computer Vision, 45(2), 83-105.
[5] Parkhurst, Law & Niebur (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research, 42(1), 107-123.
[6] Nothdurft (2002). Attention shifts to salient targets. Vision Research, 42, 1287-1306.
[7] Oliva, Torralba, Castelhano & Henderson (2003). Top-down control of visual attention in object detection. Proceedings of the IEEE International Conference on Image Processing, Barcelona, Spain.
[8] Noton & Stark (1971). Scanpaths in eye movements during pattern perception. Science, 171, 308-311.
[9] Lee & Yu (2000). An information-theoretic framework for understanding saccadic behaviors. Advances in Neural Information Processing Systems, 12, 834-840.
[10] Brainard (1997). The psychophysics toolbox. Spatial Vision, 10(4), 433-436.
[11] Legge, Hooven, Klitz, Mansfield & Tjan (2002). Mr. Chips 2002: new insights from an ideal-observer model of reading. Vision Research, 42, 2219-2234.
[12] Geman & Jedynak (1996). An active testing model for tracking roads in satellite images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1), 1-14.
[13] Renninger & Malik (2004). Sequential information maximization can explain eye movements in an object learning task. Journal of Vision, 4(8), 744a.
[14] Levi, Klein & Aitsebaomo (1985). Vernier acuity, crowding and cortical magnification. Vision Research, 25(7), 963-977.
[15] Bahill, Adler & Stark (1975). Most naturally occurring human saccades have magnitudes of 15 degrees or less. Investigative Ophthalmology, 14, 468-469.
[16] Burr, Morrone & Ross (1994). Selective suppression of the magnocellular visual pathway during saccadic eye movements. Nature, 371, 511-513.
[17] Caspi, Beutter & Eckstein (2004). The time course of visual information accrual guiding eye movement decisions. Proceedings of the National Academy of Sciences, 101(35), 13086-13090.
[18] McPeek, Skavenski & Nakayama (2000). Concurrent processing of saccades in visual search. Vision Research, 40, 2499-2516.
5 0.34519216 52 nips-2004-Discrete profile alignment via constrained information bottleneck
Author: Sean O'Rourke, Gal Chechik, Robin Friedman, Eleazar Eskin
Abstract: Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Moreover, alignments between these sequences, while nearly as accurate as the full profile-profile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied. 1
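As a rough illustration of the general idea, the sketch below discretizes profile columns with a plain iterative information-bottleneck update; it does not include the constraints and priors on class distributions that the abstract describes, and all function and variable names are our own assumptions.

```python
import numpy as np

def discretize_profiles(profiles, T, beta=5.0, iters=200, seed=0):
    """
    profiles: (n, 20) array; row x is p(y|x), a profile column over amino acids.
    T:        size of the expanded (discrete) alphabet.
    Returns q_t_x (n, T) soft assignments and p_y_t (T, 20) cluster profiles.
    """
    rng = np.random.default_rng(seed)
    n = len(profiles)
    p_x = np.full(n, 1.0 / n)                        # uniform prior over columns
    q_t_x = rng.dirichlet(np.ones(T), size=n)        # random initial p(t|x)
    for _ in range(iters):
        p_t = q_t_x.T @ p_x                                          # p(t)
        p_y_t = (q_t_x * p_x[:, None]).T @ profiles / p_t[:, None]   # p(y|t)
        # KL(p(y|x) || p(y|t)) for every pair (x, t)
        kl = (profiles[:, None, :] *
              (np.log(profiles[:, None, :] + 1e-12) -
               np.log(p_y_t[None, :, :] + 1e-12))).sum(axis=2)
        # IB self-consistency: p(t|x) proportional to p(t) * exp(-beta * KL)
        logits = np.log(p_t + 1e-12)[None, :] - beta * kl
        q_t_x = np.exp(logits - logits.max(axis=1, keepdims=True))
        q_t_x /= q_t_x.sum(axis=1, keepdims=True)
    return q_t_x, p_y_t
```

Hardening the soft assignments (taking the argmax over t for each column) yields one symbol of the expanded alphabet per profile position, giving the kind of concise textual representation the abstract refers to.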
6 0.34046844 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
7 0.3397761 102 nips-2004-Learning first-order Markov models for control
8 0.33900625 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
9 0.33886516 131 nips-2004-Non-Local Manifold Tangent Learning
10 0.33868933 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
11 0.33863035 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
12 0.33845988 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
13 0.33822057 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
14 0.33700821 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
15 0.33697596 124 nips-2004-Multiple Alignment of Continuous Time Series
16 0.33691069 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
17 0.33671066 163 nips-2004-Semi-parametric Exponential Family PCA
18 0.33662692 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
19 0.3366209 127 nips-2004-Neighbourhood Components Analysis
20 0.33644375 178 nips-2004-Support Vector Classification with Input Data Uncertainty