nips nips2006 nips2006-108 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gunnar Rätsch, Sören Sonnenburg
Abstract: We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology. Results on a well-known model organism illustrate the great potential of SHM SVMs in computational biology. 1
Reference: text
sentIndex sentText sentNum sentScore
1 This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. [sent-10, score-0.218]
2 The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. [sent-12, score-0.253]
3 We have tested our algorithm for predicting gene structures, an important problem in computational biology. [sent-13, score-0.142]
4 1 Introduction Hidden Markov SVMs are a recently-proposed method for predicting a label sequence given the input sequence [3, 17, 18, 1, 2]. [sent-15, score-0.255]
5 They combine the power and flexibility of kernel methods with the idea of Hidden Markov Models (HMMs) [11] to predict label sequences. [sent-16, score-0.114]
6 During this segment of time the system’s behavior is allowed to be non-Markovian. [sent-20, score-0.237]
7 This adds flexibility for instance to model segment lengths or to use non-linear content sensors that may depend on the start and end of the segment. [sent-21, score-0.734]
8 In the second part of the paper we consider the case of using content sensors (for whole segments) and signal detectors (at segment boundaries) in SHM SVMs. [sent-24, score-0.775]
9 This strategy allows us to tackle significantly larger label sequence problems (with several thousands of sequences). [sent-27, score-0.217]
10 To illustrate the strength of our approach we have applied our algorithm to an important problem in computational biology: the prediction of the segmentation of a pre-mRNA sequence into exons and introns. [sent-28, score-0.221]
11 On problems derived from sequences of the model organism Caenorhabditis elegans we can show that the SHM SVM approach consistently outperforms HMM based approaches by a large margin (see also [13]). [sent-29, score-0.421]
12 Finally, in Section 4 we outline the gene structure prediction problem, discuss additional techniques to apply SHM SVMs to this problem and show surprisingly large improvements compared to state-of-the-art methods. [sent-32, score-0.221]
13 2 Hidden Markov SVMs: In label sequence learning one learns a function that assigns to a sequence of objects x = χ1 χ2 . . . a sequence of labels y = υ1 υ2 . . . [sent-36, score-0.255]
14 1 A common approach is to determine a discriminant function F : X × Y → R that assigns a score to every input x ∈ X := X ∗ and every label sequence y ∈ Y := Υ∗ , where X ∗ denotes the Kleene closure of X. [sent-49, score-0.207]
15 For each training example (x n , y n ), n = 1, . . . , N , the parameters are tuned such that the true labeling y n scores higher than all other labelings y ∈ Yn := Y \ {y n } with a large margin. [sent-56, score-0.233]
16 One solves (2) for the smaller problem and then identifies labelings y ∈ Yn that maximally violate constraints. [sent-79, score-0.154]
17 However, since the computation of F involves many kernel computations and also the number of non-zero α’s is often large, solving the problem with more than a few hundred labeled sequences often seems computationally too expensive. [sent-84, score-0.261]
18 If F (x, ·) satisfies certain conditions, one can use a Viterbi-like algorithm [20] for efficient decoding. (Note that the number of possible labelings grows exponentially in the length of the sequence.) [sent-87, score-0.315]
19 This is particularly the case when Φ can be written as a sum over the length of the sequence and decomposed as Φ(x, y) = Σ_{i=1}^{l(x)} Σ_{σ,τ∈Υ} Φσ,τ (υi , υi+1 , x, i), where l(x) is the length of the sequence x. [sent-89, score-0.312]
20 Using this decomposition we can define V (i, υ) := max_{υ′∈Υ} ( V (i − 1, υ′) + g(υ′, υ, x, i − 1) ) for i > 1 and V (i, υ) := 0 otherwise, as the maximal score for all labelings with label υ at position i. [sent-100, score-0.317]
21 The above decoding algorithm requires evaluating g at most |Υ|² l(x) times. [sent-103, score-0.094]
22 Since computing g involves computing potentially large sums of kernel functions, the decoding step can be computationally quite demanding, depending on the kernels and the number of examples. [sent-104, score-0.131]
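As an illustration, here is a minimal Python sketch of the Viterbi-like recursion above; the score function g and the integer encoding of the label alphabet Υ are hypothetical stand-ins for the kernel-based scores described in the text, not the authors' implementation.

```python
import numpy as np

def viterbi_decode(g, n_labels, seq_len):
    """Sketch of the recursion V(i, v) = max_{v'} V(i-1, v') + g(v', v, i-1).

    g(prev_label, label, i) is a hypothetical stand-in for g(v', v, x, i);
    it returns the (kernel-based) score for `label` at position i+1 given
    `prev_label` at position i.
    """
    V = np.zeros((seq_len, n_labels))            # V[0, :] = 0 is the base case
    back = np.zeros((seq_len, n_labels), dtype=int)
    for i in range(1, seq_len):
        for lab in range(n_labels):
            scores = [V[i - 1, prev] + g(prev, lab, i - 1) for prev in range(n_labels)]
            back[i, lab] = int(np.argmax(scores))
            V[i, lab] = scores[back[i, lab]]
    labels = [int(np.argmax(V[-1]))]             # backtrack from the best final label
    for i in range(seq_len - 1, 0, -1):
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1], float(V[-1].max())
```

As stated above, g is evaluated at most |Υ|² l(x) times, so the kernel sums hidden inside g dominate the cost.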
23 4 Extension to Hidden Semi-Markov SVMs Semi-Markov models extend hidden Markov models by allowing each state to persist for a nonunit number δi of symbols. [sent-106, score-0.201]
24 In this work we extend Hidden Markov-SVMs to Hidden Semi-Markov SVMs by considering sequences of segments instead of simple label sequences. [sent-112, score-0.452]
25 A segment labeling y is given as a sequence of pairs (υ1 , π1 ), . . . , (υs , πs ), where πj is the start position of segment j and υj its label. [sent-116, score-0.344]
26 To simplify the notation we define πs+1 := l(x) + 1, s := s(y) to be the number of segments in y and υs+1 := ∅. [sent-118, score-0.191]
27 Note that one can extend the outlined decoding algorithm to produce not only the best path, but the K best paths. [sent-121, score-0.165]
28 V (i, υ, k) is now the k-th best score of labelings with label υ at position i. [sent-127, score-0.317]
29 For simplicity, we associate the label of a segment with the signal at the boundary to the next segment. [sent-128, score-0.375]
30 With this definition we can extract features from segments: As πj and πj+1 are given one can for instance compute the length of the segment or other features that depend on the start and the end of the segment. [sent-130, score-0.409]
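As a small illustration of this notation (our own sketch, not taken from the paper), a segment labeling can be stored as a list of (υj, πj) pairs, from which segment lengths follow directly via the sentinel πs+1 := l(x) + 1:

```python
def segment_lengths(segments, seq_len):
    """segments: list of (label, start) pairs (v_j, pi_j) with 1-based start
    positions; pi_{s+1} := seq_len + 1 is the implicit sentinel from the text."""
    starts = [start for _, start in segments] + [seq_len + 1]
    return [starts[j + 1] - starts[j] for j in range(len(segments))]

# e.g. two segments starting at positions 1 and 6 of a length-10 sequence:
# segment_lengths([("exon", 1), ("intron", 6)], 10) == [5, 5]
```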
31 In (5) the feature map is summed over j = 1, . . . , s(y) and σ, τ ∈ Υ, and the summand is defined as g(υj , υj+1 , x, πj , πj+1 ); analogously we can extend the formula for the Viterbi-like decoding algorithm [14]: [sent-132, score-0.127]
32 V (i, υ) := max_{υ′∈Υ, d=1,...,min(i−1,S)} ( V (i − d, υ′) + g(υ′, υ, x, i − d, i) ) for i > 1 and V (i, υ) := 0 otherwise (6), where S is the maximal segment length and max_{υ∈Υ} V (l(x), υ) is the score of the best segment labeling. [sent-135, score-0.582]
33 The optimal label sequence can be obtained as before by backtracking. [sent-137, score-0.166]
34 Also, the above method can be easily extended to produce the K best labelings. [sent-138, score-0.154]
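A minimal sketch of the semi-Markov recursion (6) follows; g_seg is a hypothetical stand-in for the segment score g(υ′, υ, x, i − d, i), and the brute-force loops make explicit that the cost grows by a factor of at most S compared to the standard recursion.

```python
import numpy as np

def semi_markov_decode(g_seg, n_labels, seq_len, max_seg_len):
    """Sketch of recursion (6): segments of length d = 1..min(i-1, S) are scored as a whole.

    g_seg(prev_label, label, start, end) is a hypothetical segment score that
    may depend on the entire segment [start, end), e.g. its length or content.
    """
    V = np.full((seq_len + 1, n_labels), -np.inf)
    back = {}
    V[1, :] = 0.0                                       # base case
    for i in range(2, seq_len + 1):
        for lab in range(n_labels):
            best_score, best_ptr = -np.inf, None
            for d in range(1, min(i - 1, max_seg_len) + 1):
                for prev in range(n_labels):
                    s = V[i - d, prev] + g_seg(prev, lab, i - d, i)
                    if s > best_score:
                        best_score, best_ptr = s, (i - d, prev)
            V[i, lab] = best_score
            back[(i, lab)] = best_ptr
    best_label = int(np.argmax(V[seq_len]))             # max_v V(l(x), v)
    return float(V[seq_len, best_label]), best_label, back
```

Backtracking through the stored pointers recovers the optimal segmentation, exactly as described above for the label-sequence case.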
35 The idea is that the feature map should contain information about segments such as the length or the content as well as segment boundaries, which may exhibit certain detectable signals. [sent-142, score-0.788]
36 We consider the subsequence χπj . . . χπj+1 −2 χπj+1 −1 for extracting content information about segment j. [sent-148, score-0.469]
37 Also, for considering signals we assume it to be sufficient to consider a window ±ω around the end of the segment, i.e. the subsequence χπj+1 ±ω . [sent-149, score-0.123]
38 To keep the notation simple we do not consider signals at the start of the segment. [sent-155, score-0.111]
39 Then the kernel between two examples (x, y) and (x′, y′) using this feature map can be written as: [sent-163, score-0.280]
40 k((x, y), (x′, y′)) = Σ_{σ,τ∈Υ} Σ_{j:(υj ,υj+1 )=(σ,τ )} Σ_{j′:(υ′j′ ,υ′j′+1 )=(σ,τ )} kc (χπj ...πj+1 , χ′π′j′ ...π′j′+1 ) + Σ_{τ∈Υ} Σ_{j:υj+1 =τ } Σ_{j′:υ′j′+1 =τ } ks (χπj+1 ±ω , χ′π′j′+1 ±ω ), where kc (·, ·) := ⟨Φ1 (·), Φ1 (·)⟩ and ks (·, ·) := ⟨Φs (·), Φs (·)⟩. [sent-167, score-0.182]
41 The above formulation has the benefit of keeping the signal and content kernels separated for each label, which we can exploit for rewriting F (x, y): [sent-168, score-0.281]
42 F (x, y) = Σ_{σ,τ∈Υ} Σ_{j:(υj ,υj+1 )=(σ,τ )} Fσ,τ (χπj ...πj+1 ) + Σ_{τ∈Υ} Σ_{j:υj+1 =τ } Fτ (χπj+1 ±ω ), where Fσ,τ (χ) := Σ_{n=1}^{N} Σ_{y′∈Y} αn (y′) Σ_{j′:(υ′j′ ,υ′j′+1 )=(σ,τ )} kc (χ, χ^n_{π′j′ ...π′j′+1 }) and Fτ (χ) is defined analogously with ks and the sum over j′ : υ′j′+1 = τ . [sent-170, score-0.096]
43 Hence, we have partitioned F (x, y) into |Υ|² + |Υ| functions characterizing the content and the signals. [sent-173, score-0.265]
44 Two-Stage Learning: By enumerating all non-zero α’s and valid settings of j in Fτ and Fσ,τ , we can define sets of sequences {χ^{τ,σ}_m }, m = 1, . . . [sent-175, score-0.151]
45 Hence, Fτ and Fσ,τ can be rewritten as a (single-sum) linear combination of kernels: Fσ,τ (χ) := Σ_{m=1}^{M^{σ,τ}} α^{σ,τ}_m kc (χ, χ^{σ,τ}_m ) and Fτ (χ) := Σ_{m=1}^{M^{τ}} α^τ_m ks (χ, χ^τ_m ) for appropriately chosen α’s. [sent-184, score-0.219]
46 For sequences χ^τ_m that do not correspond to true segment boundaries, the coefficient α^τ_m is either negative or zero (since wrong segment boundaries can only appear in wrong labelings y ≠ y n and αn (y) ≤ 0). [sent-185, score-0.976]
47 True segment boundaries in correct label sequences have non-negative α^τ_m ’s. [sent-186, score-0.578]
48 Hence, we may interpret these functions as m SVM classification functions recognizing segments and boundaries of all kinds. [sent-188, score-0.37]
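In code, each of these functions is then just an ordinary SVM decision function over sequences; a minimal sketch, where the α's, support sequences and kernel are hypothetical placeholders:

```python
def svm_expansion(chi, alphas, support_seqs, kernel):
    """Single-sum form F(chi) = sum_m alpha_m * kernel(chi, chi_m),
    used for F_{sigma,tau} with kc and for F_tau with ks."""
    return sum(a * kernel(chi, chi_m) for a, chi_m in zip(alphas, support_seqs))
```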
49 In this work we propose to separate the learning of the content sensors and signal detectors from learning how they have to act together in order to produce the correct labeling. [sent-190, score-0.538]
50 The idea is to train SVM-based classifiers F̄σ,τ and F̄τ using the kernels kc and ks on examples with known labeling. [sent-191, score-0.209]
51 For every segment type and segment boundary we generate a set of positive examples from observed segments and boundaries. [sent-192, score-0.665]
52 As negative examples we use all boundaries and segments that were not observed in a true labeling. [sent-193, score-0.304]
53 This leads to a set of sequences that may potentially also appear in the expansions of Fσ,τ and Fτ . [sent-194, score-0.151]
54 However, while the functions F̄σ,τ and F̄τ might recognize the same contents and signals as Fσ,τ and Fτ , the functions are obtained independently from each other and might not be scaled correctly to jointly produce the correct labeling. [sent-197, score-0.161]
55 The transformation functions t : R → R are one-dimensional mappings and it seems fully sufficient to use for instance piece-wise linear functions (PLiFs) pµ,θ (λ) := ⟨ϕµ (λ), θ⟩ with fixed abscissa boundaries µ and θ-parametrized ordinate values (ϕµ (λ) can be appropriately defined). [sent-199, score-0.216]
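A minimal sketch of such a PLiF is shown below; the concrete linear-interpolation choice of ϕµ is our own assumption for illustration, since the text only requires that ϕµ(λ) be appropriately defined.

```python
import numpy as np

def plif_features(lam, mu):
    """phi_mu(lambda): interpolation weights over the sorted support points mu.

    At most two entries are non-zero, so that <phi_mu(lam), theta> linearly
    interpolates the ordinate values theta between neighbouring support points.
    """
    mu = np.asarray(mu, dtype=float)
    phi = np.zeros_like(mu)
    if lam <= mu[0]:
        phi[0] = 1.0
    elif lam >= mu[-1]:
        phi[-1] = 1.0
    else:
        k = int(np.searchsorted(mu, lam))        # mu[k-1] < lam <= mu[k]
        w = (lam - mu[k - 1]) / (mu[k] - mu[k - 1])
        phi[k - 1], phi[k] = 1.0 - w, w
    return phi

def plif(lam, mu, theta):
    """p_{mu,theta}(lambda) := <phi_mu(lambda), theta>."""
    return float(np.dot(plif_features(lam, mu), theta))
```

Because pµ,θ is linear in θ, feeding these features into the feature map keeps the learning problem (2) linear in the parameters, which is what later allows solving it in the primal.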
56 If the alphabet Υ is reasonably small then the dimensionality is low enough to solve the optimization problem (2) efficiently in the primal domain. [sent-204, score-0.097]
57 In the next section we will illustrate how to successfully apply a version of the outlined algorithm to a problem where we have several thousands of relatively long labeled sequences. [sent-205, score-0.089]
58 4 Application to Gene Structure Prediction: The problem of gene structure prediction is to segment nucleotide sequences (so-called pre-mRNA sequences generated by transcription) into exons and introns. [sent-206, score-0.763]
59 In a complex biochemical process called splicing the introns are removed from the pre-mRNA sequence to form the mature mRNA sequence that can be translated into protein. [sent-208, score-0.223]
60 The exon-intron and intron-exon boundaries are defined by sequence motifs almost always containing the letters GT and AG. [sent-209, score-0.202]
61 However, these dimers appear very frequently and one needs sophisticated methods to recognize true splice sites [21, 12, 13]. [sent-211, score-0.312]
62 So far mostly HMM-based methods such as Genscan [5], Snap [8] or ExonHunter [4] have been applied to this problem and also to the more difficult problem of gene finding. [sent-212, score-0.142]
63 Figure 2 illustrates the “grammar” that we use for gene structure prediction. [sent-215, score-0.142]
64 We only require four different states (start, exon-end, exon-start and end) and two different segment labels (exon & intron). [sent-216, score-0.237]
65 Each of these exon types corresponds to one transition in the model. [sent-218, score-0.327]
66 States two and three recognize the two types of splice sites and the transition between these states defines an intron. [sent-219, score-0.35]
67 For our specific problem we only need signal detectors for segments ending in state two and three. [sent-220, score-0.415]
68 Additionally we need content sensors for every possible transition. [sent-222, score-0.35]
69 While the “content” of the different exon segments is essentially the same, their lengths can vary quite drastically. [sent-223, score-0.547]
70 We therefore decided to use one content sensor F̄I for the intron transition 2 → 3 and the same content sensor F̄E for all four exon transitions 1 → 2, 1 → 4, 3 → 2 and 3 → 4. [sent-224, score-1.171]
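For illustration, the "grammar" of Figure 2 can be written down as an allowed-transition table; the state and label names below are our own, only the transition structure (exon transitions 1 → 2, 1 → 4, 3 → 2, 3 → 4 and intron transition 2 → 3) is taken from the text.

```python
# States of the elementary gene model of Figure 2 (names are ours).
STATES = ["start", "exon_end", "exon_start", "end"]           # 1, 2, 3, 4

# Allowed transitions and the segment label each one emits.
TRANSITIONS = {
    ("start", "exon_end"): "exon",          # 1 -> 2
    ("start", "end"): "exon",               # 1 -> 4 (intron-less transcript)
    ("exon_end", "exon_start"): "intron",   # 2 -> 3
    ("exon_start", "exon_end"): "exon",     # 3 -> 2
    ("exon_start", "end"): "exon",          # 3 -> 4
}
```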
71 However, in order to capture the different length characteristics, we include Σ_{j=1}^{s(y)} Σ_{σ,τ∈Υ} [[υj = σ]] [[υj+1 = τ ]] ϕ_{γ^{σ,τ}} (πj+1 − πj ) (8) in the feature map (7), which amounts to using PLiFs for the lengths of all transitions. [sent-225, score-0.17]
72 5 We have obtained data for training, validation and testing from public sequence databases (see [13] for details). [sent-230, score-0.089]
73 The C. elegans training set used for label-sequence learning contains 1,536 sequences with an average length of ≈ 2,300 base pairs and about 9 segments per sequence, and the splice site signal detectors were trained on more than a million examples. [sent-234, score-1.048]
74 In principle it is possible to join sets 1 & 2; however, then the predictions of F̄σ,τ and F̄τ on the sequences used for the HSM SVM are skewed in the margin area (since the examples are pushed away from the decision boundary on the training set). [sent-235, score-0.231]
75 4.1 Learning the Splice Site Signal Detectors: From the training sequences (Set 1) we extracted sequences of confirmed splice sites (intron start and end). [sent-238, score-0.666]
76 For intron start sites we used a window of [−80, +60] around the site. [sent-239, score-0.453]
77 From the training sequences we also extracted non-splice sites, which are within an exon or intron of the sequence and have AG or GT consensus. [sent-241, score-0.814]
78 We train an SVM [19] with soft margin using the WD kernel [12]: k(x, x′) = Σ_{j=1}^{d} βj Σ_{i=1}^{l−j} [[x[i,i+j] = x′[i,i+j] ]], where l = 140 is the length of the sequence and x[a,b] denotes the sub-string of x from position a to (excluding) b. [sent-242, score-0.309]
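A direct (unoptimized) sketch of this weighted degree kernel is given below; the degree d and the weighting βj = 2(d − j + 1)/(d(d + 1)) are common defaults assumed here for illustration, not necessarily the values used in the paper.

```python
def wd_kernel(x, y, degree=20):
    """Weighted degree kernel: counts substrings of length j = 1..degree that
    match at corresponding positions, weighted by beta_j (assumed weighting
    beta_j = 2*(degree - j + 1) / (degree*(degree + 1)))."""
    assert len(x) == len(y)
    l = len(x)
    value = 0.0
    for j in range(1, degree + 1):
        beta_j = 2.0 * (degree - j + 1) / (degree * (degree + 1))
        matches = sum(1 for i in range(l - j) if x[i:i + j] == y[i:i + j])
        value += beta_j * matches
    return value

# e.g. wd_kernel("GTAAGT", "GTGAGT", degree=3) compares the two windows.
```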
79 Training these detectors on C. elegans resulted in 79,000 and 61,233 support vectors for detecting intron start and end sites, respectively. [sent-247, score-0.535]
80 A transcript of a gene starts with an exon and may then be interrupted by an intron, followed by another exon, intron and so on until it ends in an exon. [sent-250, score-0.68]
81 Figure 2: An elementary state model for unspliced mRNA: The start is either directly followed by the end or by an arbitrary number of donor-acceptor splice site pairs. [sent-252, score-0.339]
82 4.2 Learning the Exon and Intron Content Sensors: To obtain the exon content sensor we derived a set of exons from the training set. [sent-254, score-0.683]
83 As negative examples we used sub-sequences of intronic sequences sampled such that both sets of strings have roughly the same length distribution. [sent-255, score-0.218]
84 ¯ ¯ Note that the resulting content sensors FI and FE need to be evaluated several times during the Viterbi-like algorithm (cf. [sent-261, score-0.35]
85 (6)): One needs to extend segments ending at the same position i to several different starting points. [sent-262, score-0.305]
86 4.3 Combination: For datasets 2-4 we can precompute all candidate splice sites using the classifiers F̄2 and F̄3 . [sent-265, score-0.266]
87 We decided to use PLiFs with P = 30 support points and chose the boundaries for F̄2 , F̄3 , F̄E , and F̄I uniformly between −5 and 5 (typical range of outputs of our SVMs). [sent-266, score-0.144]
88 For the PLiFs concerned with length of segments we chose appropriate boundaries in the range 30 − 1000. [sent-267, score-0.371]
89 Having defined the feature map and the regularizer, we can now apply the HSM SVM algorithm outlined in Sections 2 and 3. [sent-271, score-0.099]
90 Since the feature space is rather low dimensional (270 dimensions), we can solve the optimization problem in the primal domain even with several thousands of examples employing a standard optimizer (we used ILOG CPLEX and column generation) within a reasonable time. [sent-273, score-0.18]
91 On C. elegans we can compare it to ExonHunter on 1177 test sequences. [sent-277, score-0.181]
92 Simplifying the problem by only considering sequences between the start and stop codons allows us to also include SNAP in the comparison on the dataset 4’, a slightly modified version of dataset 4 with 1138 sequences. [sent-280, score-0.273]
93 Moreover, we have successfully applied our method on large scale gene structure prediction. (Footnote: This implements our intuition that large SVM scores should lead to larger scores for a labeling.) [sent-285, score-0.202]
94 It takes less than one hour to solve the HSM SVM problem with about 1,500 sequences on a single CPU. [sent-286, score-0.187]
95 Training the content and signal detectors on several hundred thousand examples takes around 5 hours in total. [sent-287, score-0.466]
96 Table 1: Shown are the rates of predicting a wrong gene structure, sensitivity (Sn) and specificity (Sp) on exon and nucleotide levels. [sent-317, score-0.142]
97 Prediction of complete gene structures in human genomic DNA. [sent-377, score-0.142]
98 A generalized hidden markov model for the recognition of human genes in DNA. [sent-399, score-0.186]
99 A tutorial on hidden markov models and selected applications in speech recognition. [sent-408, score-0.186]
100 Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. [sent-476, score-0.094]
wordName wordTfidf (topN-words)
[('exon', 0.289), ('svms', 0.256), ('intron', 0.249), ('segment', 0.237), ('content', 0.232), ('segments', 0.191), ('hsm', 0.182), ('plifs', 0.182), ('shm', 0.182), ('exonhunter', 0.181), ('elegans', 0.181), ('splice', 0.155), ('labelings', 0.154), ('sequences', 0.151), ('gene', 0.142), ('detectors', 0.127), ('hidden', 0.123), ('sensors', 0.118), ('boundaries', 0.113), ('sites', 0.111), ('hm', 0.103), ('kc', 0.096), ('decoding', 0.094), ('exons', 0.091), ('snap', 0.091), ('sequence', 0.089), ('ks', 0.086), ('svm', 0.082), ('site', 0.079), ('fe', 0.078), ('label', 0.077), ('sp', 0.068), ('length', 0.067), ('markov', 0.063), ('start', 0.062), ('signal', 0.061), ('yn', 0.058), ('hmms', 0.054), ('plif', 0.052), ('xn', 0.052), ('altun', 0.052), ('thousands', 0.051), ('signals', 0.049), ('labeling', 0.049), ('tsch', 0.048), ('hundred', 0.046), ('analogously', 0.046), ('recognize', 0.046), ('ismb', 0.045), ('organism', 0.045), ('semimarkov', 0.045), ('splicing', 0.045), ('persist', 0.045), ('caenorhabditis', 0.045), ('wp', 0.045), ('position', 0.045), ('margin', 0.044), ('end', 0.043), ('lengths', 0.042), ('wrong', 0.042), ('mrna', 0.041), ('nucleotide', 0.041), ('score', 0.041), ('prediction', 0.041), ('genome', 0.04), ('biology', 0.04), ('sn', 0.039), ('sonnenburg', 0.038), ('outline', 0.038), ('outlined', 0.038), ('transition', 0.038), ('appropriately', 0.037), ('kernel', 0.037), ('ending', 0.036), ('training', 0.036), ('solve', 0.036), ('sensor', 0.035), ('regularizer', 0.034), ('functions', 0.033), ('optimization', 0.033), ('extend', 0.033), ('nt', 0.033), ('tsochantaridis', 0.033), ('argmax', 0.033), ('ag', 0.033), ('extension', 0.032), ('feature', 0.032), ('window', 0.031), ('decided', 0.031), ('hofmann', 0.031), ('scores', 0.03), ('dataset', 0.03), ('transitions', 0.03), ('map', 0.029), ('fi', 0.028), ('ller', 0.028), ('primal', 0.028), ('gt', 0.028), ('train', 0.027), ('solving', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 108 nips-2006-Large Scale Hidden Semi-Markov SVMs
Author: Gunnar Rätsch, Sören Sonnenburg
Abstract: We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology. Results on a well-known model organism illustrate the great potential of SHM SVMs in computational biology. 1
2 0.22528115 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields
Author: Jade P. Vinson, David Decaprio, Matthew D. Pearson, Stacey Luoma, James E. Galagan
Abstract: Computational gene prediction using generative models has reached a plateau, with several groups converging to a generalized hidden Markov model (GHMM) incorporating phylogenetic models of nucleotide sequence evolution. Further improvements in gene calling accuracy are likely to come through new methods that incorporate additional data, both comparative and species specific. Conditional Random Fields (CRFs), which directly model the conditional probability P (y|x) of a vector of hidden states conditioned on a set of observations, provide a unified framework for combining probabilistic and non-probabilistic information and have been shown to outperform HMMs on sequence labeling tasks in natural language processing. We describe the use of CRFs for comparative gene prediction. We implement a model that encapsulates both a phylogenetic-GHMM (our baseline comparative model) and additional non-probabilistic features. We tested our model on the genome sequence of the fungal human pathogen Cryptococcus neoformans. Our baseline comparative model displays accuracy comparable to the the best available gene prediction tool for this organism. Moreover, we show that discriminative training and the incorporation of non-probabilistic evidence significantly improve performance. Our software implementation, Conrad, is freely available with an open source license at http://www.broad.mit.edu/annotation/conrad/. 1
3 0.12359586 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
Author: Fei Sha, Lawrence K. Saul
Abstract: We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.
4 0.10111612 148 nips-2006-Nonlinear physically-based models for decoding motor-cortical population activity
Author: Gregory Shakhnarovich, Sung-phil Kim, Michael J. Black
Abstract: Neural motor prostheses (NMPs) require the accurate decoding of motor cortical population activity for the control of an artificial motor system. Previous work on cortical decoding for NMPs has focused on the recovery of hand kinematics. Human NMPs however may require the control of computer cursors or robotic devices with very different physical and dynamical properties. Here we show that the firing rates of cells in the primary motor cortex of non-human primates can be used to control the parameters of an artificial physical system exhibiting realistic dynamics. The model represents 2D hand motion in terms of a point mass connected to a system of idealized springs. The nonlinear spring coefficients are estimated from the firing rates of neurons in the motor cortex. We evaluate linear and a nonlinear decoding algorithms using neural recordings from two monkeys performing two different tasks. We found that the decoded spring coefficients produced accurate hand trajectories compared with state-of-the-art methods for direct decoding of hand kinematics. Furthermore, using a physically-based system produced decoded movements that were more “natural” in that their frequency spectrum more closely matched that of natural hand movements. 1
5 0.083578438 134 nips-2006-Modeling Human Motion Using Binary Latent Variables
Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis
Abstract: We propose a non-linear generative model for human motion data that uses an undirected model with binary latent variables and real-valued “visible” variables that represent joint angles. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. Such an architecture makes on-line inference efficient and allows us to use a simple approximate learning procedure. After training, the model finds a single set of parameters that simultaneously capture several different kinds of motion. We demonstrate the power of our approach by synthesizing various motion sequences and by performing on-line filling in of data lost during motion capture. Website: http://www.cs.toronto.edu/∼gwtaylor/publications/nips2006mhmublv/
6 0.083289146 111 nips-2006-Learning Motion Style Synthesis from Perceptual Observations
7 0.083085395 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
8 0.075410895 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
9 0.074361123 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
10 0.073119715 81 nips-2006-Game Theoretic Algorithms for Protein-DNA binding
11 0.071406282 65 nips-2006-Denoising and Dimension Reduction in Feature Space
12 0.06966041 186 nips-2006-Support Vector Machines on a Budget
13 0.069650427 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
14 0.063698322 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
15 0.061010342 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
16 0.060165342 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
17 0.059870146 130 nips-2006-Max-margin classification of incomplete data
18 0.058324084 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
19 0.055641014 200 nips-2006-Unsupervised Regression with Applications to Nonlinear System Identification
20 0.05471601 57 nips-2006-Conditional mean field
topicId topicWeight
[(0, -0.205), (1, 0.033), (2, 0.039), (3, -0.047), (4, 0.007), (5, 0.032), (6, -0.04), (7, -0.029), (8, -0.001), (9, -0.003), (10, -0.23), (11, -0.009), (12, -0.194), (13, 0.051), (14, -0.049), (15, -0.042), (16, -0.007), (17, -0.077), (18, -0.039), (19, -0.095), (20, 0.135), (21, 0.027), (22, -0.115), (23, -0.104), (24, 0.095), (25, -0.08), (26, -0.118), (27, -0.034), (28, 0.037), (29, 0.15), (30, 0.038), (31, 0.057), (32, 0.029), (33, 0.058), (34, 0.053), (35, -0.066), (36, 0.06), (37, -0.036), (38, 0.051), (39, -0.044), (40, -0.095), (41, -0.049), (42, -0.076), (43, -0.068), (44, 0.041), (45, -0.105), (46, 0.068), (47, 0.074), (48, 0.058), (49, 0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.93969941 108 nips-2006-Large Scale Hidden Semi-Markov SVMs
Author: Gunnar Rätsch, Sören Sonnenburg
Abstract: We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology. Results on a well-known model organism illustrate the great potential of SHM SVMs in computational biology. 1
2 0.73657191 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields
Author: Jade P. Vinson, David Decaprio, Matthew D. Pearson, Stacey Luoma, James E. Galagan
Abstract: Computational gene prediction using generative models has reached a plateau, with several groups converging to a generalized hidden Markov model (GHMM) incorporating phylogenetic models of nucleotide sequence evolution. Further improvements in gene calling accuracy are likely to come through new methods that incorporate additional data, both comparative and species specific. Conditional Random Fields (CRFs), which directly model the conditional probability P (y|x) of a vector of hidden states conditioned on a set of observations, provide a unified framework for combining probabilistic and non-probabilistic information and have been shown to outperform HMMs on sequence labeling tasks in natural language processing. We describe the use of CRFs for comparative gene prediction. We implement a model that encapsulates both a phylogenetic-GHMM (our baseline comparative model) and additional non-probabilistic features. We tested our model on the genome sequence of the fungal human pathogen Cryptococcus neoformans. Our baseline comparative model displays accuracy comparable to the the best available gene prediction tool for this organism. Moreover, we show that discriminative training and the incorporation of non-probabilistic evidence significantly improve performance. Our software implementation, Conrad, is freely available with an open source license at http://www.broad.mit.edu/annotation/conrad/. 1
3 0.54046327 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
Author: Samuel S. Gross, Olga Russakovsky, Chuong B. Do, Serafim Batzoglou
Abstract: We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk minimization. We give a gradient-based procedure for minimizing an arbitrarily accurate approximation of the empirical risk under a Hamming loss function. In experiments with both simulated and real data, our optimization procedure gives significantly better testing performance than several current approaches for CRF training, especially in situations of high label noise. 1
4 0.52009821 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
Author: Fei Sha, Lawrence K. Saul
Abstract: We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.
5 0.46733692 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
Author: Yi Mao, Guy Lebanon
Abstract: We examine the problem of predicting local sentiment flow in documents, and its application to several areas of text analysis. Formally, the problem is stated as predicting an ordinal sequence based on a sequence of word sets. In the spirit of isotonic regression, we develop a variant of conditional random fields that is wellsuited to handle this problem. Using the M¨ bius transform, we express the model o as a simple convex optimization problem. Experiments demonstrate the model and its applications to sentiment prediction, style analysis, and text summarization. 1
6 0.45262831 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
7 0.43115303 148 nips-2006-Nonlinear physically-based models for decoding motor-cortical population activity
8 0.42697394 134 nips-2006-Modeling Human Motion Using Binary Latent Variables
9 0.40547207 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype
10 0.40100718 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
11 0.3919256 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
12 0.39099127 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
13 0.38788092 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
14 0.37014765 111 nips-2006-Learning Motion Style Synthesis from Perceptual Observations
15 0.36543706 186 nips-2006-Support Vector Machines on a Budget
16 0.36059535 90 nips-2006-Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
17 0.35971722 81 nips-2006-Game Theoretic Algorithms for Protein-DNA binding
18 0.34049734 5 nips-2006-A Kernel Method for the Two-Sample-Problem
19 0.33546963 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
20 0.33163771 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
topicId topicWeight
[(1, 0.156), (3, 0.053), (7, 0.076), (9, 0.036), (22, 0.048), (44, 0.058), (57, 0.064), (58, 0.018), (65, 0.063), (69, 0.043), (76, 0.252), (90, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.80583316 108 nips-2006-Large Scale Hidden Semi-Markov SVMs
Author: Gunnar Rätsch, Sören Sonnenburg
Abstract: We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology. Results on a well-known model organism illustrate the great potential of SHM SVMs in computational biology. 1
2 0.65053713 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
Author: Wenye Li, Kin-hong Lee, Kwong-sak Leung
Abstract: Kernel-based regularized learning seeks a model in a hypothesis space by minimizing the empirical error and the model’s complexity. Based on the representer theorem, the solution consists of a linear combination of translates of a kernel. This paper investigates a generalized form of representer theorem for kernel-based learning. After mapping predefined features and translates of a kernel simultaneously onto a hypothesis space by a specific way of constructing kernels, we proposed a new algorithm by utilizing a generalized regularizer which leaves part of the space unregularized. Using a squared-loss function in calculating the empirical error, a simple convex solution is obtained which combines predefined features with translates of the kernel. Empirical evaluations have confirmed the effectiveness of the algorithm for supervised learning tasks.
3 0.63930839 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
Author: Fei Sha, Lawrence K. Saul
Abstract: We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.
4 0.63192898 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
5 0.62614387 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
6 0.62492245 20 nips-2006-Active learning for misspecified generalized linear models
7 0.6244725 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
8 0.62434363 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
9 0.62352526 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
10 0.62322211 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
11 0.62317401 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
12 0.62307483 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
13 0.62284303 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
14 0.62102199 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
15 0.6199668 35 nips-2006-Approximate inference using planar graph decomposition
16 0.61883956 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
17 0.6187712 117 nips-2006-Learning on Graph with Laplacian Regularization
18 0.61805809 82 nips-2006-Gaussian and Wishart Hyperkernels
19 0.61682892 61 nips-2006-Convex Repeated Games and Fenchel Duality
20 0.61571056 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields