nips nips2009 nips2009-56 knowledge-graph by maker-knowledge-mining

56 nips-2009-Conditional Neural Fields


Source: pdf

Author: Jian Peng, Liefeng Bo, Jinbo Xu

Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. [sent-10, score-0.227]

2 Most CRF models use a linear potential function to represent the relationship between input features and output. [sent-11, score-0.139]

3 However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. [sent-12, score-0.515]

4 To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. [sent-13, score-0.366]

5 CNF extends CRF by adding one (or possibly more) middle layer between input and output. [sent-14, score-0.096]

6 The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. [sent-15, score-0.276]

7 In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition. [sent-18, score-0.547]

8 1 Introduction Sequence labeling is a ubiquitous problem arising in many areas, including natural language processing [1], bioinformatics [2, 3, 4] and computer vision [5]. [sent-19, score-0.111]

9 Given an input/observation sequence, the goal of sequence labeling is to infer the state sequence (also called output sequence), where a state may be some type of labeling or segmentation. [sent-20, score-0.319]

10 For example, in protein secondary structure prediction, the observation is a protein sequence consisting of a collection of residues. [sent-21, score-0.673]

11 The output is a sequence of secondary structure types. [sent-22, score-0.37]

12 HMM is a generative learning model since it generates output from a joint distribution between input and output. [sent-24, score-0.096]

13 In the past decade, several discriminative learning models such as conditional random fields (CRF) have emerged as the mainstream methods for sequence labeling. [sent-25, score-0.146]

14 It defines the conditional probability of the output given the input. [sent-27, score-0.116]

15 Another approach for sequence labeling is max margin structured learning such as max margin Markov networks (MMMN) [8] and SVM-struct [9]. [sent-29, score-0.257]

16 In this work, we present a new probabilistic graphical model, called conditional neural fields (CNF), for sequence labeling. [sent-31, score-0.208]

17 Second, CNF automatically learns an implicit nonlinear representation of features and thus can capture a more complicated relationship between input and output. [sent-38, score-0.17]

18 2 Conditional Random Fields Assume the input and output sequences are X and Y , respectively. [sent-41, score-0.096]

19 CRF uses two types of features given a pair of input and output sequences. [sent-46, score-0.118]

20 The first type of features describes the dependency between the neighboring output labels. [sent-47, score-0.081]

21 In a linear chain CRF model [7], the conditional probability of the output sequence Y given the input sequence X is the normalized product of the exponentials of potential functions on all edges and vertices in the chain. [sent-52, score-0.35]
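
A minimal sketch of this computation for a linear chain, assuming NumPy/SciPy and illustrative names (vertex_scores, trans): the unnormalized score of a label sequence sums the vertex and edge potentials along the chain, and log Z(X) is obtained with the forward algorithm in log space.

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_prob(vertex_scores, trans, y):
    """Log P(y | X) for a linear-chain CRF.

    vertex_scores : (N, M) array, vertex potential of each of M labels at each position.
    trans         : (M, M) array, edge potential between neighboring labels.
    y             : (N,) integer label sequence.
    """
    N, M = vertex_scores.shape

    # Unnormalized score of the given label sequence: vertex + edge potentials.
    score = vertex_scores[np.arange(N), y].sum() + trans[y[:-1], y[1:]].sum()

    # log Z(X) via the forward algorithm, carried out in log space for stability.
    alpha = vertex_scores[0]
    for t in range(1, N):
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + vertex_scores[t]
    log_Z = logsumexp(alpha)

    return score - log_Z
```

The same routine can be reused for CNF once the vertex potentials are replaced by the gated potentials of Equation (6) below.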

22 This potential measures the compatibility between two neighboring output labels. [sent-54, score-0.102]

23 Although CRF is a very powerful model for sequence labeling, it does not work well on tasks in which the input features and output labels have a complex relationship. [sent-55, score-0.191]

24 For example, in computer vision or bioinformatics, many problems require the modeling of a complex/nonlinear relationship between input and output [10, 11]. [sent-56, score-0.149]

25 To model a complex/nonlinear relationship between input and output, CRF has to explicitly enumerate all possible combinations of input features and output labels. [sent-57, score-0.233]

26 Nevertheless, even assisted with domain knowledge, it is not always possible for CRF to capture all the important nonlinear relationships by explicit enumeration. [sent-58, score-0.089]

27 3 Conditional Neural Fields Here we propose a new probabilistic graphical model, conditional neural fields (CNF), for sequence labeling. [sent-59, score-0.208]

28 CNF not only can parametrize the conditional probability in a log-linear-like formulation, but also can implicitly model the complex/nonlinear relationship between input features and output labels. [sent-61, score-0.228]

29 In a linear chain CNF, the edge potential function is similar to that of a linear chain CRF. [sent-62, score-0.075]

30 That is, the edge function describes only the interdependency between neighboring output labels. [sent-63, score-0.1]

31 φ(Y, X, t) = Σ_y Σ_{g=1}^{K} w_{y,g} h(θ_g^T f(X, t)) δ[y_t = y]   (6), where h is a gate function. [sent-66, score-0.091]

32 In this work, we use the logistic function as the gate function. [sent-67, score-0.091]
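
A sketch of the resulting vertex potential of Equation (6) with the logistic gate function just described; the array names (feats, theta, W) are illustrative, not the authors' code.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def cnf_vertex_potentials(feats, theta, W):
    """CNF vertex potentials (Equation (6)) for every position and label.

    feats : (N, D) array, the raw feature vector f(X, t) at each position t.
    theta : (K, D) array, one weight vector per gate function.
    W     : (M, K) array, weights w_{y,g} connecting gate g to label y.

    Returns an (N, M) array whose (t, y) entry is sum_g w_{y,g} * h(theta_g^T f(X, t)).
    """
    hidden = logistic(feats @ theta.T)   # (N, K) gate activations
    return hidden @ W.T                  # (N, M) label potentials
```

The returned (N, M) matrix can be used directly as the vertex_scores argument of the CRF routine sketched earlier.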

33 In CNF, there is an extra hidden layer between the input and output, which consists of K gate functions (see Figure 1 and Equation (6)). [sent-70, score-0.239]

34 The K gate functions extract a K-dimensional implicit nonlinear representation of input features. [sent-71, score-0.186]

35 This paper mainly focuses on a linear chain CNF model for sequence labeling. [sent-74, score-0.097]

36 In the hidden layer, there is a set of neurons that extract implicit features from the input. [sent-76, score-0.124]

37 Then the log-linear model in the output layer utilizes the implicit features as its input. [sent-77, score-0.162]

38 The parameters in the hidden neurons and the log-linear model can be jointly optimized. [sent-78, score-0.08]

39 After learning the parameters, we can first compute all the hidden neuron values from the input and then use an inference algorithm to predict the output. [sent-79, score-0.119]
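
A sketch of that two-step prediction using Viterbi decoding over the precomputed potentials; this is one standard inference choice under the illustrative names used above, not necessarily the authors' exact implementation.

```python
import numpy as np

def viterbi(vertex_scores, trans):
    """Highest-scoring label sequence for a linear chain.

    vertex_scores : (N, M) array, e.g. the CNF potentials sketched above.
    trans         : (M, M) array of edge potentials.
    """
    N, M = vertex_scores.shape
    delta = vertex_scores[0].copy()            # best score ending in each label
    backptr = np.zeros((N, M), dtype=int)

    for t in range(1, N):
        cand = delta[:, None] + trans          # (previous label, current label)
        backptr[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + vertex_scores[t]

    # Trace back the best path from the last position.
    y = np.zeros(N, dtype=int)
    y[-1] = delta.argmax()
    for t in range(N - 1, 0, -1):
        y[t - 1] = backptr[t, y[t]]
    return y
```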

40 Empirically the number of hidden neurons K is small, so the CNF inference procedure may have lower computational complexity than that of CRF. [sent-84, score-0.115]

41 In our experiments, CNF shows superior predictive performance over two baseline methods: neural networks and CRF. [sent-85, score-0.081]

42 log P(Y|X) = Σ_{t=1}^{N} (ψ(Y, X, t) + φ(Y, X, t)) − log Z(X)   (7) Since CNF contains a hidden layer of gate functions h, the log-likelihood function is no longer convex. [sent-88, score-0.202]

43 Although both the output and hidden layers contain model parameters, all the parameters can be learned together by gradient-based optimization. [sent-90, score-0.111]

44 We can use LBFGS [12] as the optimization routine to search for the optimal model parameters because 1) LBFGS is very efficient and robust; and 2) LBFGS provides us with an approximation of the inverse Hessian for hyperparameter learning [13], which will be described in the next section. [sent-91, score-0.08]
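
A sketch of how such training could be wired to an off-the-shelf LBFGS routine (SciPy's L-BFGS-B here, as a stand-in for the authors' implementation); neg_log_likelihood_and_grad is a hypothetical function that evaluates the negated Equation (7) and its gradient over the training set, and the L2 term anticipates the regularization discussed in the next section.

```python
import numpy as np
from scipy.optimize import minimize

def train_cnf(params0, neg_log_likelihood_and_grad, data, l2=1.0):
    """Jointly fit all CNF parameters (gates, label weights, transitions) with LBFGS.

    params0 : initial flat parameter vector.
    neg_log_likelihood_and_grad(params, data) -> (value, gradient), the negated
        Equation (7) summed over the training sequences, plus its gradient.
    l2 : regularization hyperparameter.
    """
    def objective(params):
        nll, grad = neg_log_likelihood_and_grad(params, data)
        # L2 penalty keeps the extra expressive power of the hidden layer in check.
        return nll + 0.5 * l2 * params.dot(params), grad + l2 * params

    # The objective is non-convex, so the solution depends on params0;
    # several random restarts can be used in practice.
    result = minimize(objective, params0, jac=True, method="L-BFGS-B")
    return result.x
```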

45 Since the K gate functions can be computed in advance, the computational complexity of the gradient computation is O(NKD + NM²K) for a single input-output pair with length N. [sent-95, score-0.111]

46 For example, in protein secondary structure prediction, K = 30 and D = 260. [sent-98, score-0.419]

47 5 Regularization and Hyperparameter Optimization Because a hidden layer is added to CNF to introduce more expressive power than CRF, it is crucial to control the model complexity of CNF to avoid overfitting. [sent-102, score-0.155]

48 Figure 2 shows the learning curve of the hyperparameter on a protein secondary structure prediction benchmark. [sent-134, score-0.527]

49 6 Related Work Most existing methods for sequence labeling are built under the framework of graphical models such as HMM and CRF. [sent-145, score-0.158]

50 Since these approaches are incapable of capturing highly complex relationships between observations and labels, many structured models have been proposed for nonlinear modeling of the label-observation dependency. [sent-146, score-0.11]

51 For example, kernelized max margin Markov networks [8], SVMstruct [9] and kernel CRF [16] use nonlinear kernels to model the complex relationship between observations and labels. [sent-147, score-0.21]

52 Very recently, the probabilistic neural language model [17] and the recurrent temporal restricted Boltzmann machine [18] have been proposed for natural language and time series modeling. [sent-152, score-0.137]

53 By contrast, our CNF is a discriminative model, which is mainly used for discriminative prediction of sequence data. [sent-154, score-0.152]

54 The hierarchical recurrent neural networks [19, 20] can be viewed as a hybrid of HMM and neural networks (HMM/NN), building on a directed linear chain. [sent-155, score-0.219]

55 1 Protein Secondary Structure Prediction Protein secondary structure (SS) prediction is a fundamental problem in computational biology as well as a typical problem used to evaluate sequence labeling methods. [sent-158, score-0.437]

56 Given a protein sequence consisting of a collection of residues, the problem of protein SS prediction is to predict the secondary structure type at each residue. [sent-159, score-0.735]

57 A variety of methods have been described in the literature for protein SS prediction. [sent-160, score-0.181]

58 Given a protein sequence, we first run PSI-BLAST [21] to generate a sequence profile and then use this profile as input to predict SS. [sent-161, score-0.306]

59 A sequence profile is a position-specific scoring matrix X with n × 20 elements where n is the number of residues in a protein. [sent-162, score-0.109]
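
A sketch of how the per-residue input features f(X, t) could be assembled from such a profile: a window of consecutive profile rows centered at each residue is concatenated, with zero padding at the sequence ends (a window of 13 gives D = 13 × 20 = 260, the value quoted earlier). The function name and padding choice are illustrative.

```python
import numpy as np

def window_features(pssm, window=13):
    """Build f(X, t) for every residue from an (n, 20) PSSM.

    Each output row concatenates the `window` profile rows centered at residue t,
    so the feature dimension is window * 20 (260 for the default window of 13).
    Positions beyond the sequence ends are zero-padded.
    """
    n, d = pssm.shape
    half = window // 2
    padded = np.vstack([np.zeros((half, d)), pssm, np.zeros((half, d))])
    return np.stack([padded[t:t + window].ravel() for t in range(n)])
```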

60 Y = [y1, . . . , yn], where yi ∈ {H, E, C} represents the secondary structure type at the ith residue. [sent-171, score-0.262]

61 The true secondary structure for each protein is calculated using DSSP [23], which generates eight possible secondary structure states. [sent-173, score-0.657]

62 To determine the number of gate functions for CNF, we enumerate this number in the set {10,20,30,40,60,100}. [sent-180, score-0.116]

63 We also enumerate the window size for CNF in the set {7,9,11,13,15,17} and find that the best evidence is achieved when the window size is 13 and K = 30. [sent-181, score-0.087]
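
A sketch of that enumeration; evaluate is a placeholder for training a CNF with the given setting and returning the model evidence (or held-out accuracy) used to rank the combinations.

```python
import itertools

def select_hyperparameters(evaluate,
                           gate_counts=(10, 20, 30, 40, 60, 100),
                           window_sizes=(7, 9, 11, 13, 15, 17)):
    """Pick the number of gates K and the window size by exhaustive enumeration."""
    best_score, best_setting = float("-inf"), None
    for k, w in itertools.product(gate_counts, window_sizes):
        # evaluate() is assumed to train a CNF with this setting and return
        # the score (evidence or held-out accuracy) to be maximized.
        score = evaluate(num_gates=k, window=w)
        if score > best_score:
            best_score, best_setting = score, (k, w)
    return best_setting, best_score
```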

64 Two baseline methods are used for comparison: conditional random fields and neural networks. [sent-182, score-0.084]

65 The best window sizes for neural networks and CRF are 15 and 13, respectively. [sent-184, score-0.112]

66 We also compared our methods with other popular secondary structure prediction programs. [sent-185, score-0.285]

67 CRF, neural networks, Semi-Markov HMM [25], SVMpsi [24], PSIPRED[2] and CNF use the sequence profile generated by PSI-BLAST as described above. [sent-186, score-0.1]

68 YASSPP [27] and SPINE [28] also use other residue-specific features in addition to sequence profile. [sent-188, score-0.095]

69 First, by using one hidden layer to model the nonlinear relationship between input and output, CNF achieves a very significant gain over linear chain CRF. [sent-191, score-0.261]

70 This also confirms that a strong nonlinear relationship exists between the sequence profile and the secondary structure type. [sent-192, score-0.4]

71 Second, by modeling the interdependency between neighboring residues, CNF also obtains much better prediction accuracy than neural networks. [sent-193, score-0.131]

72 The predicted accuracy of HMM/NN is about three percent lower. Table 1: Performance of various methods for protein secondary structure prediction on the CB513 dataset. [sent-195, score-0.482]

73 PSIPRED is a two-stage, double-hidden-layer neural network. [sent-199, score-0.086]

74 Methods: Conditional Random Fields; SVM-struct (Linear Kernel); Neural Networks (one hidden layer); Neural Networks (two hidden layers); Semi-Markov HMM; SVMpro; SVMpsi; PSIPRED; YASSPP; SPINE*; Conditional Neural Fields; Conditional Neural Fields*. Q3(%): 72. [sent-203, score-0.104]

75 By seamlessly integrating neural networks and CRF, CNF outperforms all other state-of-the-art prediction methods on this dataset. [sent-215, score-0.128]

76 2 Handwriting Recognition Handwriting recognition(OCR) is another widely-used benchmark for sequence labeling algorithms. [sent-221, score-0.13]

77 The output we want to predict is a sequence of labels. [sent-229, score-0.147]

78 The number of gate functions for CNF is selected from the set {10, 20, 30, 40, 60, 100} and we find that the best evidence is achieved when K = 40. [sent-235, score-0.08]

79 8 Discussion We present a probabilistic graphical model, conditional neural fields (CNF), for sequence labeling tasks that require an accurate account of the nonlinear relationship between input and output. [sent-240, score-0.391]

80 CNF is a very natural integration of conditional graphical models and neural networks and thus inherits advantages from both. [sent-241, score-0.166]

81 On one hand, through its neural network component, CNF can model the nonlinear relationship between input and output. [sent-242, score-0.153]

82 Table 2: Performance of various methods on handwriting recognition. [sent-246, score-0.081]

83 The results of logistic regression, SVM and max margin Markov networks are taken from [8]. [sent-247, score-0.08]

84 Both CNF and neural networks use 40 neurons in the hidden layer. [sent-248, score-0.161]

85 On protein secondary structure prediction, CNF achieves the best performance over all methods we tested. [sent-260, score-0.419]

86 On handwriting recognition, CNF also compares favorably with the best method, max-margin Markov networks. [sent-261, score-0.081]

87 We are currently generalizing our CNF model to a second-order Markov chain and a more general graph structure, and also studying whether interposing more than one hidden layer between input and output will improve the predictive power of CNF. [sent-262, score-0.148]

88 Protein secondary structure prediction based on position-specific scoring matrices. [sent-271, score-0.285]

89 A tutorial on hidden Markov models and selected applications in speech recognition. [sent-288, score-0.078]

90 Conditional random fields: Probabilistic models for segmenting and labeling sequence data. [sent-294, score-0.13]

91 Support vector machine learning for interdependent and structured output spaces. [sent-300, score-0.08]

92 Comparison of probabilistic combination methods for protein secondary structure prediction. [sent-306, score-0.442]

93 Recurrent networks for structured data - a unifying approach and its properties. [sent-342, score-0.075]

94 a Gapped blast and psi-blast: a new generation of protein database search programs. [sent-363, score-0.181]

95 Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. [sent-368, score-0.492]

96 Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. [sent-371, score-0.402]

97 Protein secondary structure prediction based on an improved support vector machines approach. [sent-376, score-0.285]

98 A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach. [sent-382, score-0.466]

99 Yasspp: Better kernels and coding schemes lead to improvements in protein secondary structure prediction. [sent-385, score-0.419]

100 Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training. [sent-390, score-0.301]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cnf', 0.839), ('crf', 0.248), ('secondary', 0.203), ('protein', 0.181), ('gate', 0.091), ('handwriting', 0.081), ('sequence', 0.073), ('lbfgs', 0.068), ('mmmn', 0.068), ('yasspp', 0.068), ('yt', 0.062), ('hyperparameter', 0.061), ('fields', 0.06), ('output', 0.059), ('layer', 0.059), ('chicago', 0.058), ('conditional', 0.057), ('labeling', 0.057), ('psipred', 0.054), ('spine', 0.054), ('networks', 0.054), ('relationship', 0.053), ('hmm', 0.052), ('hidden', 0.052), ('prediction', 0.047), ('ss', 0.045), ('pro', 0.042), ('interdependency', 0.041), ('jinbo', 0.041), ('kenwood', 0.041), ('svmpro', 0.041), ('svmpsi', 0.041), ('recurrent', 0.039), ('hessian', 0.037), ('input', 0.037), ('nonlinear', 0.036), ('elds', 0.036), ('residues', 0.036), ('structure', 0.035), ('proteins', 0.034), ('toyota', 0.033), ('window', 0.031), ('bioinformatics', 0.03), ('svm', 0.029), ('technological', 0.029), ('le', 0.028), ('neurons', 0.028), ('graphical', 0.028), ('uyi', 0.027), ('wyt', 0.027), ('neural', 0.027), ('potential', 0.027), ('markov', 0.026), ('margin', 0.026), ('kd', 0.025), ('enumerate', 0.025), ('chain', 0.024), ('language', 0.024), ('expressive', 0.024), ('yi', 0.024), ('vertex', 0.023), ('probabilistic', 0.023), ('biology', 0.022), ('implicit', 0.022), ('il', 0.022), ('ep', 0.022), ('kernelized', 0.022), ('acids', 0.022), ('features', 0.022), ('position', 0.021), ('structured', 0.021), ('molecular', 0.021), ('ocr', 0.02), ('fy', 0.02), ('jian', 0.02), ('map', 0.02), ('complexity', 0.02), ('christian', 0.019), ('viterbi', 0.019), ('kernel', 0.019), ('inverse', 0.019), ('hybrid', 0.018), ('lafferty', 0.018), ('feng', 0.018), ('peng', 0.018), ('tested', 0.018), ('recognition', 0.018), ('geoffrey', 0.018), ('cubic', 0.017), ('tth', 0.016), ('yan', 0.016), ('xi', 0.016), ('accuracy', 0.016), ('discriminative', 0.016), ('compatibility', 0.016), ('inference', 0.015), ('boltzmann', 0.015), ('predict', 0.015), ('update', 0.015), ('bengio', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 56 nips-2009-Conditional Neural Fields

Author: Jian Peng, Liefeng Bo, Jinbo Xu

Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.

2 0.10257319 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

Author: Nan Ye, Wee S. Lee, Hai L. Chieu, Dan Wu

Abstract: Dependencies among neighbouring labels in a sequence is an important source of information for sequence labeling problems. However, only dependencies between adjacent labels are commonly exploited in practice because of the high computational complexity of typical inference algorithms when longer distance dependencies are taken into account. In this paper, we show that it is possible to design efficient inference algorithms for a conditional random field using features that depend on long consecutive label sequences (high-order features), as long as the number of distinct label sequences used in the features is small. This leads to efficient learning algorithms for these conditional random fields. We show experimentally that exploiting dependencies using high-order features can lead to substantial performance improvements for some problems and discuss conditions under which high-order features can be effective. 1

3 0.055005498 15 nips-2009-A Rate Distortion Approach for Semi-Supervised Conditional Random Fields

Author: Yang Wang, Gholamreza Haffari, Shaojun Wang, Greg Mori

Abstract: We propose a novel information theoretic approach for semi-supervised learning of conditional random fields that defines a training objective to combine the conditional likelihood on labeled data and the mutual information on unlabeled data. In contrast to previous minimum conditional entropy semi-supervised discriminative learning methods, our approach is grounded on a more solid foundation, the rate distortion theory in information theory. We analyze the tractability of the framework for structured prediction and present a convergent variational training algorithm to defy the combinatorial explosion of terms in the sum over label configurations. Our experimental results show the rate distortion approach outperforms standard l2 regularization, minimum conditional entropy regularization as well as maximum conditional entropy regularization on both multi-class classification and sequence labeling problems. 1

4 0.053935621 189 nips-2009-Periodic Step Size Adaptation for Single Pass On-line Learning

Author: Chun-nan Hsu, Yu-ming Chang, Hanshen Huang, Yuh-jye Lee

Abstract: It has been established that the second-order stochastic gradient descent (2SGD) method can potentially achieve generalization performance as well as empirical optimum in a single pass (i.e., epoch) through the training examples. However, 2SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive. This paper presents Periodic Step-size Adaptation (PSA), which approximates the Jacobian matrix of the mapping function and explores a linear relation between the Jacobian and Hessian to approximate the Hessian periodically and achieve near-optimal results in experiments on a wide variety of models and tasks. 1

5 0.050625589 72 nips-2009-Distribution Matching for Transduction

Author: Novi Quadrianto, James Petterson, Alex J. Smola

Abstract: Many transductive inference algorithms assume that distributions over training and test estimates should be related, e.g. by providing a large margin of separation on both sets. We use this idea to design a transduction algorithm which can be used without modification for classification, regression, and structured estimation. At its heart we exploit the fact that for a good learner the distributions over the outputs on training and test sets should match. This is a classical two-sample problem which can be solved efficiently in its most general form by using distance measures in Hilbert Space. It turns out that a number of existing heuristics can be viewed as special cases of our approach. 1

6 0.0502587 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

7 0.049512021 97 nips-2009-Free energy score space

8 0.044441465 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

9 0.043914605 2 nips-2009-3D Object Recognition with Deep Belief Nets

10 0.040591981 246 nips-2009-Time-Varying Dynamic Bayesian Networks

11 0.039238945 151 nips-2009-Measuring Invariances in Deep Networks

12 0.038735427 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

13 0.034801945 87 nips-2009-Exponential Family Graph Matching and Ranking

14 0.034639485 157 nips-2009-Multi-Label Prediction via Compressed Sensing

15 0.034200497 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

16 0.033798415 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

17 0.033674728 31 nips-2009-An LP View of the M-best MAP problem

18 0.033531021 89 nips-2009-FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs

19 0.033318967 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

20 0.033073816 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.117), (1, -0.013), (2, 0.006), (3, 0.004), (4, 0.002), (5, -0.019), (6, -0.001), (7, 0.043), (8, -0.051), (9, -0.032), (10, 0.026), (11, 0.026), (12, -0.04), (13, -0.012), (14, 0.007), (15, 0.025), (16, 0.063), (17, -0.107), (18, -0.026), (19, -0.057), (20, 0.027), (21, -0.002), (22, 0.019), (23, 0.073), (24, 0.028), (25, 0.087), (26, 0.047), (27, -0.015), (28, 0.012), (29, -0.021), (30, -0.025), (31, 0.028), (32, -0.057), (33, 0.068), (34, -0.008), (35, -0.061), (36, 0.008), (37, 0.018), (38, -0.051), (39, -0.063), (40, -0.021), (41, -0.046), (42, 0.039), (43, 0.012), (44, 0.011), (45, 0.031), (46, 0.045), (47, 0.059), (48, 0.01), (49, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89969116 56 nips-2009-Conditional Neural Fields

Author: Jian Peng, Liefeng Bo, Jinbo Xu

Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.

2 0.68264854 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

Author: Nan Ye, Wee S. Lee, Hai L. Chieu, Dan Wu

Abstract: Dependencies among neighbouring labels in a sequence is an important source of information for sequence labeling problems. However, only dependencies between adjacent labels are commonly exploited in practice because of the high computational complexity of typical inference algorithms when longer distance dependencies are taken into account. In this paper, we show that it is possible to design efficient inference algorithms for a conditional random field using features that depend on long consecutive label sequences (high-order features), as long as the number of distinct label sequences used in the features is small. This leads to efficient learning algorithms for these conditional random fields. We show experimentally that exploiting dependencies using high-order features can lead to substantial performance improvements for some problems and discuss conditions under which high-order features can be effective. 1

3 0.59727305 97 nips-2009-Free energy score space

Author: Alessandro Perina, Marco Cristani, Umberto Castellani, Vittorio Murino, Nebojsa Jojic

Abstract: A score function induced by a generative model of the data can provide a feature vector of a fixed dimension for each data sample. Data samples themselves may be of differing lengths (e.g., speech segments, or other sequence data), but as a score function is based on the properties of the data generation process, it produces a fixed-length vector in a highly informative space, typically referred to as a “score space”. Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than is achievable by either the corresponding generative likelihood-based classifiers, or the discriminative classifiers using standard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting free energy score space (FESS) takes into account latent structure of the data at various levels, and can be trivially shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model, and the same factorization of the posterior. We also show that in several typical vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches to combining discriminating and generative models.

4 0.54722714 189 nips-2009-Periodic Step Size Adaptation for Single Pass On-line Learning

Author: Chun-nan Hsu, Yu-ming Chang, Hanshen Huang, Yuh-jye Lee

Abstract: It has been established that the second-order stochastic gradient descent (2SGD) method can potentially achieve generalization performance as well as empirical optimum in a single pass (i.e., epoch) through the training examples. However, 2SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive. This paper presents Periodic Step-size Adaptation (PSA), which approximates the Jacobian matrix of the mapping function and explores a linear relation between the Jacobian and Hessian to approximate the Hessian periodically and achieve near-optimal results in experiments on a wide variety of models and tasks. 1

5 0.53992498 15 nips-2009-A Rate Distortion Approach for Semi-Supervised Conditional Random Fields

Author: Yang Wang, Gholamreza Haffari, Shaojun Wang, Greg Mori

Abstract: We propose a novel information theoretic approach for semi-supervised learning of conditional random fields that defines a training objective to combine the conditional likelihood on labeled data and the mutual information on unlabeled data. In contrast to previous minimum conditional entropy semi-supervised discriminative learning methods, our approach is grounded on a more solid foundation, the rate distortion theory in information theory. We analyze the tractability of the framework for structured prediction and present a convergent variational training algorithm to defy the combinatorial explosion of terms in the sum over label configurations. Our experimental results show the rate distortion approach outperforms standard l2 regularization, minimum conditional entropy regularization as well as maximum conditional entropy regularization on both multi-class classification and sequence labeling problems. 1

6 0.51819944 127 nips-2009-Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

7 0.51069695 23 nips-2009-Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models

8 0.51052278 89 nips-2009-FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs

9 0.50588447 72 nips-2009-Distribution Matching for Transduction

10 0.50059724 2 nips-2009-3D Object Recognition with Deep Belief Nets

11 0.4855763 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

12 0.47512972 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

13 0.47180909 248 nips-2009-Toward Provably Correct Feature Selection in Arbitrary Domains

14 0.4505564 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models

15 0.44692254 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

16 0.44481558 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models

17 0.44175026 49 nips-2009-Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

18 0.43542004 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

19 0.42802554 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

20 0.41942579 157 nips-2009-Multi-Label Prediction via Compressed Sensing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.011), (24, 0.024), (25, 0.051), (35, 0.035), (36, 0.113), (39, 0.042), (58, 0.053), (61, 0.018), (71, 0.441), (81, 0.016), (86, 0.069)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98618805 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models

Author: Laura Dietz, Valentin Dallmeier, Andreas Zeller, Tobias Scheffer

Abstract: We devise a graphical model that supports the process of debugging software by guiding developers to code that is likely to contain defects. The model is trained using execution traces of passing test runs; it reflects the distribution over transitional patterns of code positions. Given a failing test case, the model determines the least likely transitional pattern in the execution trace. The model is designed such that Bayesian inference has a closed-form solution. We evaluate the Bernoulli graph model on data of the software projects AspectJ and Rhino. 1

2 0.98202747 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei

Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1

3 0.97830319 53 nips-2009-Complexity of Decentralized Control: Special Cases

Author: Martin Allen, Shlomo Zilberstein

Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs) is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1

4 0.95204222 11 nips-2009-A General Projection Property for Distribution Families

Author: Yao-liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári

Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. 1

same-paper 5 0.8951031 56 nips-2009-Conditional Neural Fields

Author: Jian Peng, Liefeng Bo, Jinbo Xu

Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.

6 0.89243716 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

7 0.81494045 205 nips-2009-Rethinking LDA: Why Priors Matter

8 0.80397558 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

9 0.73672849 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

10 0.72958028 204 nips-2009-Replicated Softmax: an Undirected Topic Model

11 0.71412772 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

12 0.70582616 206 nips-2009-Riffled Independence for Ranked Data

13 0.70574492 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

14 0.6931051 96 nips-2009-Filtering Abstract Senses From Image Search Results

15 0.68641442 226 nips-2009-Spatial Normalized Gamma Processes

16 0.68013358 260 nips-2009-Zero-shot Learning with Semantic Output Codes

17 0.67337602 154 nips-2009-Modeling the spacing effect in sequential category learning

18 0.66653013 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

19 0.66421413 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

20 0.6556589 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference