jmlr jmlr2005 jmlr2005-8 knowledge-graph by maker-knowledge-mining

8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata


Source: pdf

Author: Josh Bongard, Hod Lipson

Abstract: This paper describes an active learning approach to the problem of grammatical inference, specifically the inference of deterministic finite automata (DFAs). We refer to the algorithm as the estimation-exploration algorithm (EEA). This approach differs from previous passive and active learning approaches to grammatical inference in that training data is actively proposed by the algorithm, rather than passively receiving training data from some external teacher. Here we show that this algorithm outperforms one version of the most powerful set of algorithms for grammatical inference, evidence driven state merging (EDSM), on randomly-generated DFAs. The performance increase is due to the fact that the EDSM algorithm only works well for DFAs with specific balances (percentage of positive labelings), while the EEA is more consistent over a wider range of balances. Based on this finding we propose a more general method for generating DFAs to be used in the development of future grammatical inference algorithms. Keywords: grammatical inference, evolutionary computation, deterministic finite automata, active learning, system identification

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This approach differs from previous passive and active learning approaches to grammatical inference in that training data is actively proposed by the algorithm, rather than passively receiving training data from some external teacher. [sent-7, score-0.671]

2 Based on this finding we propose a more general method for generating DFAs to be used in the development of future grammatical inference algorithms. [sent-10, score-0.289]

3 Keywords: grammatical inference, evolutionary computation, deterministic finite automata, active learning, system identification [sent-11, score-0.461]

4 In one class of grammatical inference methods, the system is considered to be some kind of language or classifier, and models are represented as deterministic finite automata (DFA). [sent-15, score-0.476]

5 Both the target system and models take strings of symbols as input (sentences), and produce binary classification as output (labellings), indicating whether that sentence belongs to the language or not. [sent-16, score-0.421]

6 The problem of grammatical inference can also be considered a special instance of the problem of system identification (Ljung, 1999), in which some target system is inferred based solely on input/output data. [sent-17, score-0.419]

7 Grammatical inference methods that employ DFAs as models can be divided into two broad classes: passive and active learning methods. [sent-18, score-0.36]

8 In these iterative active approaches the amount of training data available for inference grows over time, unlike passive approaches, in which a fixed set of training data is used for model construction. [sent-23, score-0.433]

9 However, for real-world usage of grammatical inference algorithms, it is unreasonable to assume that the internal structure of the DFA is known: indeed, this is exactly what is being inferred. [sent-34, score-0.289]

10 In this work we present an active learning algorithm that makes few assumptions about the structure of the target DFA, and in fact outperforms one of the best heuristic methods for grammatical inference, which implicitly assumes that the DFAs are balanced (i. [sent-35, score-0.474]

11 However, like the other passive methods, both heuristic and evolutionary approaches so far assume that some external agent generates either a random or balanced training set before inference begins. [sent-41, score-0.443]

12 Despite this active approach to training data generation, these algorithms also require an external agent—an oracle—that can answer equivalence queries: the oracle indicates whether the current model is equivalent to the target DFA and, if it is not, returns new training data. [sent-44, score-0.295]

13 new training data that belongs to the target language but does not belong to the language encoded by the candidate model. [sent-49, score-0.318]

14 Other active learning approaches to language inference also exist, but they all assume completely passive reception of training data: Sempere and Garcia (1993) only require that samples be presented in lexicographic order, and the RPNI (Oncina and García, 1992, and Lang et al. [sent-54, score-0.4]

15 For example, sentences should be sent to a target system that, during labelling, cause transitions to states that have never or rarely been visited during previous labellings. [sent-57, score-0.347]

16 For this and other reasons, it is not surprising that active learning approaches outperform passive methods: active methods have more control over the collection of training data. [sent-60, score-0.415]

17 However, the point of this paper is to demonstrate one reason why active methods outperform passive methods: namely, that they perform well on both balanced and imbalanced DFAs. [sent-61, score-0.314]

18 In such cases, generating random training data is not recommended, because few or no sentences that elucidate the pathways to accepting states will be collected. [sent-66, score-0.291]

19 Secondly, generating balanced training data requires many labellings by the target system until a sufficient number of the minority labellings are collected. [sent-69, score-0.589]

20 100/0.024 ≈ 4167 labellings of randomly generated sentences would have to be performed. [sent-71, score-0.303]
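
As a check on the arithmetic behind sentences 19 and 20 (the head of sentence 20 is lost in extraction, so the target of roughly 100 minority-class samples below is an assumption of this note): collecting c minority labellings at random from a DFA with balance b requires on the order of c/b labellings, so

```latex
\text{expected labellings} \;=\; \frac{c}{b} \;=\; \frac{100}{0.024} \;\approx\; 4167 .
```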

21 For the case of imbalanced DFAs, we attribute this performance improvement to the discovery of sufficient minority class training data to produce accurate models: randomly-generated training data contains too little minority class training data. [sent-74, score-0.363]

22 In the next section we briefly describe grammatical inference, as well as describing our method for the inference of target DFAs using active training data generation. [sent-78, score-0.553]

23 We also document an evolutionary and a heuristics-based method for performing grammatical inference using pre-selected training data. [sent-79, score-0.449]

24 Methods In this section we introduce grammatical inference, and outline three methods for approaching the problem: evidence-driven state merging, evolutionary approaches, and the estimation-exploration algorithm. [sent-83, score-0.353]

25 Then, given some sentence made up of a string of symbols taken from the alphabet Σ, and beginning at the start state s, the first symbol is extracted from the sentence, and based on that symbol the sentence transitions to a new state as indicated by T . [sent-86, score-0.377]

26 If the last state visited is a member of F, then the sentence receives a positive classification (the sentence belongs to the language); otherwise, a negative classification is assigned (the sentence does not belong to the language). [sent-90, score-0.511]
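
To make the labelling procedure of sentences 25 and 26 concrete, here is a minimal sketch of how a DFA given by a transition function T, a start state, and an accepting set F classifies a binary sentence. The dictionary representation and the example automaton are illustrative choices, not structures taken from the paper.

```python
def classify(sentence, T, F, start=0):
    """Label a binary sentence with a DFA.

    T maps (state, symbol) pairs to next states (the transition function);
    F is the set of accepting states. Returns True if the last state visited
    is in F (the sentence belongs to the language), False otherwise.
    """
    state = start
    for symbol in sentence:
        state = T[(state, symbol)]
    return state in F

# Example: a 2-state DFA over {0, 1} accepting sentences with an odd number of 1s.
T = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
F = {1}
print(classify([1, 0, 1, 1], T, F))  # True: three 1s, ends in accepting state 1
```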

27 A grammatical inference algorithm is judged by its ability to produce some T and F (together referred to as a candidate DFA) that match the labels of a pool of sentences that have already been labelled by the target DFA (the training set accuracy). [sent-91, score-0.758]

28 The candidate DFA should then produce a high classification accuracy when supplied with a different set of unlabelled sentences (test set accuracy). [sent-92, score-0.353]

29 Evolutionary approaches to grammatical inference have also been proposed (Brave, 1996; Luke et al. [sent-120, score-0.289]

30 Generally, an evolutionary algorithm comprises a population of candidate models of the target DFA that compete against each other, and the fitness of a particular model is given by the percentage of training data that it can correctly classify. [sent-122, score-0.466]

31 Lucas and Reynolds (2005) proposed an evolutionary approach to grammatical inference in which the number of states in candidate models is fixed at 5n/4, where n is believed to be the number of states in the target DFA. [sent-128, score-0.731]

32 Initialization: create an initial population of candidate models (random, blank, or seeded with prior information), and create an initial population of candidate tests (random, or seeded with prior information). [sent-132, score-0.348]

33 Exploration Phase: evolve candidate tests (input sets); the fitness of a test is the disagreement it causes among good candidate models; carry out the best test on the target system and add the resulting input/output data to the training set. [sent-134, score-0.488]

34 If no model is found, the search space may be inappropriate or the target system may be inconsistent. If no good test is found, then either all good candidate models are perfect, the search method for finding good tests is failing, or the target system may be partially unobservable. [sent-136, score-0.414]
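
Sentences 32 through 34 list the phases of the estimation-exploration algorithm. The sketch below restates that loop at a high level; the injected callables (random_sentence, init_models, evolve_models, evolve_tests, best_model) are hypothetical placeholders for the evolutionary searches described in later sentences, not the authors' implementation.

```python
def estimation_exploration(target_label, random_sentence, init_models,
                           evolve_models, evolve_tests, best_model, iterations):
    """High-level sketch of the EEA loop (sentences 32-34).

    target_label labels a sentence using the hidden target DFA; the other
    callables are placeholders supplied by the caller.
    """
    first = random_sentence()
    training_set = [(first, target_label(first))]  # inference starts from one random labelling
    models = init_models()                         # candidate models: random, blank, or seeded

    for _ in range(iterations):
        # Estimation phase: evolve candidate models to explain the data seen so far.
        models = evolve_models(models, training_set)

        # Exploration phase: evolve candidate tests; a test's fitness is the
        # disagreement it causes among the current good models.
        test = evolve_tests(models)

        # Carry out the best test on the target system and grow the training set.
        training_set.append((test, target_label(test)))

    return best_model(models, training_set)
```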

35 For each state in T , compute the ratio of positive and negative training sentences that terminate at that state: if more positive sentences terminate there, then consider that state an accepting state; otherwise, consider it a rejecting state. [sent-142, score-0.483]
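
Sentence 35 describes how candidate models assign accepting and rejecting states from the training data. A minimal sketch of that majority-vote rule, using the same dictionary representation as above, might look like this:

```python
from collections import defaultdict

def label_states(T, training_data, start=0):
    """Choose the accepting set F by majority vote (sentence 35).

    T maps (state, symbol) pairs to next states; training_data is an iterable
    of (sentence, label) pairs with label 1 for positive and 0 for negative.
    A state is treated as accepting if more positive than negative training
    sentences terminate at it.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for sentence, label in training_data:
        state = start
        for symbol in sentence:
            state = T[(state, symbol)]
        (pos if label else neg)[state] += 1
    return {s for s in set(pos) | set(neg) if pos[s] > neg[s]}
```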

36 Both the heuristic and evolutionary methods described above assume passive inference: a set of labelled data is presented to the algorithm, and the algorithm produces a candidate model of the target DFA. [sent-150, score-0.506]

37 This new sentence is then supplied to the target system, and the estimation phase then begins again with i + 1 training data points. [sent-154, score-0.43]

38 One population is of candidate models of the target system, where a model’s fitness is determined by its ability to correctly explain observed data from the target system. [sent-160, score-0.391]

39 1, and the specific application to grammatical inference is given below. [sent-169, score-0.289]

40 1, we compare our results against those of Lucas and Reynolds (2005), in which the number of states in a candidate model was fixed to be 5n/4: we set the maximum number of states in a candidate model to be 2n in order to make fewer assumptions about the size of the target DFA. [sent-177, score-0.44]

41 Due to the difficulties of devising a similarity metric between the target DFA and a given candidate model, we have chosen to denote similarity between a model and target DFA as the test set accuracy of the candidate model. [sent-184, score-0.485]

42 If it were possible to define a similarity metric between the target and a model DFA, it would be possible to quantitatively determine how well an inference algorithm was doing by periodically measuring the similarity of candidate models against a target DFA. [sent-185, score-0.44]

43 In the exploration phase, for the grammatical inference problem a training item is considered to be an unlabelled binary sentence s . [sent-187, score-0.556]

44 Each sentence is represented as a binary vector of length s_max, where s_max is the maximum sentence length to be found in the training or test set. [sent-188, score-0.427]

45 The algorithm begins inference by generating an unlabelled sentence at random, which is then labelled by the target DFA. [sent-196, score-0.403]

46 In order to generate a pool of competing candidate models, the population of models in the estimation phase is partitioned into two equally-sized, reproductively isolated sub-populations: no candidate model can place offspring into the other sub-population. [sent-199, score-0.434]

47 Starting with an initial population of p candidate models (with p/2 models in each of the two sub-populations), the population is evaluated, fit models are selected, copied and mutated, and less fit models are deleted. [sent-209, score-0.365]

48 The fitness of a candidate model is given by f_T = 1 − (∑_{j=1}^{i} |t_j − m_j|) / i, (1) where t_j is the labelling provided for the jth sentence by the target DFA, and m_j is the labelling provided for the same sentence by the model DFA. [sent-212, score-0.786]

49 Then a candidate model that obtains f_T = 1 perfectly labels all of the i training sentences seen so far, and models with lower values of f_T have lower accuracies. [sent-213, score-0.366]
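
Equation (1) is partially garbled by the extraction; the form used above and in the sketch below (one minus the mean labelling error) is a reconstruction chosen to agree with sentence 49, where f_T = 1 corresponds to a model that labels every training sentence correctly.

```python
def model_fitness(target_labels, model_labels):
    """Sketch of Eq. (1) under the reconstruction described above.

    target_labels and model_labels are equal-length sequences of 0/1 labels
    (t_j and m_j) for the i training sentences seen so far. Returns 1.0 for a
    model that labels every sentence correctly, and lower values otherwise.
    """
    i = len(target_labels)
    error = sum(abs(t - m) for t, m in zip(target_labels, model_labels)) / i
    return 1.0 - error

print(model_fitness([1, 0, 1, 1], [1, 0, 1, 1]))  # 1.0 (perfect model)
print(model_fitness([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5 (two of four labels wrong)
```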

50 The exploration phase maintains a population of the same size as that of the estimation phase (p), and evolves candidate sentences for the same number of generations (g). [sent-229, score-0.554]

51 At the end of each generation, 3p/4 pairs of sentences are selected, copied and mutated as described in the previous section: the sentence with higher fitness is copied over the sentence with lower fitness, and the copied sentence is then mutated. [sent-230, score-0.713]
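
The pairwise replacement scheme of sentence 51 can be sketched as follows; fitness and mutate are hypothetical callables standing in for Equation (2) and the mutation operator, and the random pairing is an assumption of this sketch.

```python
import random

def generation_step(population, fitness, mutate):
    """One generation of the exploration phase's replacement scheme (sentence 51).

    3p/4 random pairs of sentences are compared; in each pair, a mutated copy
    of the fitter sentence overwrites the weaker one.
    """
    p = len(population)
    for _ in range(3 * p // 4):
        a, b = random.sample(range(p), 2)
        winner, loser = (a, b) if fitness(population[a]) >= fitness(population[b]) else (b, a)
        population[loser] = mutate(population[winner])  # mutate returns a mutated copy
    return population
```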

52 The fitness for a given sentence is set to the amount of disagreement that that sentence causes when labelled by a pool of k candidate models: f_s = 1 − 2 |(∑_{j=1}^{k} c_j)/k − 1/2|, (2) where c_j is the classification of the candidate sentence by model j. [sent-231, score-0.778]

53 Sentences that do not differentiate between the candidate models—all models produce the same classification—obtain f_s = 0 (poorest quality); sentences that produce the maximum classification variance obtain f_s = 1 (best quality). [sent-232, score-0.353]
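
Equation (2) is also garbled above. The sketch below assumes the disagreement measure is one minus twice the distance of the mean model classification from 1/2; this matches the endpoints given in sentence 53 (f_s = 0 when all k models agree, f_s = 1 at maximum disagreement), but other measures with the same endpoints, such as a rescaled classification variance, would behave similarly.

```python
def sentence_fitness(classifications):
    """Sketch of Eq. (2) under the assumption stated above.

    classifications holds the 0/1 labels c_1..c_k that the k candidate models
    assign to one candidate sentence. Returns 0.0 when all models agree and
    1.0 when they are split half and half.
    """
    k = len(classifications)
    mean = sum(classifications) / k
    return 1.0 - 2.0 * abs(mean - 0.5)

print(sentence_fitness([0, 0, 0, 0]))  # 0.0: no disagreement among models
print(sentence_fitness([0, 1, 0, 1]))  # 1.0: maximum disagreement
```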

54 In the experiments reported here, however, the number of iterations was set to use exactly the same number of sentence labellings as the benchmark algorithms we compare it to require. [sent-247, score-0.303]

55 This results in the labelling of t sentences by the target DFA, t passes through the estimation phase, and t − 1 passes through the exploration phase (the first sentence proposed for labelling is a random sentence). [sent-249, score-0.772]

56 Although the passive variant of the EEA proposes sentences to the target DFA for labeling, it is considered passive because it does not actively construct training data; rather, it outputs random training data. [sent-260, score-0.669]

57 This stresses the importance of actively seeking informative sentences for target labeling. [sent-261, score-0.305]

58 (1998), the target DFAs were generated by creating random digraphs with 5n/4 nodes, where n is the desired number of active states: an active state is one that is visited by at least one test or training sentence during labelling. [sent-263, score-0.602]
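
Sentence 58 refers to the Abbadingo-style construction of Lang et al. (1998). A rough sketch of that generation procedure is given below; the uniform edge and label choices, the rounding of 5n/4, and the absence of reachability or minimization checks are simplifications of this note rather than details quoted from the paper.

```python
import random

def random_target_dfa(n, alphabet=(0, 1), seed=None):
    """Rough sketch of random target DFA generation in the spirit of sentence 58:
    a random digraph on about 5n/4 nodes with one outgoing edge per symbol,
    and each node labelled accepting or rejecting at random.
    """
    rng = random.Random(seed)
    num_nodes = round(5 * n / 4)
    T = {(s, a): rng.randrange(num_nodes) for s in range(num_nodes) for a in alphabet}
    F = {s for s in range(num_nodes) if rng.random() < 0.5}
    return T, F
```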

59 In order to fairly compare our active learning method against passive methods, we have elected to equalize the number of labellings performed by the target DFA (i. [sent-274, score-0.499]

60 the number of training sentences that are labelled), and the number of labellings performed by candidate models during inference. [sent-276, score-0.499]

61 In Lucas and Reynolds (2005), the total number of candidate model labellings m is equal to the number of training sentences times the number of mutations considered during hill climbing: m = d·S_total × 10^6 = t × 10^6, where d is the training set density and t is the number of training sentences. [sent-278, score-0.594]

62 k indicates the number of candidate models output by the estimation phase, so pgk(t − 1) indicates how many labellings are performed during the t − 1 passes through the exploration phase. [sent-280, score-0.359]

63 The second term indicates how many labellings are performed during the t passes through the estimation phase: during the first pass there is only one labelling per candidate model; during the second pass there are two labellings per model; and so on. [sent-281, score-0.623]
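
Sentences 61 through 63 explain how the number of candidate-model labellings is equalized across methods. The arithmetic below is our reading of that description (in particular, the closed form for the estimation-phase term follows from "one labelling per model on the first pass, two on the second, and so on"); the exact bookkeeping in the paper may differ.

```python
def eea_model_labellings(p, g, k, t):
    """Approximate candidate-model labelling count for the EEA (sentences 62-63).

    p: population size, g: generations per phase, k: models output by the
    estimation phase, t: training sentences labelled by the target DFA.
    """
    exploration = p * g * k * (t - 1)       # pgk(t - 1), as stated in sentence 62
    estimation = p * g * t * (t + 1) // 2   # pass j costs j labellings per model evaluation; sum over j = 1..t
    return exploration + estimation

def lucas_model_labellings(t):
    """m = t * 10^6 candidate-model labellings, per sentence 61."""
    return t * 10**6
```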

64 The passive variant disables the exploration phase of the algorithm, so that at each cycle through the algorithm, a random sentence is output to the target DFA for labelling. [sent-288, score-0.537]

65 At the end of each pass through the estimation phase for each variant, the test set accuracy of the best candidate model output by the first sub-population is computed. [sent-293, score-0.296]

66 As can be seen in Figure 1a, the active variant outputs a model consistent with all training and test data after the 93rd pass through the estimation phase. [sent-295, score-0.287]

67 Second, it is noted that the size of the best candidate model increases and decreases over the inference process in the active variant of the EEA. [sent-299, score-0.357]

68 Each of four algorithms—EDSM, Lucas' method, and the active and passive variants of the EEA—was run against 3720 target DFAs: 1200 with n = 4, 1200 with n = 8, 1200 with n = 16, and 120 with n = 32 states. [sent-317, score-0.351]

69 The total number of training sentences available for inference for each DFA size and training set density is shown in Table 3. [sent-320, score-0.344]

70 Figure 3 reports the percentage of runs of both the passive and active EEA variants that produced a model that correctly classifies all training and test data. [sent-450, score-0.331]

71 0.6; the minority of DFAs produce less than 40% labellings of their minority classification. [sent-624, score-0.352]

72 First, the evolutionary approach of Lucas and Reynolds (2005) tends to outperform the EDSM variant for smaller target DFAs (n < 32), but the EDSM far outperforms Lucas' algorithm for larger target DFAs (n = 32). [sent-662, score-0.347]

73 Second, the active EEA variant outperforms all three other algorithms for the smallest size of target DFA (n = 4); is competitive with Lucas’ algorithm for DFAs with n = 8 and n = 16; and is competitive with EDSM on the largest target DFAs (n = 32). [sent-663, score-0.351]

74 Because the passive variant performs the same number of or fewer model evaluations than Lucas' algorithm, and it randomly selects the same number of training sentences, we can conclude that our particular method of evolutionary search is inferior to that proposed by Lucas and Reynolds (2005). [sent-665, score-0.341]

75 For DFAs with n = 32 states, it can be seen that the active EEA generates training strings that achieve a more balanced labelling (lower right-hand panel) than the passive EEA variant, which outputs random strings for labelling (middle right-hand panel). [sent-904, score-0.732]

76 At the outset of inference using the active EEA, training strings are generated at random, because sufficiently accurate models do not yet exist. [sent-906, score-0.338]

77 As inference proceeds, a few training strings are output to the target system that obtain a minority labelling. [sent-907, score-0.391]

78 This indicates that training strings with high fitness at least elicit a minority labelling from some of the candidate models, and since the models are now somewhat accurate, this increases the probability of obtaining a new minority labelling from the target system. [sent-910, score-0.786]

79 As inference proceeds, minority labellings are extracted with increasing probability, allowing for the better inference of imbalanced DFAs. [sent-911, score-0.409]

80 This advantage of active training data generation, compared to passive training data collection, is also supported by the results reported in Figure 3. [sent-912, score-0.343]

81 Clearly, the active EEA discovers models consistent with all training and test data more often than the passive EEA. [sent-913, score-0.35]

82 This requires rethinking how grammatical inference algorithms are compared: rather than simply providing them with training data already collected from a DFA, the algorithms should be free to request training data labelling, as is done in the active learning community (Baram et al. [sent-916, score-0.499]

83 1 Intelligent Testing The results reported here support the claim that useful experiments (in the domain of grammatical inference, binary sentences) are those that elicit informative responses from the target system. [sent-919, score-0.356]

84 Here we have stressed that one informative type of test is training data belonging to the minority class of an imbalanced DFA: automatically and actively generating such informative tests helps the algorithm to outperform other methods that rely on passively generated random training data. [sent-921, score-0.299]

85 There may be other kinds of informative sentences in grammatical inference that are unknowingly being favored by the active EEA variant. [sent-926, score-0.606]

86 It is clear that states closer to the start state in the transition function T will be visited more often than distant states, and longer sentences have a higher probability of reaching these distant states than shorter sentences do. [sent-928, score-0.457]

87 So, we predict that the active EEA variant will propose, on average, longer training sentences than a passive algorithm will. [sent-930, score-0.485]

88 Whether longer sentences truly are more informative than shorter ones, and whether longer sentences are actually favored by the active EEA variant has not yet been verified. [sent-931, score-0.488]

89 In grammatical inference, unbalanced DFAs are less observable than balanced DFAs: there are either fewer states that produce the minority labelling than states that produce the majority labelling, or minority labelling states are more distant from the start state than majority labelling states. [sent-933, score-0.992]

90 It follows then that passive grammatical inference approaches are inappropriate in these cases, for one of two reasons: either a balanced training set is assumed, in which case the minority class is grossly over-represented in the training data; or random training data is generated. [sent-934, score-0.692]

91 An active approach to grammatical inference, like the estimation-exploration algorithm, actively generates a training set that falls between these two extremes, and accelerates inference. [sent-937, score-0.401]

92 Also, acquiring a balanced training set will require a large number of target labellings in order to obtain enough of the minority labellings. [sent-939, score-0.426]

93 Furthermore, our method may be useful for indicating what kinds of training data are most useful for the inference of particular kinds of languages, by simply observing what kinds of sentences are generated by our method. [sent-941, score-0.327]

94 Formulating time complexities for both methods as a function of both DFA size and balance as well as allowed number of target labellings requires further investigation. [sent-964, score-0.289]

95 It was shown that the EEA performs better on DFAs with differing balances by actively extracting minority labellings from the target DFA. [sent-975, score-0.513]

96 In order to better gauge the inference ability of grammatical inference methods, we introduce a more general method for generating DFAs with specific sizes and balances. [sent-976, score-0.364]

97 Our algorithm also allows for continual expansion and compression of candidate models over time, in response to new training data: expansion allows for the accommodation of new training data, and compression usually leads to greater test set accuracy. [sent-978, score-0.295]

98 In many approaches to grammatical inference, either a balanced training set is assumed, or random training data is generated. [sent-984, score-0.348]

99 One possible approach would be to evolve test sequences that cause the candidate models to disagree most in the class probabilities they predict the target system will output for that sequence. [sent-989, score-0.305]

100 Learning deterministic finite automata with a smart state labelling evolutionary algorithm. [sent-1098, score-0.364]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dfas', 0.56), ('dfa', 0.352), ('eea', 0.321), ('edsm', 0.301), ('grammatical', 0.214), ('sentence', 0.155), ('sentences', 0.155), ('labellings', 0.148), ('passive', 0.133), ('lucas', 0.132), ('active', 0.118), ('candidate', 0.116), ('labelling', 0.115), ('evolutionary', 0.114), ('automata', 0.11), ('target', 0.1), ('minority', 0.09), ('tness', 0.085), ('differing', 0.076), ('balances', 0.076), ('inference', 0.075), ('reynolds', 0.073), ('ipson', 0.071), ('ongard', 0.071), ('phase', 0.071), ('oevolutionary', 0.066), ('strings', 0.065), ('lang', 0.057), ('inite', 0.056), ('eterministic', 0.056), ('accepting', 0.051), ('training', 0.046), ('exploration', 0.045), ('balanced', 0.042), ('balance', 0.041), ('bongard', 0.041), ('mutation', 0.041), ('population', 0.041), ('pass', 0.04), ('states', 0.039), ('disagreement', 0.038), ('genetic', 0.034), ('models', 0.034), ('variant', 0.033), ('copied', 0.031), ('tomita', 0.031), ('labelled', 0.028), ('language', 0.028), ('languages', 0.028), ('informative', 0.027), ('densities', 0.026), ('rejecting', 0.026), ('smax', 0.026), ('offspring', 0.025), ('state', 0.025), ('merging', 0.025), ('accuracies', 0.025), ('earning', 0.024), ('produce', 0.024), ('begins', 0.024), ('actively', 0.023), ('transition', 0.023), ('queries', 0.022), ('density', 0.022), ('baram', 0.021), ('evolve', 0.021), ('imbalanced', 0.021), ('unbalanced', 0.021), ('unlabelled', 0.021), ('visited', 0.021), ('cicchello', 0.02), ('dupont', 0.02), ('generations', 0.02), ('honavar', 0.02), ('lipson', 0.02), ('luke', 0.02), ('parekh', 0.02), ('evolves', 0.019), ('test', 0.019), ('accuracy', 0.019), ('supplied', 0.018), ('kinds', 0.017), ('agent', 0.017), ('generation', 0.017), ('compression', 0.017), ('dangerous', 0.017), ('cornell', 0.017), ('inferring', 0.017), ('transitions', 0.017), ('regular', 0.017), ('estimation', 0.016), ('merge', 0.016), ('external', 0.016), ('perfect', 0.016), ('system', 0.015), ('model', 0.015), ('abbadingo', 0.015), ('apta', 0.015), ('colloquium', 0.015), ('elicit', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata

Author: Josh Bongard, Hod Lipson

Abstract: This paper describes an active learning approach to the problem of grammatical inference, specifically the inference of deterministic finite automata (DFAs). We refer to the algorithm as the estimation-exploration algorithm (EEA). This approach differs from previous passive and active learning approaches to grammatical inference in that training data is actively proposed by the algorithm, rather than passively receiving training data from some external teacher. Here we show that this algorithm outperforms one version of the most powerful set of algorithms for grammatical inference, evidence driven state merging (EDSM), on randomly-generated DFAs. The performance increase is due to the fact that the EDSM algorithm only works well for DFAs with specific balances (percentage of positive labelings), while the EEA is more consistent over a wider range of balances. Based on this finding we propose a more general method for generating DFAs to be used in the development of future grammatical inference algorithms. Keywords: grammatical inference, evolutionary computation, deterministic finite automata, active learning, system identification

2 0.06418819 9 jmlr-2005-Active Learning to Recognize Multiple Types of Plankton

Author: Tong Luo, Kurt Kramer, Dmitry B. Goldgof, Lawrence O. Hall, Scott Samson, Andrew Remsen, Thomas Hopkins

Abstract: This paper presents an active learning method which reduces the labeling effort of domain experts in multi-class classification problems. Active learning is applied in conjunction with support vector machines to recognize underwater zooplankton from higher-resolution, new generation SIPPER II images. Most previous work on active learning with support vector machines only deals with two class problems. In this paper, we propose an active learning approach “breaking ties” for multiclass support vector machines using the one-vs-one approach with a probability approximation. Experimental results indicate that our approach often requires significantly less labeled images to reach a given accuracy than the approach of labeling the least certain test example and random sampling. It can also be applied in batch mode resulting in an accuracy comparable to labeling one image at a time and retraining. Keywords: active learning, support vector machine, plankton recognition, probabilistic output, multi-class support vector machine

3 0.040202573 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

Author: Antoine Bordes, Seyda Ertekin, Jason Weston, Léon Bottou

Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

4 0.029720241 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables

Author: Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, Yasemin Altun

Abstract: Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.

5 0.026135724 6 jmlr-2005-A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs

Author: S. Sathiya Keerthi, Dennis DeCoste

Abstract: This paper develops a fast method for solving linear SVMs with L2 loss function that is suited for large scale data mining tasks such as text classification. This is done by modifying the finite Newton method of Mangasarian in several ways. Experiments indicate that the method is much faster than decomposition methods such as SVMlight , SMO and BSVM (e.g., 4-100 fold), especially when the number of examples is large. The paper also suggests ways of extending the method to other loss functions such as the modified Huber’s loss function and the L1 loss function, and also for solving ordinal regression. Keywords: linear SVMs, classification, conjugate gradient

6 0.024153909 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior

7 0.022755804 28 jmlr-2005-Efficient Computation of Gapped Substring Kernels on Large Alphabets

8 0.021843202 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach

9 0.021739211 4 jmlr-2005-A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

10 0.020308755 61 jmlr-2005-Prioritization Methods for Accelerating MDP Solvers

11 0.018540263 62 jmlr-2005-Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models

12 0.018107075 12 jmlr-2005-An MDP-Based Recommender System

13 0.016732374 36 jmlr-2005-Gaussian Processes for Ordinal Regression

14 0.016638596 59 jmlr-2005-New Horn Revision Algorithms

15 0.015618772 32 jmlr-2005-Expectation Consistent Approximate Inference

16 0.013870808 7 jmlr-2005-A Unifying View of Sparse Approximate Gaussian Process Regression

17 0.013514119 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

18 0.013485685 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach

19 0.01256908 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks

20 0.012489514 71 jmlr-2005-Variational Message Passing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.088), (1, 0.033), (2, 0.072), (3, -0.09), (4, 0.053), (5, -0.031), (6, 0.068), (7, -0.001), (8, -0.052), (9, -0.059), (10, 0.127), (11, -0.112), (12, -0.147), (13, 0.208), (14, 0.015), (15, -0.073), (16, 0.095), (17, -0.229), (18, 0.012), (19, 0.029), (20, 0.032), (21, -0.135), (22, -0.309), (23, -0.164), (24, -0.046), (25, 0.055), (26, 0.073), (27, -0.122), (28, 0.087), (29, 0.238), (30, -0.146), (31, 0.189), (32, 0.046), (33, -0.304), (34, -0.126), (35, 0.436), (36, 0.173), (37, 0.155), (38, -0.218), (39, 0.032), (40, 0.089), (41, -0.101), (42, -0.186), (43, 0.069), (44, 0.098), (45, -0.025), (46, 0.098), (47, -0.035), (48, -0.085), (49, -0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97226405 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata

Author: Josh Bongard, Hod Lipson

Abstract: This paper describes an active learning approach to the problem of grammatical inference, specifically the inference of deterministic finite automata (DFAs). We refer to the algorithm as the estimation-exploration algorithm (EEA). This approach differs from previous passive and active learning approaches to grammatical inference in that training data is actively proposed by the algorithm, rather than passively receiving training data from some external teacher. Here we show that this algorithm outperforms one version of the most powerful set of algorithms for grammatical inference, evidence driven state merging (EDSM), on randomly-generated DFAs. The performance increase is due to the fact that the EDSM algorithm only works well for DFAs with specific balances (percentage of positive labelings), while the EEA is more consistent over a wider range of balances. Based on this finding we propose a more general method for generating DFAs to be used in the development of future grammatical inference algorithms. Keywords: grammatical inference, evolutionary computation, deterministic finite automata, active learning, system identification

2 0.16878249 9 jmlr-2005-Active Learning to Recognize Multiple Types of Plankton

Author: Tong Luo, Kurt Kramer, Dmitry B. Goldgof, Lawrence O. Hall, Scott Samson, Andrew Remsen, Thomas Hopkins

Abstract: This paper presents an active learning method which reduces the labeling effort of domain experts in multi-class classification problems. Active learning is applied in conjunction with support vector machines to recognize underwater zooplankton from higher-resolution, new generation SIPPER II images. Most previous work on active learning with support vector machines only deals with two class problems. In this paper, we propose an active learning approach “breaking ties” for multiclass support vector machines using the one-vs-one approach with a probability approximation. Experimental results indicate that our approach often requires significantly less labeled images to reach a given accuracy than the approach of labeling the least certain test example and random sampling. It can also be applied in batch mode resulting in an accuracy comparable to labeling one image at a time and retraining. Keywords: active learning, support vector machine, plankton recognition, probabilistic output, multi-class support vector machine

3 0.11631963 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

Author: Antoine Bordes, Seyda Ertekin, Jason Weston, Léon Bottou

Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

4 0.10865785 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior

Author: Hal Daumé III, Daniel Marcu

Abstract: We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics. Keywords: supervised clustering, record linkage, citation matching, coreference, Dirichlet process, non-parametric Bayesian

5 0.10043287 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables

Author: Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, Yasemin Altun

Abstract: Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.

6 0.092782669 12 jmlr-2005-An MDP-Based Recommender System

7 0.080931976 32 jmlr-2005-Expectation Consistent Approximate Inference

8 0.078191124 61 jmlr-2005-Prioritization Methods for Accelerating MDP Solvers

9 0.072217353 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems

10 0.07183595 7 jmlr-2005-A Unifying View of Sparse Approximate Gaussian Process Regression

11 0.062226344 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach

12 0.057373878 3 jmlr-2005-A Classification Framework for Anomaly Detection

13 0.053336032 59 jmlr-2005-New Horn Revision Algorithms

14 0.052895039 50 jmlr-2005-Learning with Decision Lists of Data-Dependent Features

15 0.051137712 6 jmlr-2005-A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs

16 0.050876331 54 jmlr-2005-Managing Diversity in Regression Ensembles

17 0.049426056 36 jmlr-2005-Gaussian Processes for Ordinal Regression

18 0.04899238 18 jmlr-2005-Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems

19 0.044632457 57 jmlr-2005-Multiclass Boosting for Weak Classifiers

20 0.042845596 28 jmlr-2005-Efficient Computation of Gapped Substring Kernels on Large Alphabets


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.021), (17, 0.017), (19, 0.015), (36, 0.021), (37, 0.049), (43, 0.024), (47, 0.018), (52, 0.087), (59, 0.024), (70, 0.044), (87, 0.5), (88, 0.053), (93, 0.01), (94, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.72535968 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata

Author: Josh Bongard, Hod Lipson

Abstract: This paper describes an active learning approach to the problem of grammatical inference, specifically the inference of deterministic finite automata (DFAs). We refer to the algorithm as the estimation-exploration algorithm (EEA). This approach differs from previous passive and active learning approaches to grammatical inference in that training data is actively proposed by the algorithm, rather than passively receiving training data from some external teacher. Here we show that this algorithm outperforms one version of the most powerful set of algorithms for grammatical inference, evidence driven state merging (EDSM), on randomly-generated DFAs. The performance increase is due to the fact that the EDSM algorithm only works well for DFAs with specific balances (percentage of positive labelings), while the EEA is more consistent over a wider range of balances. Based on this finding we propose a more general method for generating DFAs to be used in the development of future grammatical inference algorithms. Keywords: grammatical inference, evolutionary computation, deterministic finite automata, active learning, system identification

2 0.22979423 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

Author: Antoine Bordes, Seyda Ertekin, Jason Weston, Léon Bottou

Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

3 0.22592279 36 jmlr-2005-Gaussian Processes for Ordinal Regression

Author: Wei Chu, Zoubin Ghahramani

Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach. Keywords: Gaussian processes, ordinal regression, approximate Bayesian inference, collaborative filtering, gene expression analysis, feature selection

4 0.22550049 9 jmlr-2005-Active Learning to Recognize Multiple Types of Plankton

Author: Tong Luo, Kurt Kramer, Dmitry B. Goldgof, Lawrence O. Hall, Scott Samson, Andrew Remsen, Thomas Hopkins

Abstract: This paper presents an active learning method which reduces the labeling effort of domain experts in multi-class classification problems. Active learning is applied in conjunction with support vector machines to recognize underwater zooplankton from higher-resolution, new generation SIPPER II images. Most previous work on active learning with support vector machines only deals with two class problems. In this paper, we propose an active learning approach “breaking ties” for multiclass support vector machines using the one-vs-one approach with a probability approximation. Experimental results indicate that our approach often requires significantly less labeled images to reach a given accuracy than the approach of labeling the least certain test example and random sampling. It can also be applied in batch mode resulting in an accuracy comparable to labeling one image at a time and retraining. Keywords: active learning, support vector machine, plankton recognition, probabilistic output, multi-class support vector machine

5 0.22334151 46 jmlr-2005-Learning a Mahalanobis Metric from Equivalence Constraints

Author: Aharon Bar-Hillel, Tomer Hertz, Noam Shental, Daphna Weinshall

Abstract: Many learning algorithms use a metric defined over the input space as a principal tool, and their performance critically depends on the quality of this metric. We address the problem of learning metrics using side-information in the form of equivalence constraints. Unlike labels, we demonstrate that this type of side-information can sometimes be automatically obtained without the need of human intervention. We show how such side-information can be used to modify the representation of the data, leading to improved clustering and classification. Specifically, we present the Relevant Component Analysis (RCA) algorithm, which is a simple and efficient algorithm for learning a Mahalanobis metric. We show that RCA is the solution of an interesting optimization problem, founded on an information theoretic basis. If dimensionality reduction is allowed within RCA, we show that it is optimally accomplished by a version of Fisher’s linear discriminant that uses constraints. Moreover, under certain Gaussian assumptions, RCA can be viewed as a Maximum Likelihood estimation of the within class covariance matrix. We conclude with extensive empirical evaluations of RCA, showing its advantage over alternative methods. Keywords: clustering, metric learning, dimensionality reduction, equivalence constraints, side information.

6 0.21869451 71 jmlr-2005-Variational Message Passing

7 0.21855266 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

8 0.21652374 24 jmlr-2005-Core Vector Machines: Fast SVM Training on Very Large Data Sets

9 0.21572845 49 jmlr-2005-Learning the Kernel with Hyperkernels     (Kernel Machines Section)

10 0.21529034 12 jmlr-2005-An MDP-Based Recommender System

11 0.21231905 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach

12 0.21049406 64 jmlr-2005-Semigroup Kernels on Measures

13 0.21017715 3 jmlr-2005-A Classification Framework for Anomaly Detection

14 0.21004064 39 jmlr-2005-Information Bottleneck for Gaussian Variables

15 0.20942618 45 jmlr-2005-Learning Multiple Tasks with Kernel Methods

16 0.20914008 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching

17 0.20871569 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

18 0.20844628 44 jmlr-2005-Learning Module Networks

19 0.20790514 69 jmlr-2005-Tutorial on Practical Prediction Theory for Classification

20 0.20662344 54 jmlr-2005-Managing Diversity in Regression Ensembles