jmlr jmlr2008 jmlr2008-47 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Leonor Becerra-Bonache, Colin de la Higuera, Jean-Christophe Janodet, Frédéric Tantini
Abstract: When facing the question of learning languages in realistic settings, one has to tackle several problems that do not admit simple solutions. On the one hand, languages are usually defined by complex grammatical mechanisms for which the learning results are predominantly negative, as the few algorithms are not really able to cope with noise. On the other hand, the learning settings themselves rely either on information that is too simple (text) or on an unattainable one (query systems that do not exist in practice, nor can be simulated). We consider simple but sound classes of languages defined via the widely used edit distance: the balls of strings. We propose to learn them with the help of a new sort of queries, called correction queries: when a string is submitted to the Oracle, either she accepts it if it belongs to the target language, or she proposes a correction, that is, a string of the language close to the query with respect to the edit distance. We show that even if the good balls are not learnable in Angluin's MAT model, they can be learned from a polynomial number of correction queries. Moreover, experimental evidence from simulating a human Expert shows that this algorithm is resistant to approximate answers. Keywords: grammatical inference, oracle learning, correction queries, edit distance, balls of strings
Reference: text
sentIndex sentText sentNum sentScore
1 We consider simple but sound classes of languages defined via the widely used edit distance: the balls of strings. [sent-10, score-0.629]
2 We show that even if the good balls are not learnable in Angluin's MAT model, they can be learned from a polynomial number of correction queries. [sent-12, score-0.676]
3 Keywords: grammatical inference, oracle learning, correction queries, edit distance, balls of strings 1. [sent-14, score-1.362]
4 The second goal of this paper is to provide evidence that correction queries are suitable for this kind of real-life application. [sent-45, score-0.637]
5 Balls of strings are formed by choosing one specific string, called the centre, and taking all its neighbours up to a given edit distance, called the radius. [sent-67, score-0.45]
6 From a practical standpoint, balls of strings appear in a variety of settings: in approximate string matching tasks, the goal is to find all close matches to some target string (Navarro, 2001; Chávez et al. [sent-68, score-1.181]
7 Hence, in this paper, we study the problem of identifying balls of strings from correction queries. [sent-70, score-0.854]
8 ...(2.1), which queries should be used to learn languages (2.2)... [sent-72, score-0.491]
9 On the one hand, we prove that the balls are not learnable with Angluin's membership and equivalence queries, and on the other hand, that the deterministic finite automata are not learnable with correction queries. [sent-79, score-0.844]
10 It consists of a polynomial time algorithm that infers any ball from correction queries. [sent-81, score-0.472]
11 An important question is raised concerning the fact that only good balls can be learned with a polynomial number of correction queries (4. [sent-85, score-0.963]
12 Noise over strings can, in the simplest case, just affect the labels: a string in the language may be classified as not belonging, whereas a string outside the language may be labelled as belonging. [sent-105, score-0.949]
13 Indeed, these languages are sparse in the set of all strings, so trying to learn them from noisy data is like looking for a needle in a haystack: no string seems to belong to the target anymore. [sent-114, score-0.491]
14 Therefore, if we are to learn languages in a noisy setting where the noise is modelled through the edit distance, we think that it is necessary to consider other classes of languages that could be much better adapted to this type of noise. [sent-119, score-0.486]
15 The balls of strings are an example of such languages. [sent-120, score-0.551]
16 Several types of queries have been studied, and some classes were proved to be learnable from specific combinations of queries (see Angluin, 2004, for a survey). [sent-127, score-0.715]
17 The best known and most important of such positive results is that deterministic finite state automata are polynomially learnable from a combination of membership queries and strong equivalence queries (Angluin, 1987a). [sent-128, score-0.862]
18 We argue that equivalence queries are not realistic for the intended applications, and we choose to use the recently introduced correction queries instead (Becerra-Bonache and Yokomori, 2004). [sent-131, score-0.998]
19 When making a correction query, we submit a string to the Oracle, who answers YES if the string is in the target language, and if it is not, then the Oracle returns a string from the language that is closest to the query. [sent-132, score-1.332]
20 Hence, it is easy to learn the balls of IR2 with very few correction queries. [sent-149, score-0.63]
21 However, building the centre of a ball from strings on its periphery is difficult for at least two reasons. [sent-151, score-0.461]
22 Therefore, we will have to study the balls in more detail and make the best possible use of the correction queries, so as not to build the centres from scratch. [sent-158, score-0.661]
23 1 Edit Distance The edit distance d(w, w′) between two strings w and w′ is the minimum number of edit operations needed to transform w into w′ (Levenshtein, 1965). [sent-211, score-0.693]
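For concreteness, here is a minimal sketch (our own code, not from the paper) of the standard dynamic-programming computation of this distance; the function name edit_distance is ours, and later sketches in this section reuse it.

```python
def edit_distance(w1: str, w2: str) -> int:
    """Levenshtein distance: minimum number of single-letter
    substitutions, insertions and deletions turning w1 into w2."""
    m, n = len(w1), len(w2)
    # dp[j] holds d(w1[:i], w2[:j]) for the current row i.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cost = 0 if w1[i - 1] == w2[j - 1] else 1
            prev_diag, dp[j] = dp[j], min(dp[j] + 1,         # deletion
                                          dp[j - 1] + 1,     # insertion
                                          prev_diag + cost)  # substitution
    return dp[n]

assert edit_distance("0000", "11") == 4  # used in the running example below
```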
24 Definition 3 (Ball of Strings) The ball of centre o ∈ Σ∗ and radius r ∈ IN, denoted Br(o), is the set of all the strings whose distance to o is at most r: Br(o) = {w ∈ Σ∗ : d(o, w) ≤ r}. [sent-232, score-0.566]
25 Experimentally (see Table 1), we clearly notice that, for centre strings of fixed length, the average number of strings in the ball more than doubles when the radius is incremented by 1. [sent-235, score-0.563]
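This growth can be checked mechanically by brute force: the sketch below (our own code, exponential in r, so only practical for tiny alphabets and radii) builds a ball as the closure of the centre under single edit operations.

```python
from itertools import chain

def neighbours(w: str, alphabet: str) -> set:
    """All strings at edit distance at most 1 from w."""
    subs = (w[:i] + a + w[i+1:] for i in range(len(w)) for a in alphabet)
    ins  = (w[:i] + a + w[i:]   for i in range(len(w) + 1) for a in alphabet)
    dels = (w[:i] + w[i+1:]     for i in range(len(w)))
    return set(chain(subs, ins, dels))

def ball(centre: str, r: int, alphabet: str) -> set:
    """B_r(centre): closure of {centre} under r rounds of single edits."""
    current = {centre}
    for _ in range(r):
        current |= set().union(*(neighbours(w, alphabet) for w in current))
    return current

for r in range(4):
    print(r, len(ball("0110", r, "01")))  # sizes more than double with r
```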
26 An alternative representation could be based on automata, since the balls of strings are finite and thus regular languages. [sent-302, score-0.58]
27 The latter knows the target language and answers the queries properly (she does not lie). [sent-344, score-0.475]
28 Two kinds of queries are used: Definition 6 (Membership and Equivalence Queries) Let Λ be a class of languages on Σ∗ and L ∈ Λ a target language known by the Oracle, which the Learner aims to guess. [sent-349, score-0.555]
29 In the case of membership queries, the Learner submits a string w ∈ Σ∗ to the Oracle; her answer, denoted by MQ(w), is either YES if w ∈ L, or NO if w ∉ L. [sent-350, score-0.404]
30 In the case of equivalence queries, the Learner submits (the representation of) a language K ∈ Λ to the Oracle; her answer, denoted by EQ(K), is either YES if K = L, or a string belonging to the symmetric difference (K \ L) ∪ (L \ K) if K ≠ L. [sent-351, score-0.417]
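A minimal simulation of such an Oracle for a finite target language might look as follows (our own sketch; the class name MATOracle is ours, and real equivalence queries are precisely what the text argues cannot be implemented in practice):

```python
class MATOracle:
    """Simulated Oracle answering membership (MQ) and equivalence (EQ)
    queries for a finite target language L, as in Definition 6."""
    def __init__(self, target: set):
        self.L = target

    def mq(self, w: str) -> bool:
        return w in self.L               # YES iff w belongs to L

    def eq(self, K: set):
        diff = self.L.symmetric_difference(K)
        if not diff:
            return True                  # K equals the target language
        return min(diff)                 # a counter-example from (K\L) ∪ (L\K)
```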
31 Although membership queries and equivalence queries have established themselves as a standard combination, there are real grounds to believe that equivalence queries are too powerful to exist or even be simulated. [sent-352, score-1.123]
32 As suggested by Angluin (1987a), in practice, we may be able to substitute the equivalence queries with a random draw of strings that are then submitted as membership queries (sampling). [sent-354, score-1.036]
33 Besides, we will not consider membership queries and equivalence queries together because they do not help to learn balls: Theorem 7 Assume |Σ| ≥ 2. [sent-356, score-0.789]
34 Any algorithm that identifies every ball of B≤m,n with equivalence queries and membership queries necessarily uses Ω(|Σ|^n) queries in the worst case. [sent-358, score-1.239]
35 Proof Following Angluin (1987b), we describe a malevolent Oracle who forces any method of exact identification using membership and equivalence queries to make Ω(|Σ|^n) queries in the worst case. [sent-359, score-0.762]
36 At this point, many balls might be eliminated, but only one of them has centre o and radius 0. [sent-366, score-0.428]
37 As there are Ω(|Σ|^n) such balls in B≤m,n, the Learner will need Ω(|Σ|^n) queries to identify one of them. [sent-367, score-0.634]
38 It should be noted that if the Learner is given one string from the ball, then he can learn the ball using a polynomial number of membership queries. [sent-368, score-0.416]
39 We shall see that the correction queries, introduced below, allow us to get round these problems: Definition 8 (Correction Queries) Let L be a target language known by the Oracle and w a string submitted by the Learner to the Oracle. [sent-369, score-0.74]
40 Her answer, denoted CQ(w), is either YES if w ∈ L, or a correction of w with respect to L if w ∉ L, that is, a string w′ ∈ L at minimum edit distance from w: CQ(w) = one string w′ ∈ L such that d(w, w′) is minimum. [sent-370, score-1.138]
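Since such an Oracle can be simulated when the target language is known (as noted below), here is a brute-force sketch for ball targets (our own code, reusing edit_distance from the earlier sketch); it enumerates candidate strings and therefore only scales to toy instances.

```python
from itertools import product

def all_strings(alphabet: str, max_len: int):
    """Every string over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

def cq(w: str, centre: str, r: int, alphabet: str):
    """Correction query against L = B_r(centre), per Definition 8.
    Returns "YES" if w is in L, else one closest string of L."""
    if edit_distance(centre, w) <= r:
        return "YES"
    # A correction never needs to be longer than |centre| + r,
    # since every member u of the ball satisfies |u| <= |centre| + r.
    members = [u for u in all_strings(alphabet, len(centre) + r)
               if edit_distance(centre, u) <= r]
    # Tie-break lexicographically so the Oracle is deterministic.
    return min(members, key=lambda u: (edit_distance(w, u), u))
```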
41 Notice that other, milder definitions of correction queries have been proposed in the literature, such as those of Becerra-Bonache et al. [sent-371, score-0.637]
42 However, the correction queries defined above can easily be simulated knowing the target language. [sent-373, score-0.675]
43 Also, we can note that the correction queries are relevant from a cognitive point of view: there is growing evidence that corrective input for grammatical errors is widely available to children (Becerra-Bonache, 2006). [sent-375, score-0.742]
44 And last but not least, the correction queries as well as the balls rely on a distance, which foreshadows nice learning results. [sent-376, score-0.937]
45 Any algorithm that identifies every DFA of D≤n with correction queries necessarily uses Ω(|Σ|^n) queries in the worst case. [sent-381, score-0.971]
46 Each time the correction of any string w is demanded, the Adversary answers YES and eliminates Aw from S (and a lot of other DFA) in order to be consistent. [sent-388, score-0.649]
47 Identifying Balls of Strings using Corrections In this section, we propose an algorithm that learns the balls of strings using correction queries. [sent-392, score-0.854]
48 However, several details distinguish the balls of strings from the balls in IR2. [sent-394, score-0.851]
49 1 A CHARACTERIZATION OF THE CORRECTIONS When the Learner submits a string outside of a ball to the Oracle, she answers with a string that belongs to the ‘circle’ delimiting the ball. [sent-403, score-0.826]
50 For instance, the possible corrections for the string 0000 with respect to the ball B2(11) are {00, 001, 010, 100, 0011, 0101, 0110, 1001, 1010, 1100}. [sent-405, score-0.632]
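This set can be reproduced mechanically with the brute-force helpers sketched after the edit distance and Definition 8 above (edit_distance, all_strings); by the triangle inequality, every minimizer lies on the circle of radius 2 around 11.

```python
o, r, v = "11", 2, "0000"
members = [u for u in all_strings("01", len(v) + r)
           if edit_distance(o, u) <= r]           # the ball B_2(11)
dmin = min(edit_distance(v, u) for u in members)  # = d(o, v) - r = 2
corrections = sorted(u for u in members if edit_distance(v, u) == dmin)
print(corrections)
# ['00', '001', '0011', '010', '0101', '0110', '100', '1001', '1010', '1100']
```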
51 If d(u, v) > d(w, v), then w is a string of Br(o) that is closer to v than u, so u cannot be a correction of v. [sent-417, score-0.599]
52 In consequence, all the strings w ∈ W and corrections u ∈ U are at the same distance from v, thus W ⊆ U. [sent-421, score-0.488]
53 Lemma 10 states that a string u is a possible correction of v iff u ∈ [o, v] ∩ Cr(o). [sent-427, score-0.599]
54 Consider an arbitrary letter a ∈ Σ and the string a^r o. [sent-442, score-0.4]
55 We claim that the correction queries allow us to get hold of a^r o from any string w ∈ Bmax(o) by swapping the letters. [sent-454, score-1.0]
56 Running EXTRACT_CENTRE on the string w = 0110 and radius r = 2 transforms, at each loop, the i-th letter of w into a 0 that is put at the beginning, and then submits the result to the Oracle. [sent-459, score-0.463]
57 If there exists at least one derivation o →^r w where the letter a_i of w comes from an insertion in o, then deleting a_i and doing the insertion of a_x in front of o yields a string w′ that satisfies o →^r w′ and |w′| = |w|. [sent-491, score-0.546]
58 4 FINDING ONE BORDERLINE STRING OF MAXIMUM LENGTH Hence, we are now able to deduce the centre of a ball as soon as we know its radius and a string from its upper border. [sent-499, score-0.651]
59 The following lemma states that if one asks the Oracle for the correction of a string made of a lot of 0's, then this correction contains precious information about the radius and the number of occurrences of 0's in the centre: Lemma 15 Consider the ball Br(o). [sent-533, score-1.106]
60 As |w| < |a^j|, the computation of d(w, a^j) consists in (1) substituting all the letters of w that are not a's, thus doing |w| − |w|_a substitutions, and (2) completing this string with further a's in order to reach a^j, thus doing j − |w| insertions of a's. [sent-539, score-0.414]
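A quick empirical check of this lemma, reusing the cq sketch given after Definition 8 (our own code): the correction of a long enough block of 0's reveals |o|_0 + r, since every minimizer maximizes the number of 0's inside the ball.

```python
o, r = "0110", 2
w = cq("0" * 12, o, r, "01")              # 0^j with j "big enough"
assert w != "YES"
assert w.count("0") == o.count("0") + r   # |w|_0 = |o|_0 + r, as the lemma states
print(w)                                  # '0000' under this oracle's tie-breaking
```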
61 In other words, he is able to guess the balls with correction queries (see Algorithm IDF_BALL and Proposition 16). [sent-578, score-0.96]
62 Proposition 16 Given any fixed ball Br(o), the Algorithm IDF_BALL returns the representation (o, r) using O(|Σ| + |o| + r) correction queries. [sent-592, score-0.446]
63 Concerning the number of queries, the corrections of all the strings a_i^j require O(|Σ| + log(|o| + r)) correction queries (lines 2-5). [sent-594, score-1.111]
64 Indeed, O(log(|o| + r)) queries are necessary to get long enough corrections, plus one query per letter, thus |Σ| queries. [sent-595, score-0.42]
65 Then EXTRACT_CENTRE needs O(|o| + r) correction queries (line 8) to find the centre, by Proposition 13. [sent-596, score-0.637]
66 3 Only the Good Balls are Learnable with IDF_BALL We now have an algorithm that guesses all the balls Br(o) with O(|Σ| + |o| + r) correction queries. [sent-599, score-0.603]
67 Therefore, the number of correction queries used by IDF_BALL is exponential in log r. [sent-604, score-0.637]
68 The set of all q()-good balls Br(o) is identifiable with an algorithm that uses O(|Σ| + |o| + q(|o|)) correction queries and a polynomial amount of time. [sent-613, score-0.963]
69 • The set of all very good balls is identifiable with a linear number O(|Σ| + |o|) of correction queries and a polynomial amount of time. [sent-614, score-0.963]
70 On the one hand, if the number of correction queries authorized to learn can also depend on the length of the longest correction provided by the Oracle during a run of IDF_BALL, then the answer is positive: all the balls are learnable. [sent-617, score-1.305]
71 (2008), it has been proved that the set of all the balls was not polynomially identifiable in the limit, nor in most relevant online learning paradigms, whereas positive results were established for the good balls in most paradigms. [sent-619, score-0.6]
72 On the one hand, asking billions of queries is unacceptable: there is no chance to get enough answers in reasonable time. [sent-624, score-0.412]
73 Some examples of this alternative approach (imperfect Oracle) are: the system SQUIRREL, which makes use of queries to a human expert to allow wrapper induction (Carme et al. [sent-629, score-0.402]
74 , 2002), where the actual electronic device can be physically tested by entering a sequence, and the device will then be able to answer a membership query (note that in that setting equivalence queries will usually be simulated by sampling). [sent-632, score-0.547]
75 Secondly, what is really hard for the expert is to compute the best correction of w when w ∉ Br(o), and more precisely, a string of the ball that is as close to w as possible. [sent-653, score-0.779]
76 Therefore, with probability (1 − p)^k p, the correction CQ_h(w) of a string w will be in the target ball, at distance k from CQ(w). [sent-660, score-0.681]
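One way to simulate such a fallible Expert (our own sketch, reusing cq, all_strings and edit_distance from the sketches above; the paper's simulation may differ in its details) is to sample k geometrically and then pick a ball string at distance k from the exact correction.

```python
import random

def cq_noisy(w, centre, r, alphabet, p=0.8):
    """Fallible Oracle sketch: the exact correction is degraded to a string
    of the target ball at distance k from CQ(w), with probability (1-p)^k p."""
    exact = cq(w, centre, r, alphabet)
    if exact == "YES":
        return "YES"
    k = 0
    while random.random() > p:             # P(K = k) = (1-p)^k p
        k += 1
    members = [u for u in all_strings(alphabet, len(centre) + r)
               if edit_distance(centre, u) <= r]
    while k > 0:
        noisy = [u for u in members if edit_distance(exact, u) == k]
        if noisy:
            return random.choice(noisy)
        k -= 1                             # no ball string that far: fall back
    return exact
```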
77 For instance, we do not suppose that she has any memory; thus, by submitting every string w twice, we would probably get two different corrections that could be used to correct the corrections! [sent-669, score-0.517]
78 We also show, in Figure 5, the average distances between the centres of the target balls and the centres of the learnt balls when IDF_BALL fails to retrieve them. [sent-693, score-0.795]
79 Figure 5: Average distances (and standard deviation) between the centres of the target balls and the centres of the learnt balls, when IDF_BALL fails to retrieve them. [sent-713, score-0.517]
80 Figure 6: Average difference (and standard deviation) between the radii of the target balls and the radii of the learnt balls, when IDF_BALL fails to retrieve them. [sent-722, score-0.453]
81 Then we keep the representations (u′, k′) of the smallest balls that contain all the corrections seen so far. [sent-725, score-0.493]
82 From this set, we select all the pairs (u′, k′) that maximize the number of corrections (of Q) at distance k′, in such a way as to get the maximum number of corrections on the border of the new ball. [sent-727, score-0.458]
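One simple instantiation of this post-treatment (our own reading of the heuristic; the candidate set and the tie-breaking are assumptions, and the paper's exact procedure may differ) scores candidate centres by the radius needed to cover all observed corrections, reusing edit_distance from above.

```python
def post_treat(corrections, candidates):
    """Return (u', k'): the candidate centre whose covering radius
    k' = max distance to the corrections is smallest, preferring
    candidates that put many corrections on the border (distance k')."""
    best_key, best = None, None
    for u in candidates:
        dists = [edit_distance(u, q) for q in corrections]
        k = max(dists)
        key = (k, -dists.count(k))  # small radius first, then many border hits
        if best_key is None or key < best_key:
            best_key, best = key, (u, k)
    return best

# e.g., candidates = set(corrections) | {learnt_centre}  (learnt_centre:
# the centre returned by the learning algorithm; name is hypothetical)
```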
83 In order to show that the balls learnt by IDF_BALL can be corrected a posteriori, we compare, in a series of experiments, the precision of the algorithm without any post-treatment, with the one-step heuristics and with the until-convergence heuristics. [sent-732, score-0.416]
84 2 Using Less Correction Queries We have seen that the good balls were identifiable with O(|Σ| + |o| + r) correction queries. [sent-753, score-0.603]
85 Clearly, if the weight of all the edit operations is 1, then we get the standard edit distance. [sent-793, score-0.451]
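A sketch of this weighted variant follows (our own code; the weight functions sub, ins and dele are our parametrisation). It mirrors the standard dynamic program and recovers it when every weight is 1.

```python
def weighted_edit_distance(w1, w2, sub, ins, dele):
    """Edit distance where sub(a, b), ins(a) and dele(a) give the weights
    of the edit operations; constant weight 1 recovers Levenshtein."""
    m, n = len(w1), len(w2)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + dele(w1[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins(w2[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + dele(w1[i - 1]),
                           dp[i][j - 1] + ins(w2[j - 1]),
                           dp[i - 1][j - 1]
                           + (0.0 if w1[i - 1] == w2[j - 1]
                              else sub(w1[i - 1], w2[j - 1])))
    return dp[m][n]

one = lambda *args: 1.0
assert weighted_edit_distance("0000", "11", one, one, one) == 4.0
```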
86 As we aim at showing that the number of correction queries can be dramatically reduced, we assume that: 1. [sent-795, score-0.637]
87 Actually, we shall not consider balls whose radius is not an integer because, otherwise, the balls Br(o) and Br+ρ(o) might represent the same set. [sent-835, score-0.683]
88 In particular, Lemma 10 stated that the set of possible corrections of any string v ∉ Br(o) was exactly {u ∈ Σ∗ : d(o, u) = r and d(u, v) = d(o, v) − r}. [sent-839, score-0.489]
89 Then any correction u of the string 100 is in {000, 101, 110}. [sent-842, score-0.599]
90 Indeed, we are going to show, as in Lemma 15, that if one asks for the correction w of a string made of a lot of 0's, then |w|_0 = |o|_0 + r. [sent-856, score-0.599]
91 As ℓ − |o| > r, the string a^(ℓ−|o|) o is not in the ball, so this derivation passes through the string a^r o before reaching a^(ℓ−|o|) o. [sent-882, score-0.66]
92 4 LEARNING THE BALLS LOGARITHMICALLY As a consequence of Lemma 21, the correction of a long string of 0's leads to a string of Bmax(o). [sent-897, score-0.895]
93 If the corrections of the strings 0^ℓ, 1^ℓ and 2^ℓ (with ℓ big enough) are v0 = 103020, v1 = 101302 and v2 = 103202 respectively, then E0(v0) = E0(o) = 132, E1(v1) = E1(o) = 0302 and E2(v2) = E2(o) = 1030. [sent-913, score-0.444]
94 Furthermore, we can easily deduce o by aligning the strings E0(o), E1(o) and E2(o):
E0(o)  1 · 3 · 2
E1(o)  · 0 3 0 2
E2(o)  1 0 3 0 ·
o      1 0 3 0 2
This procedure does not use any new correction query and runs in time O(|o|), which is clearly more efficient than EXTRACT_CENTRE. [sent-914, score-0.673]
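A greedy left-to-right merge suffices for this example (our own sketch; here E_a(w) denotes w with every occurrence of a erased, and we assume the next letter of o is always uniquely determined, which holds for the projections used here).

```python
def merge_projections(proj):
    """Reconstruct o from projections E_a(o) = o with every a erased
    (one projection per queried letter a), by a greedy left-to-right merge."""
    proj = {a: list(e) for a, e in proj.items()}
    out = []
    while any(proj.values()):
        fronts = {e[0] for e in proj.values() if e}
        # The next letter of o heads every projection that does not erase it.
        nxt = {c for c in fronts
               if all(not e or e[0] == c
                      for a, e in proj.items() if a != c)}
        (c,) = nxt          # assumed unique; ambiguity would need more queries
        out.append(c)
        for a, e in proj.items():
            if a != c and e:
                e.pop(0)    # consume this occurrence of c
    return "".join(out)

assert merge_projections({"0": "132", "1": "0302", "2": "1030"}) == "10302"
```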
95 The set of all q()-good balls Br(o) is identifiable with an algorithm that uses O(log(|o| + q(|o|))) correction queries and a polynomial amount of time. [sent-922, score-0.963]
96 • The set of all very good balls is identifiable with a logarithmic number O(log |o|) of correction queries and a polynomial amount of time. [sent-923, score-0.963]
97 Discussion and Conclusion In this work, we have used correction queries to learn a particular class of languages from an Oracle. [sent-928, score-0.794]
98 In order to do this, the languages we consider are good balls of strings defined with the edit distance. [sent-930, score-0.88]
99 Just asking for persistent corrections is not enough to solve this problem: a good model should require that if one queries with a close enough string (a^999 instead of a^1000), then the corrections should also remain close. [sent-939, score-0.939]
100 Learning languages from bounded resources: The case of the DFA and the balls of strings. [sent-1083, score-0.43]
wordName wordTfidf (topN-words)
[('queries', 0.334), ('correction', 0.303), ('balls', 0.3), ('string', 0.296), ('strings', 0.251), ('bmax', 0.207), ('oracle', 0.204), ('edit', 0.199), ('corrections', 0.193), ('br', 0.184), ('ball', 0.143), ('janodet', 0.131), ('tantini', 0.131), ('languages', 0.13), ('df', 0.121), ('angluin', 0.117), ('trings', 0.117), ('orrections', 0.11), ('grammatical', 0.105), ('la', 0.104), ('dit', 0.103), ('ecerra', 0.103), ('iguera', 0.103), ('onache', 0.103), ('insertions', 0.093), ('higuera', 0.083), ('entre', 0.069), ('xtract', 0.069), ('centre', 0.067), ('membership', 0.067), ('letter', 0.065), ('fa', 0.062), ('deduce', 0.061), ('radius', 0.061), ('centres', 0.058), ('query', 0.058), ('learner', 0.055), ('proposition', 0.055), ('automata', 0.053), ('language', 0.053), ('answers', 0.05), ('icgi', 0.048), ('insertion', 0.048), ('learnable', 0.047), ('deletions', 0.047), ('substitutions', 0.045), ('distance', 0.044), ('submits', 0.041), ('colloquium', 0.041), ('learnt', 0.041), ('levenshtein', 0.041), ('ar', 0.039), ('alphabet', 0.039), ('precision', 0.039), ('target', 0.038), ('answer', 0.038), ('expert', 0.037), ('heuristics', 0.036), ('basically', 0.036), ('lnai', 0.035), ('fallible', 0.034), ('irrational', 0.034), ('earning', 0.034), ('human', 0.031), ('ai', 0.03), ('ea', 0.029), ('derivation', 0.029), ('disk', 0.029), ('regular', 0.029), ('get', 0.028), ('amengual', 0.028), ('chomsky', 0.028), ('compass', 0.028), ('schulz', 0.028), ('equivalence', 0.027), ('learn', 0.027), ('substitution', 0.027), ('radii', 0.026), ('polynomial', 0.026), ('letters', 0.025), ('weight', 0.025), ('deletion', 0.024), ('wagner', 0.023), ('disks', 0.023), ('able', 0.023), ('faced', 0.022), ('shall', 0.022), ('accuracy', 0.022), ('standpoint', 0.021), ('carme', 0.021), ('crochemore', 0.021), ('dfa', 0.021), ('hagerer', 0.021), ('leonor', 0.021), ('mihov', 0.021), ('orderline', 0.021), ('recognizes', 0.021), ('ruler', 0.021), ('uav', 0.021), ('vez', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
Author: Leonor Becerra-Bonache, Colin de la Higuera, Jean-Christophe Janodet, Frédéric Tantini
Abstract: When facing the question of learning languages in realistic settings, one has to tackle several problems that do not admit simple solutions. On the one hand, languages are usually defined by complex grammatical mechanisms for which the learning results are predominantly negative, as the few algorithms are not really able to cope with noise. On the other hand, the learning settings themselves rely either on information that is too simple (text) or on an unattainable one (query systems that do not exist in practice, nor can be simulated). We consider simple but sound classes of languages defined via the widely used edit distance: the balls of strings. We propose to learn them with the help of a new sort of queries, called correction queries: when a string is submitted to the Oracle, either she accepts it if it belongs to the target language, or she proposes a correction, that is, a string of the language close to the query with respect to the edit distance. We show that even if the good balls are not learnable in Angluin's MAT model, they can be learned from a polynomial number of correction queries. Moreover, experimental evidence from simulating a human Expert shows that this algorithm is resistant to approximate answers. Keywords: grammatical inference, oracle learning, correction queries, edit distance, balls of strings
2 0.082051151 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data
Author: Konrad Rieck, Pavel Laskov
Abstract: Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms—enabling peak performances of up to 10^6 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA. Keywords: string kernels, string distances, learning with sequential data
3 0.062198464 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
Author: Andreas Maurer
Abstract: A method is introduced to learn and represent similarity with linear operators in kernel induced Hilbert spaces. Transferring error bounds for vector valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected by a simple technique of stochastic gradient descent. The resulting representations are tested on transfer problems in image processing, involving plane and spatial geometric invariants, handwritten characters and face recognition. Keywords: learning similarity, similarity, transfer learning
4 0.041787732 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
Author: Hannes Nickisch, Carl Edward Rasmussen
Abstract: We provide a comprehensive overview of many recent algorithms for approximate inference in Gaussian process models for probabilistic binary classification. The relationships between several approaches are elucidated theoretically, and the properties of the different algorithms are corroborated by experimental results. We examine both 1) the quality of the predictive distributions and 2) the suitability of the different marginal likelihood approximations for model selection (selecting hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods produce good predictive distributions although their marginal likelihood approximations are poor. Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost always the method of choice unless the computational budget is very tight. We also extend existing methods in various ways, and provide unifying code implementing all approaches. Keywords: Gaussian process priors, probabilistic classification, Laplace's approximation, expectation propagation, variational bounding, mean field methods, marginal likelihood evidence, MCMC
5 0.04057955 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
Author: Ilya Shpitser, Judea Pearl
Abstract: We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple “parallel worlds” and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy, and provide algorithms that accomplish this computation. Specifically, we show when effects of interventions can be computed from observational studies, and when probabilities of counterfactuals can be computed from experimental studies. We also provide a graphical characterization of those queries which cannot be computed (by any method) from queries at a lower layer of the hierarchy. Keywords: causality, graphical causal models, identification
6 0.036610398 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
7 0.033341628 56 jmlr-2008-Magic Moments for Structured Output Prediction
8 0.031643257 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
9 0.025306448 9 jmlr-2008-Active Learning by Spherical Subdivision
10 0.023578655 5 jmlr-2008-A New Algorithm for Estimating the Effective Dimension-Reduction Subspace
11 0.022090143 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
12 0.021233631 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
13 0.021086914 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
14 0.02005684 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
15 0.020029971 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
16 0.019912336 52 jmlr-2008-Learning from Multiple Sources
17 0.019324915 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
18 0.019048588 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
19 0.017564787 54 jmlr-2008-Learning to Select Features using their Properties
20 0.016744204 44 jmlr-2008-Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming
topicId topicWeight
[(0, 0.111), (1, -0.006), (2, -0.018), (3, -0.036), (4, -0.036), (5, -0.014), (6, 0.041), (7, -0.005), (8, 0.025), (9, 0.028), (10, -0.049), (11, 0.122), (12, -0.001), (13, -0.012), (14, 0.078), (15, 0.002), (16, -0.095), (17, -0.002), (18, -0.177), (19, -0.281), (20, -0.11), (21, 0.053), (22, -0.14), (23, -0.081), (24, -0.142), (25, -0.118), (26, -0.037), (27, 0.166), (28, 0.085), (29, -0.156), (30, -0.159), (31, -0.036), (32, 0.229), (33, 0.353), (34, -0.146), (35, -0.079), (36, -0.139), (37, 0.181), (38, 0.128), (39, -0.028), (40, 0.118), (41, -0.094), (42, -0.077), (43, -0.21), (44, 0.076), (45, -0.283), (46, 0.018), (47, 0.049), (48, -0.069), (49, -0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.98265833 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
Author: Leonor Becerra-Bonache, Colin de la Higuera, Jean-Christophe Janodet, Frédéric Tantini
Abstract: When facing the question of learning languages in realistic settings, one has to tackle several problems that do not admit simple solutions. On the one hand, languages are usually defined by complex grammatical mechanisms for which the learning results are predominantly negative, as the few algorithms are not really able to cope with noise. On the other hand, the learning settings themselves rely either on too simple information (text) or on unattainable one (query systems that do not exist in practice, nor can be simulated). We consider simple but sound classes of languages defined via the widely used edit distance: the balls of strings. We propose to learn them with the help of a new sort of queries, called the correction queries: when a string is submitted to the Oracle, either she accepts it if it belongs to the target language, or she proposes a correction, that is, a string of the language close to the query with respect to the edit distance. We show that even if the good balls are not learnable in Angluin’s M AT model, they can be learned from a polynomial number of correction queries. Moreover, experimental evidence simulating a human Expert shows that this algorithm is resistant to approximate answers. Keywords: grammatical inference, oracle learning, correction queries, edit distance, balls of strings
2 0.41202381 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data
Author: Konrad Rieck, Pavel Laskov
Abstract: Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms—enabling peak performances of up to 10^6 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA. Keywords: string kernels, string distances, learning with sequential data
3 0.22210358 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
Author: Andreas Maurer
Abstract: A method is introduced to learn and represent similarity with linear operators in kernel induced Hilbert spaces. Transferring error bounds for vector valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected by a simple technique of stochastic gradient descent. The resulting representations are tested on transfer problems in image processing, involving plane and spatial geometric invariants, handwritten characters and face recognition. Keywords: learning similarity, similarity, transfer learning
4 0.1909858 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: Consider data consisting of pairwise measurements, such as presence or absence of links between pairs of objects. These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing pairwise measurements with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. These models combine global parameters that instantiate dense patches of connectivity (blockmodel) with local parameters that instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks. Keywords: hierarchical Bayes, latent variables, mean-field approximation, statistical network analysis, social networks, protein interaction networks
5 0.16728161 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
Author: Sungwook Yoon, Alan Fern, Robert Givan
Abstract: A number of today’s state-of-the-art planners are based on forward state-space search. The impressive performance can be attributed to progress in computing domain independent heuristics that perform well across many domains. However, it is easy to find domains where such heuristics provide poor guidance, leading to planning failure. Motivated by such failures, the focus of this paper is to investigate mechanisms for learning domain-specific knowledge to better control forward search in a given domain. While there has been a large body of work on inductive learning of control knowledge for AI planning, there is a void of work aimed at forward-state-space search. One reason for this may be that it is challenging to specify a knowledge representation for compactly representing important concepts across a wide range of domains. One of the main contributions of this work is to introduce a novel feature space for representing such control knowledge. The key idea is to define features in terms of information computed via relaxed plan extraction, which has been a major source of success for non-learning planners. This gives a new way of leveraging relaxed planning techniques in the context of learning. Using this feature space, we describe three forms of control knowledge—reactive policies (decision list rules and measures of progress) and linear heuristics—and show how to learn them and incorporate them into forward state-space search. Our empirical results show that our approaches are able to surpass state-of-the-art nonlearning planners across a wide range of planning competition domains. Keywords: planning, machine learning, knowledge representation, search
6 0.13895819 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
7 0.1382637 5 jmlr-2008-A New Algorithm for Estimating the Effective Dimension-Reduction Subspace
8 0.1243682 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
9 0.12405418 9 jmlr-2008-Active Learning by Spherical Subdivision
10 0.11245304 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
11 0.1116621 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
12 0.10474323 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
13 0.091065876 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
14 0.087611102 25 jmlr-2008-Consistency of Random Forests and Other Averaging Classifiers
15 0.085680217 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
16 0.085138306 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
17 0.082346767 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
18 0.081265397 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
19 0.079481065 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
20 0.076153226 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
topicId topicWeight
[(0, 0.021), (5, 0.021), (40, 0.037), (54, 0.031), (57, 0.518), (58, 0.026), (63, 0.013), (66, 0.037), (76, 0.017), (88, 0.068), (92, 0.028), (94, 0.032), (99, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.84673005 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
Author: Leonor Becerra-Bonache, Colin de la Higuera, Jean-Christophe Janodet, Frédéric Tantini
Abstract: When facing the question of learning languages in realistic settings, one has to tackle several problems that do not admit simple solutions. On the one hand, languages are usually defined by complex grammatical mechanisms for which the learning results are predominantly negative, as the few algorithms are not really able to cope with noise. On the other hand, the learning settings themselves rely either on information that is too simple (text) or on an unattainable one (query systems that do not exist in practice, nor can be simulated). We consider simple but sound classes of languages defined via the widely used edit distance: the balls of strings. We propose to learn them with the help of a new sort of queries, called correction queries: when a string is submitted to the Oracle, either she accepts it if it belongs to the target language, or she proposes a correction, that is, a string of the language close to the query with respect to the edit distance. We show that even if the good balls are not learnable in Angluin's MAT model, they can be learned from a polynomial number of correction queries. Moreover, experimental evidence from simulating a human Expert shows that this algorithm is resistant to approximate answers. Keywords: grammatical inference, oracle learning, correction queries, edit distance, balls of strings
2 0.1945474 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
Author: Hsuan-Tien Lin, Ling Li
Abstract: Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice. Keywords: ensemble learning, boosting, support vector machine, kernel
3 0.19206379 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
4 0.19057132 57 jmlr-2008-Manifold Learning: The Price of Normalization
Author: Yair Goldberg, Alon Zakai, Dan Kushnir, Ya'acov Ritov
Abstract: We analyze the performance of a class of manifold-learning algorithms that find their output by minimizing a quadratic form under some normalization constraints. This class consists of Locally Linear Embedding (LLE), Laplacian Eigenmap, Local Tangent Space Alignment (LTSA), Hessian Eigenmaps (HLLE), and Diffusion maps. We present and prove conditions on the manifold that are necessary for the success of the algorithms. Both the finite sample case and the limit case are analyzed. We show that there are simple manifolds in which the necessary conditions are violated, and hence the algorithms cannot recover the underlying manifolds. Finally, we present numerical results that demonstrate our claims. Keywords: dimensionality reduction, manifold learning, Laplacian eigenmap, diffusion maps, locally linear embedding, local tangent space alignment, Hessian eigenmap
5 0.19016016 58 jmlr-2008-Max-margin Classification of Data with Absent Features
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, here we focus on the case where a feature may not even exist (structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to first complete their unknown values, and then use a standard classification procedure over the completed data. This paper focuses on features that are known to be non-existing, rather than have an unknown value. We show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. In this formulation, the linearly separable case can be transformed into a binary search over a series of second order cone programs (SOCP), a convex problem that can be solved efficiently. We also describe two approaches for optimizing the general case: an approximation that can be solved as a standard quadratic program (QP) and an iterative approach for solving the exact problem. By avoiding the pre-processing phase in which the data is completed, both of these approaches could offer considerable computational savings. More importantly, we show that the elegant handling of missing values by our approach allows it to both outperform other methods when the missing values have non-trivial structure, and be competitive with other methods when the values are missing at random. We demonstrate our results on several standard benchmarks and two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. Keywords: max margin, missing features, network reconstruction, metabolic pa
6 0.18880245 96 jmlr-2008-Visualizing Data using t-SNE
7 0.18790893 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
8 0.18597178 9 jmlr-2008-Active Learning by Spherical Subdivision
9 0.18583886 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
10 0.18512267 83 jmlr-2008-Robust Submodular Observation Selection
11 0.18494962 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
13 0.18453494 86 jmlr-2008-SimpleMKL
14 0.18449196 56 jmlr-2008-Magic Moments for Structured Output Prediction
15 0.18400417 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
16 0.18374893 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
17 0.18204409 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
18 0.18151948 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss
19 0.1811884 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
20 0.18056239 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks