jmlr jmlr2007 jmlr2007-67 knowledge-graph by maker-knowledge-mining

67 jmlr-2007-Polynomial Identification in the Limit of Substitutable Context-free Languages


Source: pdf

Author: Alexander Clark, Rémi Eyraud

Abstract: This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language—it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin’s notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure-dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language. Keywords: grammatical inference, context-free languages, positive data only, reduction system, natural languages

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. [sent-9, score-0.481]

2 It transpires that it is not necessary to identify constituents in order to learn a context-free language—it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. [sent-10, score-0.447]

3 Keywords: grammatical inference, context-free languages, positive data only, reduction system, natural languages [sent-14, score-0.517]

4 In this paper we present an explicit mathematical formalization of this idea of substitutability and use it to define a subclass of context-free languages that we call the substitutable languages, which can be learned according to the polynomial identification in the limit paradigm. [sent-19, score-0.931]

5 These languages are not comparable to the very simple languages, but seem better suited to be the basis for algorithms that can learn natural languages. [sent-21, score-0.422]

6 Indeed, it is sufficient to be able to learn the syntactic congruence: the syntactic monoid can be converted into a context-free grammar in Chomsky normal form in a very natural way. [sent-26, score-0.771]

7 Secondly, we define a simple, purely language-theoretic criterion which allows the syntactic congruence to be identified very easily. [sent-27, score-0.642]

8 The key to the Harris approach for learning a language L is to look at pairs of strings u and v and to see whether they occur in the same contexts; that is to say, to look for pairs of strings of the form lur and lvr that are both in L. [sent-32, score-0.81]
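
On a finite sample this test can be phrased directly. The following Python sketch is ours, not the authors' code; the function name and the toy sample are illustrative assumptions:

```python
def shared_context_pairs(sample):
    """Collect, for every substring u of the sample, the set of contexts
    (l, r) such that lur is in the sample; then return the pairs of
    distinct substrings that occur in at least one common context."""
    contexts = {}
    for w in sample:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                contexts.setdefault(w[i:j], set()).add((w[:i], w[j:]))
    pairs = []
    subs = sorted(contexts)
    for a in range(len(subs)):
        for b in range(a + 1, len(subs)):
            if contexts[subs[a]] & contexts[subs[b]]:  # shared context
                pairs.append((subs[a], subs[b]))
    return pairs

# "ab" and "aab" give lur = "ab" and lvr = "aab" with l = "", r = "b",
# so "a" and "aa" come out as a weakly substitutable pair:
print(shared_context_pairs({"ab", "aab"}))
```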

9 Therefore to obtain learnability results we must define subclasses of the languages that sufficiently restrict the class so that learning can take place. [sent-40, score-0.424]

10 Our main formal result is that this simple, but powerful, constraint on languages—and note that it is expressed in purely language-theoretic terms—sufficiently restricts the class of context-free languages to the extent that it can be learned using a simple polynomial algorithm. [sent-43, score-0.697]

11 More generally, this work shows how one can move from the syntactic congruence of a context-free language to a grammar for that language under certain assumptions. [sent-45, score-1.195]

12 The set of all substrings of a language L is denoted Sub(L) = {u ∈ Σ+ : ∃l, r ∈ Σ∗ such that lur ∈ L} (notice that the empty word does not belong to Sub(L)). [sent-65, score-0.537]

13 Definition 1 (Grammar) A grammar is a quadruple G = ⟨Σ, V, P, S⟩ where Σ is a finite alphabet of terminal symbols, V is a (distinct) finite alphabet of variables or non-terminals, P is a finite set of production rules, and S ∈ V is a start symbol. [sent-69, score-0.657]

14 If P ⊆ V × (Σ ∪ V )+ then the grammar is said to be context-free (CF), and we will write the productions as T → α. [sent-70, score-0.471]

15 Two strings u and v are syntactically congruent with respect to a language L, written u ≡L v, if and only if ∀l, r ∈ Σ∗, lur ∈ L iff lvr ∈ L. [sent-79, score-0.492]

16 We can think of this syntactic congruence as the strong notion of substitutability. [sent-80, score-0.389]

17 Note two things: first, this is clearly an equivalence relation; secondly, it is a congruence of the monoid Σ∗, that is, u ≡L v implies ∀l, r, lur ≡L lvr. [sent-81, score-0.494]

18 The syntactic monoid of the language L is just the quotient of Σ∗ by this relation. [sent-82, score-0.476]

19 We can see that u ≡L v iff (|u|a − |u|b) = (|v|a − |v|b); the congruence classes will thus correspond to particular values of (|u|a − |u|b) and the syntactic monoid will be isomorphic to Z. [sent-86, score-0.512]
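
A two-line check of this claim for the language of strings with equally many a's and b's (Python; `cls` is our own name for the class index, and the asserts are illustrative):

```python
# Sketch: the congruence class of u is determined by the single integer
# |u|_a - |u|_b, and concatenation adds these indices, mirroring (Z, +).
def cls(u):
    return u.count("a") - u.count("b")

assert cls("ab") == cls("ba") == 0                  # same congruence class
assert cls("aab" + "b") == cls("aab") + cls("b")    # monoid morphism onto Z
```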

20 Another way of looking at this relation is to define the set of contexts of a string: Definition 3 (Set of contexts) The set of contexts of a string u in a language L is written CL(u) and defined as CL(u) = {(l, r) : lur ∈ L}. [sent-87, score-0.63]

21 We can now define the class of languages that we are concerned with: Definition 5 (Substitutable language) A language L is substitutable if and only if for every pair of strings u, v, u ≐L v implies u ≡L v. [sent-99, score-0.966]

22 In terms of contexts we can say that a language is substitutable if, whenever the sets of contexts of two strings have non-empty intersection, they are identical. [sent-101, score-0.502]

23 The substitutable context-free languages are just those languages that are both substitutable and context-free. [sent-102, score-1.476]

24 Lemma 6 There are languages which are substitutable but not context-free. [sent-104, score-0.738]

25 The language L = {u ∈ Σ∗ : |u|a = |u|b ∧ |u|c = |u|d} is substitutable, as can easily be verified: the syntactic monoid here is isomorphic to Z × Z. [sent-106, score-0.476]

26 The intersection of L with the regular language {a∗c∗b∗d∗} gives the language {aⁿcᵐbⁿdᵐ : n, m > 0}, which is not context-free; therefore L is not context-free. [sent-107, score-0.524]

27 Example 2 A context-free grammar Gₙ with the one-letter alphabet {a} is defined as follows: it contains a set of non-terminals N₁, . . . [sent-127, score-0.451]

28 Clearly, relaxing the constraint for polynomial size of the characteristic set, to merely requiring polynomial cardinality, is unsatisfactory, since it would allow algorithms to use exponential amounts of computation, by specifying an exponentially long string in the characteristic set. [sent-143, score-0.433]

29 Several ideas have been formulated to tackle this problem, such as to focus on shallow languages (Adriaans, 2002) or to require a characteristic sample that is polynomial in a parameter other than the size of the target (Wakatsuki and Tomita, 1993), but none of them represents a consensus. [sent-144, score-0.56]

30 Algorithm. We now define an algorithm SGL (Substitution Graph Learner) that will learn a context-free grammar from a sample of positive strings of a language. [sent-147, score-0.567]

31 If the language is substitutable, then every member of the same component will be syntactically congruent, and can thus be freely swapped with each other without altering language membership. [sent-156, score-0.499]

32 First, note that since syntactic congruence is transitive, and we are interested in substitutable languages, we can compute the transitive closure of the graph by adding an edge (u, w) whenever edges (u, v) and (v, w) are present. [sent-158, score-0.808]

33 If S is a subset of a substitutable language L, then u ∼S v implies u ≡L v. [sent-161, score-0.57]
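
A sketch of the graph construction in Python (our own helper names; a union-find stands in for the explicit transitive closure described above):

```python
from itertools import combinations

def components(sample):
    """Substitution graph over the substrings of the sample: nodes are
    substrings, edges join substrings that share a context.  Union-find
    merges realise the transitive closure and yield the connected
    components.  Illustrative sketch, not the paper's code."""
    contexts = {}
    for w in sample:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                contexts.setdefault(w[i:j], set()).add((w[:i], w[j:]))
    parent = {u: u for u in contexts}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in combinations(contexts, 2):
        if contexts[u] & contexts[v]:       # u and v share a context
            parent[find(u)] = find(v)
    groups = {}
    for u in contexts:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())
```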

34 Recall that we write [u]L for the congruence class of u with respect to the syntactic congruence of L. [sent-164, score-0.631]

35 Constructing the Grammar. Algorithm 1: Algorithm to generate a grammar from a substitution graph. [sent-166, score-0.458]

36 Data: a substitution graph SG = (V, E); Result: a context-free grammar Ĝ. Let Σ̂ be the set of all the letters used in the nodes of SG; compute V̂, the set of components of SG; store the map V → V̂ defined by u ↦ [u]; let Ŝ be the unique element of V̂ corresponding to the context (λ, λ). [sent-167, score-0.508]

37 P̂ = {}; for [u] ∈ V̂ do: if |u| > 1 then, for all v, w such that u = vw, P̂ ← P̂ ∪ {[u] → [v][w]}; else P̂ ← P̂ ∪ {[u] → u}; end; output Ĝ = ⟨Σ̂, V̂, Ŝ, P̂⟩. Given the SG we now construct a grammar Ĝ = ⟨Σ̂, V̂, P̂, Ŝ⟩. [sent-168, score-0.401]

38 For every node u in the substitution graph, if |u| > 1, then for every pair of non-empty strings v, w such that u = vw, we add a production [u] → [v][w]. [sent-179, score-0.412]
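
A sketch of this construction, consuming the `components` helper above (integer component ids stand in for the non-terminals [u]; all names are our own illustration):

```python
def grammar_from_components(comps, sample):
    """Sketch of Algorithm 1: one non-terminal per component, a binary
    rule [u] -> [v][w] for every split u = vw, and a unary rule
    [a] -> a for every single letter."""
    node = {u: i for i, comp in enumerate(comps) for u in comp}
    prods = set()
    for u in node:
        if len(u) > 1:
            for k in range(1, len(u)):
                prods.add((node[u], (node[u[:k]], node[u[k:]])))
        else:
            prods.add((node[u], u))
    # every whole sample string occurs in the context (λ, λ), so for a
    # sample of a substitutable language they all share one component:
    start = node[next(iter(sample))]
    return start, prods
```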

39 Given some substrings such that [u] = [v][w], we can consider this as saying that any element of the congruence class [v] concatenated with any element of the congruence class [w] will give an element of the congruence class [u]. [sent-184, score-0.828]

40 Phrased like this, it is clear that this can be directly considered as a production rule: if we wish to generate an element of [u], one way of doing it is to generate a string from [v] and then a string from [w]. [sent-185, score-0.378]

41 Algorithm 1 defines the process of generating a grammar from a substitution graph. [sent-187, score-0.458]

42 G = grammar generating the empty language; while true do: read the next string sn; if sn ∉ L(G) then set SG to be the substitution graph generated from {s1, . . . [sent-195, score-0.615]

43 . . . , sn}; set G to be the grammar generated from SG; end; output G; end. [sent-198, score-0.377]
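
The outer loop is easy to sketch on top of the helpers above. The CYK-style membership test is our own assumption (the paper does not fix a parser), matching the binary/unary rule shapes produced by `grammar_from_components`:

```python
def generates(g, s):
    """CYK-style membership test for grammars (start, prods) as built
    above; a hypothetical helper, not from the paper."""
    start, prods = g
    if not s:
        return False                          # these grammars never derive λ
    unary = [(a, t) for (a, t) in prods if isinstance(t, str)]
    binary = [(a, t) for (a, t) in prods if not isinstance(t, str)]
    n = len(s)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i in range(n):
        table[i][1] = {a for (a, t) in unary if t == s[i]}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(1, span):
                for a, (b, c) in binary:
                    if b in table[i][k] and c in table[i + k][span - k]:
                        table[i][span].add(a)
    return start in table[0][n]

def sgl_online(stream):
    """The identification-in-the-limit loop above: keep the current
    grammar while it generates the next string, otherwise rebuild
    from all data seen so far."""
    seen, g = [], None
    for s in stream:
        seen.append(s)
        if g is None or not generates(g, s):
            g = grammar_from_components(components(set(seen)), set(seen))
        yield g
```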

44 The second one allows a discussion about the induced grammar for a particular language. [sent-201, score-0.35]

45 The grammar will thus have productions [aa] → [a][a], which is S → SS, and [a] → a, which is S → a. [sent-207, score-0.503]

46 Thus the learned grammar will be ⟨{a}, {S}, {S → SS, S → a}, S⟩, which generates the language a⁺. [sent-208, score-0.35]
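
As a quick check, running the earlier sketches on a two-string sample from a⁺ reproduces exactly this grammar (the helpers are the hypothetical ones defined above):

```python
comps = components({"a", "aa"})                  # one component: {"a", "aa"}
g = grammar_from_components(comps, {"a", "aa"})
start, prods = g
print(prods)                                     # {(0, (0, 0)), (0, 'a')}: S -> SS, S -> a
print(generates(g, "aaaa"), generates(g, ""))    # True False, i.e. L(G) = a+
```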

47 Example 4 Consider the language L = {aⁿcbⁿ : n ≥ 0} that can be represented, for instance, by the set of productions {S → aSb, S → c}. [sent-209, score-0.607]

48 The grammar generated from this sample will then have rules of the form Cⱼ₊ₖ → Cⱼ Bₖ for all k > 0, j ∈ Z, Cⱼ₋ₖ → Aₖ Cⱼ for all k > 0, j ∈ Z, C₀ → c, Aᵢ₊ⱼ → Aᵢ Aⱼ for all i, j > 0, A₁ → a, Bᵢ₊ⱼ → Bᵢ Bⱼ for all i, j > 0, B₁ → b. [sent-213, score-0.371]

49 This grammar defines the language L, but as can be seen the set of non-terminals can be substantially larger than that of the original grammar. [sent-214, score-0.578]

50 The cost of computing, for a given pair of strings u, v, all of the substrings u′, v′ such that u′ ≐S v′ is less than L², and thus, assuming a constant-time map from substrings to nodes in the graph, we can compute all the edges in the graph in time less than L²n². [sent-225, score-0.424]

51 Proof. Our main result is as follows: Theorem 9 SGL polynomially identifies in the limit the class of substitutable (context-free) languages. [sent-231, score-0.4]

52 First, we shall show that for any sample, the language defined by the grammar inferred from the sample will be a subset of the target language; another way of saying this is that the learner will never make a false positive error. [sent-233, score-0.641]

53 Proof that Hypothesis is Not Too Large. First of all, we shall show that, for any finite sample of a substitutable language, the grammar derived from it does not define a language that is too large. [sent-238, score-1.004]

54 If |w| = 1, then by the construction of the grammar there is a unary rule [w] → w. [sent-242, score-0.378]

55 Therefore the language of the generated grammar will always include the training sample. [sent-246, score-0.35]

56 Since we have a production [u] → [v][w] in P̂, there must be strings v′, w′ such that v′w′ is a string in the same component as u, and v′ ≡L v and w′ ≡L w and u ≡L v′w′. [sent-258, score-0.401]

57 Theorem 12 For any positive sample of a substitutable context-free language L, if Ĝ is the output of the algorithm, then L(Ĝ) ⊆ L. [sent-260, score-0.591]

58 Proof that Hypothesis is Large Enough. To prove that the hypothesis is large enough, we first need to define a characteristic set, that is to say, a subset of a target language L∗ which will ensure that the algorithm will output a grammar G such that L(G) = L∗. [sent-265, score-0.673]

59 Convergence. We now must show that for any substitutable context-free grammar G, if the sample C contains the characteristic set CS(G), that is, CS(G) ⊆ C ⊆ L(G), and if Ĝ is the output SGL produces on the sample C, then L(Ĝ) = L(G). [sent-287, score-0.829]

60 Theorem 15 For any substitutable CFG G, when the sample contains the characteristic set, L(G) ⊆ L(Ĝ). [sent-340, score-0.458]

61 In this section we will consider some algorithms for doing this; the languages defined will be unchanged, but the algorithms will be more efficient. [sent-347, score-0.396]

62 Secondly, one can remove non-terminals, one by one, and test whether the grammar continues to accept all of the sample, and thus arrive at a minimal CFG. [sent-350, score-0.35]

63 In particular, if we have one component that contains the strings u and v, where v ≺ u, and another that contains the strings lur and lvr, we can reduce the graph by removing the string lur. [sent-355, score-0.688]

64 Definition 16 (Reduction system) A reduction system T over an alphabet Σ is a finite set of pairs of strings T ⊂ Σ∗ × Σ∗, where each pair (u, v), normally written u →T v, is called a reduction rule and satisfies v ≺ u. [sent-360, score-0.365]

65 Being confluent and Noetherian means that there is a simple algorithm to determine this congruence: each string reduces to a unique irreducible form, and so belongs to exactly one congruence class. [sent-367, score-0.389]

66 Given a reduction system one can define a language as the union of finitely many congruence classes. [sent-370, score-0.528]

67 Thus given a reduction system T and a set of irreducible strings A on an alphabet Σ, we can define a language L(T, A) = {v ∈ Σ∗ : ∃a ∈ A such that v →∗T a}. [sent-371, score-0.557]

68 The language defined by L(T, {c}) is exactly the palindrome language over a, b with center marker c. [sent-377, score-0.482]
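
A reduction system is easy to run directly. The rule set below is our own illustration of one that yields this marked palindrome language; membership is tested by reducing to normal form:

```python
def normal_form(rules, w):
    """Apply reduction rules u -> v (with v shorter than u) until none
    matches; the Noetherian property guarantees termination."""
    changed = True
    while changed:
        changed = False
        for u, v in rules:
            if u in w:
                w = w.replace(u, v, 1)
                changed = True
                break
    return w

# Rules that peel matching letters off the centre marker c; with these,
# L(T, {c}) is the palindrome language with centre marker described above.
T = [("aca", "c"), ("bcb", "c")]
print(normal_form(T, "abcba"))   # 'c'      -> in L(T, {c})
print(normal_form(T, "abcab"))   # 'abcab'  -> not a marked palindrome
```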

69 Assuming that we have a set of examples generated from a substitutable CFG that contains the characteristic set, it is easy to prove the following lemmas. [sent-385, score-0.437]

70 Subject to this caveat, we can define a small reduction system that represents the same language, namely the set of all strings that reduce to the least string w(S) (w.r.t. [sent-408, score-0.375]

71 the ordering ≺) in the language, but without the large number of redundant rules that the simple grammar construction produces. [sent-411, score-0.378]

72 Substitutable Languages. This section contains some examples of substitutable CFLs, as well as some simple CFLs that are not substitutable, and a discussion of the relationship of this class of languages to other standard classes. [sent-414, score-0.738]

73 This is without a doubt a restricted class of languages but contains some interesting examples. [sent-415, score-0.396]

74 Since we are learning under a Gold-style paradigm, we cannot hope to learn all finite languages (Gold, 1967). [sent-417, score-0.422]

75 Indeed, the more complex the languages we hope to learn, the smaller the set of finite languages we will be able to learn. [sent-418, score-0.792]

76 Examples. We will now give some examples of simple languages that are substitutable, as well as some simple languages that are not. [sent-420, score-1.134]

77 Given an alphabet Σ, the following languages are substitutable: • The set Σ∗ of all the strings on the alphabet is substitutable. [sent-423, score-0.768]

78 • Any language consisting of only one string (namely a singleton language) is substitutable. [sent-425, score-0.375]

79 • The languages {aⁿ : n > 0}, for all a ∈ Σ, are substitutable. [sent-427, score-0.396]

80 We now turn our attention to simple languages that are not substitutable. [sent-429, score-0.396]

81 Relation to Other Language Classes. The state of the art concerning polynomial identification of context-free languages from positive examples only is quite limited. [sent-441, score-0.444]

82 The fact that substitutable context-free languages can be represented by reduction systems proves that they are properly included within the class of congruential languages (Book and Otto, 1993). [sent-443, score-1.17]

83 The examples previously presented show that they are incomparable with the classes of finite languages and regular languages. [sent-444, score-0.439]

84 The most important subclass of context-free languages that is polynomially identifiable is that of very simple grammars (Yokomori, 2003). [sent-445, score-0.659]

85 Some very simple grammars are not substitutable: an example is the regular grammar with productions S → aNP, S → bN, N → n, P → rP, P → p. [sent-447, score-0.685]

86 We conjecture that all substitutable context-free languages are NTS languages. [sent-457, score-0.738]

87 This is put forward as strong evidence in favour of innately specified language-specific knowledge: we shall refer to this view as linguistic nativism. [sent-479, score-0.378]

88 The learned grammar also correctly generated the grammatical form and did not generate the ungrammatical form. [sent-508, score-0.35]

89 We can see that in the learned grammar “the man” will be congruent to “the man who is hungry”, since there is a pair of sentences which differ only by this. [sent-510, score-0.635]

90 One of the derivations for this sentence would be: [is the man hungry ? [sent-514, score-0.371]

91 The rule [uv] → [u][v] appears at first sight vacuous, yet it is sufficient to define correct languages for an interesting class of grammars. [sent-564, score-0.396]

92 Yokomori (2003) shows that the class of very simple languages can be polynomially identified in the limit. [sent-568, score-0.454]

93 Recall that a regular language is reversible if whenever uw and vw are in the language then ux is in the language if and only if vx is in the language. [sent-576, score-0.821]

94 Nonetheless, following the work on reversible regular languages, one can think of a definition of k-substitutable languages and may obtain positive learning results on that hierarchy of languages. [sent-580, score-0.482]

95 We can also compare the substitutability to µ-distinguishability for inference of regular languages (Ron et al. [sent-581, score-0.581]

96 There are some technical problems to be overcome, since the number of syntactic congruence classes will be infinite for non regular languages, and thus the distinguishability will not in general be bounded from below. [sent-587, score-0.432]

97 Algorithms for learning regular languages focus on identifying the states of a deterministic automaton. [sent-589, score-0.439]

98 From a learnability point of view this is interesting because it is purely syntactic—it is not semantic as it does not depend on the representation of the language but only on the language itself. [sent-596, score-0.481]

99 One of the weaknesses in the work is the fact that we do not yet have a grammatical characterisation of substitutability, nor an algorithm for determining whether a grammar defines a substitutable language. [sent-600, score-0.777]

100 Substitutable languages have a property that allows a trivial procedure for determining when two substrings are congruent, but it is easy to think of much more robust methods of determining this that rely on more complex properties of the context distributions. [sent-603, score-0.498]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('languages', 0.396), ('grammar', 0.35), ('substitutable', 0.342), ('congruence', 0.242), ('language', 0.228), ('hungry', 0.181), ('grammars', 0.171), ('strings', 0.17), ('lur', 0.151), ('string', 0.147), ('syntactic', 0.147), ('productions', 0.121), ('man', 0.119), ('substitutability', 0.111), ('substitution', 0.108), ('sg', 0.105), ('substrings', 0.102), ('alphabet', 0.101), ('anguages', 0.101), ('congruent', 0.101), ('lark', 0.101), ('monoid', 0.101), ('ubstitutable', 0.101), ('yraud', 0.101), ('characteristic', 0.095), ('lvr', 0.091), ('sgl', 0.091), ('grammatical', 0.085), ('production', 0.084), ('substring', 0.071), ('linguistic', 0.068), ('sentences', 0.065), ('dinner', 0.06), ('nts', 0.06), ('uent', 0.06), ('polynomially', 0.058), ('auxiliary', 0.054), ('transitive', 0.053), ('contexts', 0.052), ('vw', 0.051), ('chomsky', 0.05), ('higuera', 0.05), ('reversibility', 0.05), ('graph', 0.05), ('polynomial', 0.048), ('sentence', 0.045), ('syntactically', 0.043), ('reversible', 0.043), ('regular', 0.043), ('sub', 0.043), ('shall', 0.042), ('adriaans', 0.04), ('asb', 0.04), ('favour', 0.04), ('nativist', 0.04), ('harris', 0.039), ('reduction', 0.036), ('gold', 0.035), ('subclass', 0.034), ('vi', 0.034), ('aa', 0.032), ('english', 0.031), ('inference', 0.031), ('cbn', 0.03), ('eating', 0.03), ('fronting', 0.03), ('iil', 0.03), ('nizergues', 0.03), ('noetherian', 0.03), ('pullum', 0.03), ('construction', 0.028), ('word', 0.028), ('empty', 0.028), ('subclasses', 0.028), ('sn', 0.027), ('learn', 0.026), ('reductions', 0.026), ('clause', 0.026), ('derivations', 0.026), ('constituents', 0.026), ('marker', 0.026), ('polar', 0.026), ('bn', 0.025), ('purely', 0.025), ('identi', 0.024), ('closure', 0.024), ('la', 0.023), ('cl', 0.023), ('exposed', 0.023), ('length', 0.023), ('iff', 0.022), ('system', 0.022), ('constituent', 0.021), ('terminal', 0.021), ('secondly', 0.021), ('children', 0.021), ('ron', 0.021), ('sample', 0.021), ('derivation', 0.02), ('anp', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 67 jmlr-2007-Polynomial Identification in the Limit of Substitutable Context-free Languages

Author: Alexander Clark, Rémi Eyraud

Abstract: This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language—it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin’s notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure-dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language. Keywords: grammatical inference, context-free languages, positive data only, reduction system, natural languages

2 0.061987944 47 jmlr-2007-Learning Horn Expressions with LOGAN-H

Author: Marta Arias, Roni Khardon, Jérôme Maloberti

Abstract: The paper introduces L OG A N -H —a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how the algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The L OG A N -H system implements these algorithms and adds several facilities and optimizations that allow efficient applications in a wide range of problems. As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that needs to be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can deal with varied problems, large amounts of data, and that it achieves good classification accuracy. Keywords: inductive logic programming, subsumption, bottom-up learning, learning with queries

3 0.040088553 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization

Author: Evgeniy Gabrilovich, Shaul Markovitch

Abstract: Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowledge, including semi-supervised learning and transfer learning. In this work, we present a new framework for automatic acquisition of world knowledge and methods for incorporating it into the text categorization process. Our approach enhances machine learning algorithms with features generated from domain-specific and common-sense knowledge. This knowledge is represented by ontologies that contain hundreds of thousands of concepts, further enriched through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts that augment the bag of words used in simple supervised learning. Feature generation is accomplished through contextual analysis of document text, thus implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses two significant problems in natural language processing—synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the training documents alone. We applied our methodology using the Open Directory Project, the largest existing Web directory built by over 70,000 human editors. Experimental results over a range of data sets confirm improved performance compared to the bag of words document representation. Keywords: feature generation, text classification, background knowledge

4 0.024136735 74 jmlr-2007-Separating Models of Learning from Correlated and Uncorrelated Data     (Special Topic on the Conference on Learning Theory 2005)

Author: Ariel Elbaz, Homin K. Lee, Rocco A. Servedio, Andrew Wan

Abstract: We consider a natural framework of learning from correlated data, in which successive examples used for learning are generated according to a random walk over the space of possible examples. A recent paper by Bshouty et al. (2003) shows that the class of polynomial-size DNF formulas is efficiently learnable in this random walk model; this result suggests that the Random Walk model is more powerful than comparable standard models of learning from independent examples, in which similarly efficient DNF learning algorithms are not known. We give strong evidence that the Random Walk model is indeed more powerful than the standard model, by showing that if any cryptographic one-way function exists (a universally held belief in cryptography), then there is a class of functions that can be learned efficiently in the Random Walk setting but not in the standard setting where all examples are independent. Keywords: random walks, uniform distribution learning, cryptographic hardness, correlated data, PAC learning

5 0.023561157 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

Author: Sofus A. Macskassy, Foster Provost

Abstract: This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well—well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes—that is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection. Keywords: relational learning, network learning, collective inference, collective classification, networked data, probabilistic relational models, network analysis, network data

6 0.02208834 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

7 0.021907726 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

8 0.01995373 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time

9 0.019706385 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

10 0.019532908 54 jmlr-2007-Measuring Differentiability: Unmasking Pseudonymous Authors

11 0.019202048 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers     (Special Topic on Model Selection)

12 0.017795479 73 jmlr-2007-Revised Loss Bounds for the Set Covering Machine and Sample-Compression Loss Bounds for Imbalanced Data

13 0.016933078 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

14 0.016864056 38 jmlr-2007-Graph Laplacians and their Convergence on Random Neighborhood Graphs     (Special Topic on the Conference on Learning Theory 2005)

15 0.016273655 72 jmlr-2007-Relational Dependency Networks

16 0.015725564 90 jmlr-2007-Value Regularization and Fenchel Duality

17 0.015723191 83 jmlr-2007-The On-Line Shortest Path Problem Under Partial Monitoring

18 0.015494305 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

19 0.015436969 31 jmlr-2007-Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm

20 0.015137347 69 jmlr-2007-Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.1), (1, 0.038), (2, -0.014), (3, -0.008), (4, -0.037), (5, 0.063), (6, 0.034), (7, -0.004), (8, 0.027), (9, -0.112), (10, 0.02), (11, 0.111), (12, -0.061), (13, 0.108), (14, 0.027), (15, 0.026), (16, -0.013), (17, -0.144), (18, 0.071), (19, 0.016), (20, 0.081), (21, -0.023), (22, -0.053), (23, 0.041), (24, 0.097), (25, 0.109), (26, 0.053), (27, -0.022), (28, 0.128), (29, 0.027), (30, 0.108), (31, 0.059), (32, -0.186), (33, 0.234), (34, -0.04), (35, 0.031), (36, -0.092), (37, -0.382), (38, 0.596), (39, -0.144), (40, -0.222), (41, -0.122), (42, -0.015), (43, 0.19), (44, 0.126), (45, -0.001), (46, -0.069), (47, 0.041), (48, -0.117), (49, -0.157)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9800477 67 jmlr-2007-Polynomial Identification in the Limit of Substitutable Context-free Languages

Author: Alexander Clark, Rémi Eyraud

Abstract: This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language—it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin’s notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure-dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language. Keywords: grammatical inference, context-free languages, positive data only, reduction system, natural languages

2 0.25883022 47 jmlr-2007-Learning Horn Expressions with LOGAN-H

Author: Marta Arias, Roni Khardon, Jérôme Maloberti

Abstract: The paper introduces L OG A N -H —a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how the algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The L OG A N -H system implements these algorithms and adds several facilities and optimizations that allow efficient applications in a wide range of problems. As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that needs to be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can deal with varied problems, large amounts of data, and that it achieves good classification accuracy. Keywords: inductive logic programming, subsumption, bottom-up learning, learning with queries

3 0.17591594 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization

Author: Evgeniy Gabrilovich, Shaul Markovitch

Abstract: Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowledge, including semi-supervised learning and transfer learning. In this work, we present a new framework for automatic acquisition of world knowledge and methods for incorporating it into the text categorization process. Our approach enhances machine learning algorithms with features generated from domain-specific and common-sense knowledge. This knowledge is represented by ontologies that contain hundreds of thousands of concepts, further enriched through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts that augment the bag of words used in simple supervised learning. Feature generation is accomplished through contextual analysis of document text, thus implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses two significant problems in natural language processing—synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the training documents alone. We applied our methodology using the Open Directory Project, the largest existing Web directory built by over 70,000 human editors. Experimental results over a range of data sets confirm improved performance compared to the bag of words document representation. Keywords: feature generation, text classification, background knowledge

4 0.10253914 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

Author: Rie Johnson, Tong Zhang

Abstract: This paper investigates the effect of Laplacian normalization in graph-based semi-supervised learning. To this end, we consider multi-class transductive learning on graphs with Laplacian regularization. Generalization bounds are derived using geometric properties of the graph. Specifically, by introducing a definition of graph cut from learning theory, we obtain generalization bounds that depend on the Laplacian regularizer. We then use this analysis to better understand the role of graph Laplacian matrix normalization. Under assumptions that the cut is small, we derive near-optimal normalization factors by approximately minimizing the generalization bounds. The analysis reveals the limitations of the standard degree-based normalization method in that the resulting normalization factors can vary significantly within each connected component with the same class label, which may cause inferior generalization performance. Our theory also suggests a remedy that does not suffer from this problem. Experiments confirm the superiority of the normalization scheme motivated by learning theory on artificial and real-world data sets. Keywords: transductive learning, graph learning, Laplacian regularization, normalization of graph Laplacian

5 0.086100027 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

Author: Charles Sutton, Andrew McCallum, Khashayar Rohanimanesh

Abstract: In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data. In addition to maximum conditional likelihood, we present two alternative approaches for training DCRFs: marginal likelihood training, for when we are primarily interested in predicting only a subset of the variables, and cascaded training, for when we have a distinct data set for each state variable, as in transfer learning. We evaluate marginal training and cascaded training on both synthetic data and real-world text data, finding that marginal training can improve accuracy when uncertainty exists over the latent variables, and that for transfer learning, a DCRF trained in a cascaded fashion performs better than a linear-chain CRF that predicts the final task directly. Keywords: conditional random fields, graphical models, sequence labeling

6 0.079381816 15 jmlr-2007-Bilinear Discriminant Component Analysis

7 0.076933391 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR

8 0.070589714 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

9 0.070558831 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

10 0.069893301 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption

11 0.068663724 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

12 0.067881972 74 jmlr-2007-Separating Models of Learning from Correlated and Uncorrelated Data     (Special Topic on the Conference on Learning Theory 2005)

13 0.06757278 42 jmlr-2007-Infinitely Imbalanced Logistic Regression

14 0.067285016 38 jmlr-2007-Graph Laplacians and their Convergence on Random Neighborhood Graphs     (Special Topic on the Conference on Learning Theory 2005)

15 0.066620409 73 jmlr-2007-Revised Loss Bounds for the Set Covering Machine and Sample-Compression Loss Bounds for Imbalanced Data

16 0.065007083 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers     (Special Topic on Model Selection)

17 0.060930274 53 jmlr-2007-Maximum Entropy Density Estimation with Generalized Regularization and an Application to Species Distribution Modeling

18 0.05767642 78 jmlr-2007-Statistical Consistency of Kernel Canonical Correlation Analysis

19 0.055103943 34 jmlr-2007-From External to Internal Regret     (Special Topic on the Conference on Learning Theory 2005)

20 0.054899402 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.015), (12, 0.021), (15, 0.029), (28, 0.046), (40, 0.037), (45, 0.011), (48, 0.027), (58, 0.534), (60, 0.044), (80, 0.022), (85, 0.032), (98, 0.075)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75197804 67 jmlr-2007-Polynomial Identification in the Limit of Substitutable Context-free Languages

Author: Alexander Clark, Rémi Eyraud

Abstract: This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language—it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin’s notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure-dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language. Keywords: grammatical inference, context-free languages, positive data only, reduction system, natural languages

2 0.20794232 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time

Author: Roland Nilsson, José M. Peña, Johan Björkegren, Jesper Tegnér

Abstract: We analyze two different feature selection problems: finding a minimal feature set optimal for classification (MINIMAL-OPTIMAL) vs. finding all features relevant to the target variable (ALL-RELEVANT). The latter problem is motivated by recent applications within bioinformatics, particularly gene expression analysis. For both problems, we identify classes of data distributions for which there exist consistent, polynomial-time algorithms. We also prove that ALL-RELEVANT is much harder than MINIMAL-OPTIMAL and propose two consistent, polynomial-time algorithms. We argue that the distribution classes considered are reasonable in many practical cases, so that our results simplify feature selection in a wide range of machine learning tasks. Keywords: learning theory, relevance, classification, Markov blanket, bioinformatics

3 0.20330247 53 jmlr-2007-Maximum Entropy Density Estimation with Generalized Regularization and an Application to Species Distribution Modeling

Author: Miroslav Dudík, Steven J. Phillips, Robert E. Schapire

Abstract: We present a unified and complete account of maximum entropy density estimation subject to constraints represented by convex potential functions or, alternatively, by convex regularization. We provide fully general performance guarantees and an algorithm with a complete convergence proof. As special cases, we easily derive performance guarantees for many known regularization types, including ℓ₁, ℓ₂, ℓ₂², and ℓ₁ + ℓ₂² style regularization. We propose an algorithm solving a large and general subclass of generalized maximum entropy problems, including all discussed in the paper, and prove its convergence. Our approach generalizes and unifies techniques based on information geometry and Bregman divergences as well as those based more directly on compactness. Our work is motivated by a novel application of maximum entropy to species distribution modeling, an important problem in conservation biology and ecology. In a set of experiments on real-world data, we demonstrate the utility of maximum entropy in this setting. We explore effects of different feature types, sample sizes, and regularization levels on the performance of maxent, and discuss interpretability of the resulting models. Keywords: maximum entropy, density estimation, regularization, iterative scaling, species distribution modeling

4 0.19963783 37 jmlr-2007-GiniSupport Vector Machine: Quadratic Entropy Based Robust Multi-Class Probability Regression

Author: Shantanu Chakrabartty, Gert Cauwenberghs

Abstract: Many classification tasks require estimation of output class probabilities for use as confidence scores or for inference integrated with other models. Probability estimates derived from large margin classifiers such as support vector machines (SVMs) are often unreliable. We extend SVM large margin classification to GiniSVM maximum entropy multi-class probability regression. GiniSVM combines a quadratic (Gini-Simpson) entropy based agnostic model with a kernel based similarity model. A form of Huber loss in the GiniSVM primal formulation elucidates a connection to robust estimation, further corroborated by the impulsive noise filtering property of the reverse water-filling procedure to arrive at normalized classification margins. The GiniSVM normalized classification margins directly provide estimates of class conditional probabilities, approximating kernel logistic regression (KLR) but at reduced computational cost. As with other SVMs, GiniSVM produces a sparse kernel expansion and is trained by solving a quadratic program under linear constraints. GiniSVM training is efficiently implemented by sequential minimum optimization or by growth transformation on probability functions. Results on synthetic and benchmark data, including speaker verification and face detection data, show improved classification performance and increased tolerance to imprecision over soft-margin SVM and KLR. Keywords: support vector machines, large margin classifiers, kernel regression, probabilistic models, quadratic entropy, Gini index, growth transformation

5 0.19943894 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation

Author: Guy Lebanon, Yi Mao, Joshua Dillon

Abstract: The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document representation that goes beyond the bag of words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation is able to robustly capture medium and long range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability for various text processing tasks. Keywords: text processing, local smoothing

6 0.19938219 20 jmlr-2007-Combining PAC-Bayesian and Generic Chaining Bounds

7 0.19840539 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method

8 0.19730708 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

9 0.1969642 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR

10 0.19651119 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption

11 0.19571529 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm

12 0.19559452 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

13 0.19487202 61 jmlr-2007-On the Consistency of Multiclass Classification Methods     (Special Topic on the Conference on Learning Theory 2005)

14 0.19419771 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers     (Special Topic on Model Selection)

15 0.19396493 43 jmlr-2007-Integrating Naïve Bayes and FOIL

16 0.19376817 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition

17 0.19348621 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection

18 0.1929356 64 jmlr-2007-Online Learning of Multiple Tasks with a Shared Loss

19 0.19216067 77 jmlr-2007-Stagewise Lasso

20 0.19134954 39 jmlr-2007-Handling Missing Values when Applying Classification Models