jmlr jmlr2006 jmlr2006-2 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alberto Roverato, Milan Studený
Abstract: This paper deals with chain graph models under alternative AMP interpretation. A new representative of an AMP Markov equivalence class, called the largest deflagged graph, is proposed. The representative is based on revealed internal structure of the AMP Markov equivalence class. More specifically, the AMP Markov equivalence class decomposes into finer strong equivalence classes and there exists a distinguished strong equivalence class among those forming the AMP Markov equivalence class. The largest deflagged graph is the largest chain graph in that distinguished strong equivalence class. A composed graphical procedure to get the largest deflagged graph on the basis of any AMP Markov equivalent chain graph is presented. In general, the largest deflagged graph differs from the AMP essential graph, which is another representative of the AMP Markov equivalence class. Keywords: chain graph, AMP Markov equivalence, strong equivalence, largest deflagged graph, component merging procedure, deflagging procedure, essential graph
Reference: text
sentIndex sentText sentNum sentScore
1 More specifically, the AMP Markov equivalence class decomposes into finer strong equivalence classes and there exists a distinguished strong equivalence class among those forming the AMP Markov equivalence class. [sent-7, score-0.982]
2 The largest deflagged graph is the largest chain graph in that distinguished strong equivalence class. [sent-8, score-0.72]
3 A composed graphical procedure to get the largest deflagged graph on the basis of any AMP Markov equivalent chain graph is presented. [sent-9, score-0.454]
4 In general, the largest deflagged graph differs from the AMP essential graph, which is another representative of the AMP Markov equivalence class. [sent-10, score-0.51]
5 Keywords: chain graph, AMP Markov equivalence, strong equivalence, largest deflagged graph, component merging procedure, deflagging procedure, essential graph 1. [sent-11, score-0.538]
6 In general, a chain graph model is a statistical model in which a chain graph represents the conditional independence structure that defines the model. [sent-13, score-0.442]
7 The distinction between different interpretations of chain graphs is reflected by the different concepts of equivalence, namely the LWF Markov equivalence and the AMP Markov equivalence. [sent-23, score-0.352]
8 However, if we represent a statistical model using an arbitrary graph in the respective Markov equivalence class, then the non-unique nature of graphical description may result in difficulties. [sent-25, score-0.394]
9 A solution to these problems may be provided by a suitable choice of a unique representative of each Markov equivalence class, that is, of a particular element in that equivalence class. [sent-28, score-0.474]
10 Frydenberg (1990) showed that every LWF equivalence class contains the largest chain graph, which is the graph with the largest number of undirected edges within the LWF equivalence class. [sent-32, score-0.856]
11 Furthermore, every arrow in the largest chain graph is an arrow with the same direction in every chain graph from the class. [sent-33, score-0.936]
12 The largest chain graph is uniquely determined and can serve as a natural representative of the LWF equivalence class. [sent-34, score-0.541]
13 Moreover, there exist at least two procedures that transform every chain graph into the largest chain graph of the respective LWF equivalence class (Roverato, 2005; Volf and Studený, 1999). [sent-35, score-0.771]
14 It is not clear what a natural representative of an AMP equivalence class is and, in particular, the notion of the “largest AMP chain graph” makes no sense. [sent-37, score-0.369]
15 Every arrow in the AMP essential graph is either an arrow with the same direction or an undirected edge in every chain graph from the equivalence class. [sent-40, score-1.122]
16 Their terminology was inspired by the case of acyclic directed graph models, in which case the corresponding equivalence class has a suitable representative called the essential graph (Andersson et al.). [sent-41, score-0.66]
17 Indeed, if an AMP equivalence class contains an equivalence class of acyclic directed graphs, then the AMP essential graph coincides with the respective essential graph; see Proposition 4. [sent-43, score-0.787]
18 Furthermore, in the case that the AMP equivalence class contains a completely undirected graph, it may happen that the AMP essential graph has some arrows, and this unpleasant phenomenon was already reported in Section 7 of Andersson et al. [sent-46, score-0.461]
19 The point is that AMP equivalence classes have a more complicated structure than LWF equivalence classes. [sent-49, score-0.42]
20 Nevertheless, if an AMP equivalence class contains an acyclic directed graph then our representative reduces to the essential graph of the corresponding equivalence class of acyclic directed graphs. [sent-53, score-0.945]
21 Moreover, it provides a better solution if the AMP equivalence class contains an undirected graph because then, as one would expect, that undirected graph coincides with our largest deflagged graph. [sent-54, score-0.635]
22 We introduce a new concept, namely, the concept of strong equivalence of chain graphs. [sent-56, score-0.346]
23 An important observation is that every AMP equivalence class of chain graphs decomposes into strong equivalence classes. [sent-57, score-0.643]
24 Every strong equivalence class has a unique representative given by the largest chain graph of the class. [sent-59, score-0.602]
25 We introduce a procedure, called the component merging procedure, which starts from any chain graph G and, by replacing arrows with undirected edges, finds the largest graph in the class of chain graphs strongly equivalent to G. [sent-60, score-0.909]
26 There exists a unique distinguished strong equivalence class among those forming an AMP equivalence class. [sent-62, score-0.501]
27 We introduce a procedure, called the deflagging procedure, which starts from any chain graph G and, by replacing undirected edges with arrows, produces a deflagged graph that is AMP equivalent to G. [sent-65, score-0.441]
28 This representative can be constructed by applying the component merging procedure to the deflagged chain graph obtained by the deflagging procedure. [sent-68, score-0.431]
29 The results on strong equivalence of chain graphs are formulated in Section 5. [sent-72, score-0.398]
30 In Section 6, we present a deflagging procedure to get a deflagged graph in a given AMP equivalence class. [sent-74, score-0.341]
31 An undirected graph is a hybrid graph without arrows, that is, A = ∅. [sent-107, score-0.366]
32 Given a hybrid graph H over N, the respective underlying graph is an undirected graph Hu over N such that a −− b in Hu iff [a, b] is an edge in H. [sent-109, score-0.607]
33 A directed graph is a hybrid graph without lines, that is, L = ∅. [sent-110, score-0.351]
34 It is easy to see that every undirected graph and every acyclic directed graph is a CG. [sent-142, score-0.408]
35 Given a hybrid graph H over N and ∅ ≠ A ⊆ N, the induced subgraph of H for A, denoted by HA, is the graph HA ≡ (A, A ∩ (A × A), L ∩ P(A)), where P(A) ≡ {B; B ⊆ A}. [sent-143, score-0.401]
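As a quick illustration, these set-theoretic definitions translate one-to-one into code. The sketch below assumes a hypothetical encoding of a hybrid graph as a triple (nodes, arrows, lines), with arrows as ordered pairs and lines as two-element frozensets; all function names are illustrative, not taken from the paper.

```python
# Hypothetical encoding (not from the paper): a hybrid graph H over N is a
# triple (nodes, arrows, lines) with arrows as ordered (tail, head) pairs
# and lines as two-element frozensets.

def underlying_graph(nodes, arrows, lines):
    """H^u: a -- b in H^u iff [a, b] is an edge (arrow or line) of H."""
    return {frozenset(e) for e in arrows} | set(lines)

def induced_subgraph(nodes, arrows, lines, A):
    """H_A: keep the nodes of A and exactly the edges with both endpoints in A."""
    assert A and A <= nodes, "A must be a non-empty subset of N"
    return (set(A),
            {(a, b) for (a, b) in arrows if a in A and b in A},
            {l for l in lines if l <= A})
```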
36 A triplex in a hybrid graph H is a pair ⟨{a, b}, c⟩ such that either a −→ c ←− b is an immorality in H, a −→ c −− b is a flag in H, or a −− c ←− b is a flag in H. [sent-160, score-0.563]
37 Two CGs G and H over N will be called triplex equivalent iff they have the same underlying graph and triplexes. [sent-162, score-0.448]
38 If, for instance, a −→ c ←− b is an immorality in G then it need not be an immorality in H but it has to be one of the three versions of the triplex ⟨{a, b}, c⟩. [sent-164, score-0.497]
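The triplex condition can likewise be tested mechanically: a and b must be non-adjacent, each must meet c through an arrow into c or a line, and at least one of the two endings at c must be an arrowhead. A minimal sketch over the same hypothetical encoding as above:

```python
from itertools import combinations

def triplexes(nodes, arrows, lines):
    """All triplexes <{a,b}, c>: a and b non-adjacent, and the induced subgraph
    over {a, c, b} is a -> c <- b, a -> c -- b, or a -- c <- b."""
    adjacent = {frozenset(e) for e in arrows} | set(lines)
    found = set()
    for a, b in combinations(nodes, 2):
        if frozenset({a, b}) in adjacent:
            continue                                  # a, b must be non-adjacent
        for c in set(nodes) - {a, b}:
            # each ending at c must be an arrow into c or a line ...
            ok_a = (a, c) in arrows or frozenset({a, c}) in lines
            ok_b = (b, c) in arrows or frozenset({b, c}) in lines
            # ... and at least one of the two must be an arrow into c
            if ok_a and ok_b and ((a, c) in arrows or (b, c) in arrows):
                found.add((frozenset({a, b}), c))
    return found

def triplex_equivalent(G, H):
    """AMP Markov equivalence test: same underlying graph and same triplexes."""
    skeleton = lambda g: {frozenset(e) for e in g[1]} | set(g[2])
    return skeleton(G) == skeleton(H) and triplexes(*G) == triplexes(*H)
```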
39 Thus, this graph G∞, named the largest chain graph of G, can serve as a natural representative of G. [sent-182, score-0.462]
40 Then all arrows in G which are not ‘protected’ are replaced with lines and the largest chain graph G∞ of G is obtained. [sent-189, score-0.389]
41 The essential difference is that in Definition 1 we require that at least one arrow exists from a member of U to a member of L, while in Roverato (2005) a possibly empty collection of arrows from U to L was allowed. [sent-206, score-0.354]
42 Given a pair of components (U, L) which defines a meta-arrow, we replace all arrows of the meta-arrow U ⇉ L with lines and say that the resulting hybrid graph G′ is obtained by merging of components U and L; more specifically, by merging of the upper component U and the lower component L. [sent-213, score-0.659]
43 In fact, that is the reason we decided to name this operation with CGs “feasible merging of components” because the condition ensures that one remains in the same LWF equivalence class of CGs after the merging operation. [sent-222, score-0.529]
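The merging operation itself is a small set manipulation; a sketch under the same encoding, assuming the feasibility (resp. legality) conditions on the pair (U, L) have already been checked:

```python
def merge_components(arrows, lines, U, L):
    """Merging of the upper component U and the lower component L: every arrow
    of the meta-arrow U => L becomes a line; returns the new (arrows, lines)."""
    meta = {(a, b) for (a, b) in arrows if a in U and b in L}
    assert meta, "Definition 1 requires at least one arrow from U to L"
    return arrows - meta, lines | {frozenset(e) for e in meta}
```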
44 Then we introduce a special kind of equivalence of CGs, called strong equivalence, such that every AMP equivalence class decomposes into strong equivalence classes. [sent-227, score-0.757]
45 The next step is to introduce a special flag ordering between strong equivalence classes within a fixed AMP equivalence class. [sent-229, score-0.498]
46 We show that the smallest element with respect to that ordering exists and, finally, we propose to represent the whole AMP equivalence class by a natural representative of that distinguished strong equivalence class, called the largest deflagged graph. [sent-230, score-0.643]
47 They showed that two CGs over the same set of nodes are AMP Markov equivalent iff they are triplex equivalent. [sent-236, score-0.343]
48 The result on the existence and uniqueness of the largest CG with respect to this ordering in each LWF equivalence class, reported in Section 3, makes this object a natural representative of the LWF equivalence class. [sent-244, score-0.577]
49 What is important is that every AMP equivalence class decomposes into some finer equivalence classes. [sent-246, score-0.455]
50 2 Definition of Strong Equivalence Our decomposition of a given AMP equivalence class is based on the distinction between triplex edges, namely the arrows and lines that belong to a triplex, and non-triplex edges. [sent-248, score-0.588]
51 More specifically, if two triplex equivalent CGs have identical triplex edges, then we say that they are strongly equivalent. [sent-249, score-0.568]
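In code, strong equivalence is then triplex equivalence plus equality of the triplex-edge sets; a sketch reusing triplexes() and triplex_equivalent() from the earlier sketch:

```python
def triplex_edges(nodes, arrows, lines):
    """The arrows and lines belonging to some triplex <{a,b}, c>; arrows are
    kept as ordered pairs so that their direction enters the comparison."""
    found = set()
    for ab, c in triplexes(nodes, arrows, lines):
        for x in ab:
            found.add((x, c) if (x, c) in arrows else frozenset({x, c}))
    return found

def strongly_equivalent(G, H):
    """Triplex equivalent CGs whose triplex edges coincide in type and direction."""
    return triplex_equivalent(G, H) and triplex_edges(*G) == triplex_edges(*H)
```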
52 There exists a unique largest graph within every strong equivalence class. [sent-253, score-0.483]
53 Note that, since all the graphs in H have the same triplex edges, it makes sense to say that a −→ c is a triplex arrow in H if a −→ c is a triplex arrow in every CG from H , and similarly for triplex lines. [sent-262, score-1.474]
54 We are going to show in Section 5 that, similarly to the LWF case, every strong equivalence class H has a unique largest element. [sent-263, score-0.347]
55 We also present a special component merging procedure to get the largest element on the basis of any graph in H. [sent-264, score-0.343]
56 Strong equivalence is an equivalence relation that induces a partition of any AMP equivalence class H of CGs. [sent-265, score-0.645]
57 3 Flag Ordering Interestingly, the relation (1) restricted to triplex edges defines a partial ordering between strong equivalence classes from H. [sent-268, score-0.561]
58 We say that H is flag larger than G and write H ⪰ G if the following condition holds: whenever a −→ b is a triplex arrow in H then a −→ b in G. [sent-270, score-0.45]
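Since triplex edges are shared by all members of a strong equivalence class, the ordering can be tested on arbitrary representatives; a sketch reusing triplex_edges() from above:

```python
def flag_larger(H, G):
    """H ⪰ G on representatives: every triplex arrow a -> b of H's strong
    equivalence class is an arrow a -> b in G."""
    triplex_arrows = {e for e in triplex_edges(*H) if isinstance(e, tuple)}
    return triplex_arrows <= G[1]
```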
59 Proposition 6 Given an AMP equivalence class H, there exists a unique strong equivalence class H↓ in H such that G ⪰ H↓ for every strong equivalence class G forming H. [sent-280, score-0.516]
60 Choose G ∈ G and H ∈ H and construct a hybrid graph F with the same underlying graph as G (and H) in this way: a −→ b in F iff either a −→ b in G or [a −− b in G and a −→ b in H]. [sent-282, score-0.365]
61 The conclusion H ⪰ F can be verified directly: if a −→ b is a triplex arrow in H (= in H) then the fact that H and G are triplex equivalent implies that either a −→ b in G or a −− b in G, which in both cases gives a −→ b in F (= in F). [sent-287, score-0.756]
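The graph F used in the proof is built mechanically from the two representatives; a sketch under the same encoding, assuming G and H are triplex equivalent CGs over the same node set:

```python
def construct_F(G, H):
    """a -> b in F iff a -> b in G, or [a -- b in G and a -> b in H]; the
    remaining lines of G stay lines, so F keeps the underlying graph of G."""
    nodes, arrows_G, lines_G = G
    promoted = {(a, b) for (a, b) in H[1] if frozenset({a, b}) in lines_G}
    arrows_F = set(arrows_G) | promoted
    lines_F = {l for l in lines_G
               if not any((a, b) in promoted for a in l for b in l if a != b)}
    return nodes, arrows_F, lines_F
```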
62 4 Deflagged Graphs and Essential Flags Given an AMP equivalence class H, the symbol H↓ will be used to denote the least strong equivalence class in H with respect to ⪰. [sent-289, score-0.496]
63 AMP equivalence classes can effectively be handled by first considering their natural partition into strong equivalence classes (partially ordered by ⪰), and then by dealing with the CGs in every strong equivalence class (partially ordered by ≥). [sent-314, score-0.757]
64 In this way, it is possible to identify unambiguously a graph in H by first considering the flag-smallest strong equivalence class and then by taking the largest graph within that class. [sent-315, score-0.589]
65 Definition 9 (largest deflagged graph) The graph H↓ is the largest deflagged graph of an AMP equivalence class H if (i) H↓ ∈ H↓, (ii) H↓ ≥ H for all H ∈ H↓. [sent-316, score-0.543]
66 Then a component merging procedure from Section 5 can be applied to the deflagged graph to get the largest deflagged graph H↓. [sent-321, score-0.343]
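The overall construction is thus a composition of two black boxes; a sketch in which deflag() stands for the deflagging procedure of Section 6 and component_merging() for the merging procedure of Section 5 (sketched after Corollary 17 below):

```python
def largest_deflagged_graph(G, deflag, component_merging):
    """H↓ as a composition: deflag G (lines -> arrows, staying AMP equivalent),
    then merge components (arrows -> lines) up to the largest CG of the
    resulting strong equivalence class."""
    return component_merging(deflag(G))
```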
67 More specifically, we prove the existence of the largest CG within each strong equivalence class, introduce the respective elementary operation ascribing a larger strongly equivalent CG to a CG, and show that the largest CG in a strong equivalence class is attainable by this operation. [sent-325, score-0.784]
68 Definition 10 (cyclic arrow) Given a hybrid graph H, we say that an arrow a −→ b in H is a cyclic arrow in H if b ∈ anH(a), the set of ancestors of a in H. [sent-331, score-0.613]
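The ancestor set anH(a) is a plain reachability set, so cyclic arrows can be detected by a breadth-first search; a minimal sketch (the treatment of descending paths follows the paper's conventions only loosely):

```python
from collections import deque

def ancestors(nodes, arrows, lines, a):
    """an_H(a): all nodes b with a descending path from b to a, following
    arrows in their direction and lines in either direction."""
    def predecessors(x):
        return ({t for (t, h) in arrows if h == x} |
                {y for l in lines if x in l for y in l if y != x})
    seen, queue = set(), deque([a])
    while queue:
        for p in predecessors(queue.popleft()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def is_cyclic_arrow(nodes, arrows, lines, a, b):
    """a -> b is a cyclic arrow in H iff b is an ancestor of a."""
    return (a, b) in arrows and b in ancestors(nodes, arrows, lines, a)
```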
69 Corollary 13 Given a strong equivalence class G of CGs over N, there exists G† ∈ G which is the largest CG in G . [sent-358, score-0.347]
70 Then the conditions from Definition 14 are satisfied iff the graph G′ obtained by merging of components U and L is a CG strongly equivalent to G; of course, it is (strictly) larger than G. [sent-373, score-0.406]
71 3 Component Merging Procedure An important fact is that the largest CG in a strong equivalence class G can be obtained from any CG in G by consecutive application of the operation of legal merging of components. [sent-378, score-0.553]
72 Corollary 17 Given a strong equivalence class G of CGs over N and G ∈ G , the largest CG G† in G is attainable from G by a series of legal mergings. [sent-390, score-0.39]
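The procedure is a fixpoint iteration; a sketch in which legal_mergings() stands for an enumeration of the pairs (U, L) satisfying the conditions of Definition 14 (not reproduced here), with merge_components() as sketched earlier:

```python
def component_merging(G, legal_mergings):
    """Apply legal mergings until none remains; by Corollary 17 the result is
    the largest CG G† of the strong equivalence class of the input CG."""
    nodes, arrows, lines = G
    while True:
        steps = legal_mergings(nodes, arrows, lines)   # pairs (U, L), Def. 14
        if not steps:
            return nodes, arrows, lines
        U, L = next(iter(steps))
        arrows, lines = merge_components(arrows, lines, U, L)
```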
73 Deflagging Procedure In this section we describe a procedure to construct a deflagged graph starting from any CG G in the respective AMP equivalence class H. [sent-393, score-0.384]
74 A blocked ending at a node a will mean that the line cannot be replaced with an arrow directed to a, for otherwise we would get a graph outside H. [sent-409, score-0.501]
75 3 Overall Procedure The application of the above algorithms to a CG G produces a graph G′ in the same AMP equivalence class such that G ≥ G′ . [sent-455, score-0.356]
76 Then, every line a −− b in G such that a −− b in Gℓ is a line in every CG F which is triplex equivalent to G. [sent-469, score-0.344]
77 That means, for instance, that an implementable construction procedure to obtain the representative (on the basis of any other graph in the Markov equivalence class) should be at our disposal. [sent-487, score-0.395]
78 Thus, if H contains an undirected graph then the largest deflagged graph H ↓ coincides with that undirected graph. [sent-504, score-0.41]
79 Analogously, if H contains an acyclic directed graph D then H↓ coincides with the essential graph D∗ for D (Andersson et al.). [sent-505, score-0.381]
80 Nevertheless, in general, the largest deflagged graph H↓ is different from the AMP essential graph H∗: for instance, if H contains an undirected graph then H∗ may even have some arrows (see Andersson et al.). [sent-510, score-0.63]
81 In short, one cannot make causal discovery between a and b if there is an undirected edge between a and b in at least one of the chain graphs from the Markov equivalence class, or if there are two chain graphs such that a −→ b in one of them and b −→ a in the other. [sent-517, score-0.6]
82 On the other hand, if an arrow a −→ b is invariant across the respective Markov equivalence class then causal discovery could be possible. [sent-518, score-0.475]
83 Consequently, from the point of view of causal discovery in chain graphs, a good representative of a Markov equivalence class should indicate, by the presence of a line, that the corresponding edge is not an invariant arrow. [sent-519, score-0.628]
84 …Studený, 1997), and the B-essential graphs (Roverato and La Rocca, 2006), are fully informative from this point of view because they have the largest number of lines and, furthermore, they contain an arrow if and only if it is invariant. [sent-521, score-0.343]
85 As the examples in Figures 2 and 4 show, a CG with this property may not exist in an AMP equivalence class and therefore both the AMP essential graph and the largest deflagged graph may contain some arrows that are not invariant. [sent-522, score-0.678]
86 However, the largest deflagged graph is more informative than the AMP essential graph because it is a larger chain graph and, therefore, it has more lines. [sent-523, score-0.598]
87 More specifically, a CG G is the largest deflagged graph iff it is again obtained by the consecutive application of the two procedures: the deflagging procedure is applied to G and the component merging procedure to its result. [sent-525, score-0.388]
88 This is because otherwise ds −→ ds−1 ←− ds−2 is an immorality in G or ds −→ ds−1 −− ds−2 is a flag in G, which, by strong equivalence of G and H, implies that ds −→ ds−1 in H and this contradicts the assumption that ρ is a semi-directed cycle in G ∪ H. [sent-562, score-0.583]
89 Fact 2 There is no cyclic arrow a −→ c in G ∪ H which belongs either to an immorality a −→ c ←− b or to a flag a −→ c −− b in G ∪ H. [sent-567, score-0.348]
90 By strong equivalence of G and H, G has the same induced subgraph for {a, c, b}, which contradicts the fact that c −→ b in G. [sent-578, score-0.415]
91 Proof of Proposition 15 This proposition is analogous to the result on LWF equivalence and feasible merging given in Theorem 8 of Roverato (2005). [sent-618, score-0.396]
92 Since strong equivalence of CGs implies their complex equivalence, the necessity of conditions (i)-(ii) follows from Theorem 8 in Roverato (2005). [sent-623, score-0.505]
93 This arrow is also an arrow in E (with the same direction); the other arrows of ρ are either kept in E or become lines, and the lines of ρ are retained in E. [sent-696, score-0.51]
94 This complex is not in E, which contradicts the assumption that E and G are strongly equivalent since strong equivalence implies complex equivalence. [sent-712, score-0.419]
95 If we add this arrow to the configuration (5) in Gℓ above then we obtain (possibly omitting an edge between g and d) a configuration over the nodes a, x, g, d and b (diagram omitted). Nodes g and d are necessarily adjacent, for otherwise Gℓ has a forbidden configuration g −→ b −− d of the type (a). [sent-789, score-0.346]
96 Realize that G ≥ G′ implies that an arrow in G′ cannot be an arrow with the opposite direction in G. [sent-838, score-0.417]
97 Analogously, if a −→ b ←− d is an immorality in G′ that does not correspond to a triplex in G then a −− b −− d in G. [sent-841, score-0.374]
98 As G and F are triplex equivalent, F has a triplex ⟨{b, d}, a⟩, which, however, contradicts the assumption a −→ b in F. [sent-875, score-0.58]
99 Characterizing Markov equivalence classes for AMP chain graph models. [sent-926, score-0.431]
100 A unified approach to the characterisation of equivalence classes of DAGs, chain graphs with no flags and chain graphs. [sent-981, score-0.442]
simIndex simValue paperId paperTitle
2 0.1175291 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
3 0.091952734 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
Author: Luis M. de Campos
Abstract: We propose a new scoring function for learning Bayesian networks from data using score+search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented. Keywords: Bayesian networks, scoring functions, learning, mutual information, conditional independence tests
4 0.064134471 25 jmlr-2006-Distance Patterns in Structural Similarity
Author: Thomas Kämpke
Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base
5 0.051001363 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Author: Jelle R. Kok, Nikos Vlassis
Abstract: In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcementlearning techniques, unitedly called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection, result in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences. Keywords: collaborative multiagent system, coordination graph, reinforcement learning, Qlearning, belief propagation
6 0.038928416 67 jmlr-2006-On Representing and Generating Kernels by Fuzzy Equivalence Relations
7 0.03532856 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models
8 0.030037522 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
9 0.02703684 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
11 0.026723903 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity
12 0.022603674 19 jmlr-2006-Causal Graph Based Decomposition of Factored MDPs
13 0.019626904 53 jmlr-2006-Learning a Hidden Hypergraph
16 0.018009799 76 jmlr-2006-QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines
17 0.017945519 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
18 0.017466567 4 jmlr-2006-A Linear Non-Gaussian Acyclic Model for Causal Discovery
19 0.016810922 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes
20 0.016445016 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
topicId topicWeight
[(0, 0.106), (1, -0.028), (2, -0.147), (3, 0.028), (4, -0.053), (5, 0.052), (6, 0.244), (7, -0.073), (8, -0.031), (9, -0.032), (10, 0.018), (11, 0.037), (12, -0.062), (13, -0.042), (14, 0.176), (15, 0.009), (16, 0.085), (17, -0.002), (18, -0.039), (19, 0.078), (20, 0.285), (21, 0.068), (22, 0.064), (23, -0.048), (24, 0.088), (25, 0.084), (26, 0.121), (27, 0.34), (28, -0.036), (29, 0.134), (30, -0.042), (31, 0.057), (32, -0.089), (33, 0.097), (34, -0.174), (35, -0.117), (36, -0.044), (37, 0.205), (38, -0.113), (39, -0.199), (40, -0.207), (41, -0.014), (42, -0.067), (43, -0.028), (44, 0.001), (45, 0.101), (46, -0.058), (47, -0.023), (48, -0.089), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.96900964 2 jmlr-2006-A Graphical Representation of Equivalence Classes of AMP Chain Graphs
Author: Alberto Roverato, Milan Studený
Abstract: This paper deals with chain graph models under alternative AMP interpretation. A new representative of an AMP Markov equivalence class, called the largest deflagged graph, is proposed. The representative is based on revealed internal structure of the AMP Markov equivalence class. More specifically, the AMP Markov equivalence class decomposes into finer strong equivalence classes and there exists a distinguished strong equivalence class among those forming the AMP Markov equivalence class. The largest deflagged graph is the largest chain graph in that distinguished strong equivalence class. A composed graphical procedure to get the largest deflagged graph on the basis of any AMP Markov equivalent chain graph is presented. In general, the largest deflagged graph differs from the AMP essential graph, which is another representative of the AMP Markov equivalence class. Keywords: chain graph, AMP Markov equivalence, strong equivalence, largest deflagged graph, component merging procedure, deflagging procedure, essential graph
2 0.4404628 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
Author: Luis M. de Campos
Abstract: We propose a new scoring function for learning Bayesian networks from data using score+search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented. Keywords: Bayesian networks, scoring functions, learning, mutual information, conditional independence tests
4 0.3187018 25 jmlr-2006-Distance Patterns in Structural Similarity
Author: Thomas Kämpke
Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base
5 0.30559072 67 jmlr-2006-On Representing and Generating Kernels by Fuzzy Equivalence Relations
Author: Bernhard Moser
Abstract: Kernels are two-placed functions that can be interpreted as inner products in some Hilbert space. It is this property which makes kernels predestinated to carry linear models of learning, optimization or classification strategies over to non-linear variants. Following this idea, various kernel-based methods like support vector machines or kernel principal component analysis have been conceived which prove to be successful for machine learning, data mining and computer vision applications. When applying a kernel-based method a central question is the choice and the design of the kernel function. This paper provides a novel view on kernels based on fuzzy-logical concepts which allows to incorporate prior knowledge in the design process. It is demonstrated that kernels mapping to the unit interval with constant one in its diagonal can be represented by a commonly used fuzzylogical formula for representing fuzzy rule bases. This means that a great class of kernels can be represented by fuzzy-logical concepts. Apart from this result, which only guarantees the existence of such a representation, constructive examples are presented and the relation to unlabeled learning is pointed out. Keywords: kernel, triangular norm, T -transitivity, fuzzy relation, residuum 1. Motivation Positive-definiteness plays a prominent role especially in optimization and machine learning due to the fact that two-place functions with this property, so-called kernels, can be represented as inner products in some Hilbert space. Thereby, optimization techniques conceived on the basis of linear models can be extended to non-linear algorithms. For a survey of applications see, for example, ¨ Jolliffe (1986), Sch¨ lkopf and Smola (2002) and Scholkopf et al. (1998). o Recently in Moser (2006) it was shown that kernels with values from the unit interval can be interpreted as fuzzy equivalence relations motivated by the idea that kernels express a kind of similarity. This means that the concept of fuzzy equivalence relations, or synonymously fuzzy similarity relations, is more general than that of kernels, provided only values in the unit interval are considered. Fuzzy equivalence relations distinguish from Boolean equivalence relations by a many-valued extension of transitivity which can be interpreted as many-valued logical model of the statement “IF x is similar to y AND y is similar to z THEN x is similar to z”. In contrast to the Boolean case, in many-valued logics the set of truth values is extended such that also assertions, for example, whether two elements x and y are similar, can be treated as a matter of degree. The standard model for the set of (quasi) truth values of fuzzy logic and other many-valued logical systems is the unit interval. If E(x, y) represents the (quasi) truth value of the statement that x is c 2006 Bernhard Moser. M OSER similar to y, then the many-valued version of transitivity is modeled by T (E(x, y), E(y, z)) ≤ E(x, z) where T is a so-called triangular norm which is an extension of the Boolean conjunction. This many-valued concept for transitivity is called T -transitivity. For a survey on triangular norms see, for example, Dubois and Prade (1985), Gottwald (1986), Gottwald (1993) and Klement et al. (2000), ¨ and for fuzzy equivalence relations and T -transitivity see, for example, Bodenhofer (2003), H ohle (1993), H¨ hle (1999), Klement et al. (2000), and Zadeh (1971). 
o Based on the semantics of fuzzy logic, this approach allows to incorporate knowledge-based models for the design of kernels. From this perspective, the most interesting mathematical question is how positive-semidefinite fuzzy equivalence relations can be characterized or at least constructed under some circumstances. At least for some special cases, proofs are provided in Section 4, which motivate further research aiming at establishing a more general theory on the positive-definiteness of fuzzy equivalence relations. These cases are based on the most prominent representatives of triangular norms, that is the Minimum, the Product and the Łukasiewicz t-norm. The paper is structured as follows. First of all, in Section 2, some basic prerequisites concerning kernels and fuzzy relations are outlined. In Section 3, a result about the T -transitivity of kernels from Moser (2006) is cited and interpreted as existence statement that guarantees a representation of kernels mapping to the unit interval with constant 1 in its diagonal by a certain, commonly used, fuzzy-logical construction of a fuzzy equivalence relation. Finally, in contrast to the pure existence theorem of Section 3, in Section 4 constructive examples of fuzzy equivalence relations are provided which are proven to be kernels. In a concluding remark, the relationship to the problem of labeled and unlabeled learning is pointed out. 2. Prerequisites This section summarizes definitions and facts from the theory of kernels as well as from fuzzy set theory which are needed later on. 2.1 Kernels and Positive-Semidefiniteness Preserving Functions There is an extensive literature concerning kernels and kernel-based methods like support vector machines or kernel principal component analysis especially in the machine learning, data mining ¨ and computer vision communities. For an overview and introduction, see, for example, Sch olkopf and Smola (2002). Here we present only what is needed later on. For completeness let us recall the basic definition for kernels and positive-semidefiniteness. Definition 1 Let X be a non-empty set. A real-valued function k : X × X → R is said to be a kernel iff it is symmetric, that is, k(x, y) = k(y, x) for all x, y ∈ X , and positive-semidefinite, that is, ∑n j=1 ci c j k(xi , x j ) ≥ 0 for any n ∈ N, any choice of x1 , . . . , xn ∈ X and any choice of c1 , . . . , cn ∈ R. i, One way to generate new kernels from known kernels is to apply operations which preserve the positive-semidefiniteness property. A characterization of such operations is provided by C. H. FitzGerald (1995). Theorem 2 (Closeness Properties of Kernels) Let f : Rn → R, n ∈ N, then k : X × X → R given by k(x, y) := f (k1 (x, y), . . . , kn (x, y)) 2604 G ENERATING K ERNELS BY F UZZY R ELATIONS is a kernel for any choice of kernels k1 , . . . , kn on X × X iff f is the real restriction of an entire function on Cn of the form f (x1 , . . . , xn ) = ∑ r1 ≥0,...,rn ≥0 r r cr1 ,...,rn x11 · · · xnn (1) where cr1 ,...,rn ≥ 0 for all nonnegative indices r1 , . . . , rn . 2.2 Triangular Norms Triangular norms have been originally studied within the framework of probabilistic metric spaces, see Schweizer and Sklar (1961) and Schweizer and Sklar (1983). In this context, t-norms proved to be an appropriate concept when dealing with triangle inequalities. 
Later on, t-norms and their dual version, t-conorms, have been used to model conjunction and disjunction for many-valued logic, see Dubois and Prade (1985), Gottwald (1986), Gottwald (1993) and Klement et al. (2000). Definition 3 A function T : [0, 1]2 → [0, 1] is called t-norm (triangular norm), if it satisfies the following conditions: (i) (ii) (iii) (iv) ∀x, y ∈ [0, 1] : ∀x, y, z ∈ [0, 1] : ∀x, y, z ∈ [0, 1] : ∀x, y ∈ [0, 1] : T (x, y) = T (y, x) T (x, T (y, z)) = T (T (x, y), z) y ≤ z =⇒ T (x, y) ≤ T (x, z) T (x, 1) = x ∧ T (1, y) = y (commutativity) (associativity) (monotonicity) (boundary condition) Further, a t-norm is called Archimedean if it is continuous and satisfies x ∈ (0, 1) ⇒ T (x, x) < x. Due to its associativity, many-placed extensions Tn : [0, 1]n → [0, 1], n ∈ N, of a t-norm T are uniquely determined by Tn (x1 , . . . , xn ) = T (x1 , Tn−1 (x2 , . . . , xn )). Archimedean t-norms are characterized by the following representation theorem due to Ling (1965): Theorem 4 Let T : [0, 1]2 → [0, 1] be a t-norm. Then T is Archimedean if, and only if, there is a continuous, strictly decreasing function f : [0, 1] → [0, ∞] with f (1) = 0 such that for x, y ∈ [0, 1], T (x, y) = f −1 (min( f (x) + f (y), f (0))). By setting g(x) = exp (− f (x)), Ling’s characterization yields an alternative representation with a multiplicative generator function T (x, y) = g−1 (max(g(x) g(y), g(0))). For g(x) = x we get the product TP (x, y) = x y. The setting f (x) = 1 − x yields the so-called Łukasiewcz t-norm TL (x, y) = max(x + y − 1, 0). Due to Ling’s theorem 4 an Archimedean t-norm T is isomorphic either to TL or TP , depending on whether the additive generator takes a finite value at 0 or not. In the former case, the Archimedean t-norm is called non-strict, in the latter it is called strict. 2605 M OSER A many-valued model of an implication is provided by the so-called residuum given by → T (a, b) = sup{c ∈ [0, 1]|T (a, c) ≤ b} (2) where T is a left-continuous t-norm. Equation (2) is uniquely determined by the so-called adjunction property → ∀a, b, c ∈ [0, 1] : T (a, b) ≤ c ⇔ a ≤ T (b, c). Consequently, the operator ↔ → → T (a, b) = min T (a, b), T (b, a) (3) (4) models a biimplication. For details, for example, see Gottwald (1986) and Klement et al. (2000). → Tables 1 and 2 list examples of t-norms with their induced residuum T . For further examples see, for example, Klement et al. (2000). √ √ Tcos (a, b) = max(ab − 1 − a2 1 − b2 , 0) TL (a, b) = max(a + b − 1, 0) TP (a, b) = ab TM (a, b) = min(a, b) Table 1: Examples of t-norms → T cos (a, b) = → T L (a, b) = → = T P (a, b) → T M (a, b) = cos(arccos(b) − arccos(a)) if a > b, 1 else min(b − a + 1, 1) b if a > b, a 1 else b if a > b, 1 else Table 2: Examples of residuums 2.3 T -Equivalences If we want to classify based on a notion of similarity or indistinguishability, we face the problem of transitivity. For instance, let us consider two real numbers to be indistinguishable if and only if they differ by at most a certain bound ε > 0, this is modeled by the relation ∼ ε given by x ∼ε y :⇔ |x−y| < ε, ε > 0, x, y ∈ R. Note that the relation ∼ε is not transitive and, therefore, not an equivalence relation. The transitivity requirement turns out to be too strong for this example. The problem of identification and transitivity in the context of similarity of physical objects was early pointed out and discussed philosophically by Poincar´ (1902) and Poincar´ (1904). 
In the framework of fuzzy e e logic, the way to overcome this problem is to model similarity by fuzzy relations based on a many¨ valued concept of transitivity, see Bodenhofer (2003), H ohle (1993), H¨ hle (1999), Klement et al. o (2000) and Zadeh (1971). 2606 G ENERATING K ERNELS BY F UZZY R ELATIONS Definition 5 A function E : X 2 −→ [0, 1] is called a fuzzy equivalence relation, or synonymously, T -equivalence with respect to the t-norm T if it satisfies the following conditions: (i) ∀x ∈ X : E(x, x) = 1 (reflexivity) (ii) ∀x, y ∈ X : E(x, y) = E(y, x) (symmetry) (iii) ∀x, y, z ∈ X : T (E(x, y), E(y, z)) ≤ E(x, z) (T-transitivity). The value E(x, y) can be also looked at as the (quasi) truth value of the statement “x is equal to y”. Following this semantics, T-transitivity can be seen as a many-valued model of the proposition, “If x is equal to y and y is equal to z, then x is equal to z”. T -equivalences for Archimedean t-norms are closely related to metrics and pseudo-metrics as shown by Klement et al. (2000) and Moser (1995). Theorem 6 Let T be an Archimedean t-norm given by ∀a, b ∈ [0, 1] : T (a, b) = f −1 (min( f (a) + f (b), f (0))), where f : [0, 1] → [0, ∞] is a strictly decreasing, continuous function with f (1) = 0. (i) If d : X 2 → [0, ∞[ is a pseudo-metric, then the function Ed : X 2 → [0, 1] defined by Ed (x, y) = f −1 (min(d(x, y), f (0))) is a T -equivalence with respect to the t-norm T . (ii) If E : X 2 → [0, 1] is a T -equivalence relation, then the function dE : X 2 → [0, ∞] defined by dE (x, y) = f (E(x, y)) is a pseudo-metric. → Another way to construct T -equivalences is to employ T -operators. The proof of the following assertion can be found in Trillas and Valverde (1984), Kruse et al. (1993) and Kruse et al. (1994). ↔ Theorem 7 Let T be a left-continuous t-norm, T its induced biimplication, µi : X → [0, 1], i ∈ I, I non-empty; then E : X × X → [0, 1] given by ↔ E(x, y) = inf T (µi (x), µi (y)) i∈I (5) is a T -equivalence relation. ¨ For further details on T -equivalences see also Boixader and Jacas (1999), H oppner et al. (2002), Jacas (1988), Trillas et al. (1999) and Valverde (1985). 3. Representing Kernels by T -Equivalences It is interesting that the concept of kernels, which is motivated by geometric reasoning in terms of inner products and mappings to Hilbert spaces and which is inherently formulated by algebraic terms, is closely related to the concept of fuzzy equivalence relations as demonstrated and discussed in more detail in Moser (2006). In this section, we start with the result that any kernel k : X × X → [0, 1] with k(x, x) = 1 for all x ∈ X is T -transitive and, therefore, a fuzzy equivalence relation. The proof can be found in Moser (2006), see also Appendix A.1. 2607 M OSER Theorem 8 Any kernel k : X × X → [0, 1] with k(x, x) = 1 is (at least) Tcos -transitive, where 1 − a2 Tcos (a, b) = max{a b − 1 − b2 , 0}. (6) The nomenclature is motivated by the fact that the triangular norm defined by Equation (6) is an Archimedean t-norm which is generated by the arcosine function as its additive generator. From this result, the following existence theorem can be derived, which guarantees that any kernel under consideration can be represented by the fuzzy-logical formula given by (5). In fuzzy systems, this formula is commonly used for modeling rule bases (see, for example, Kruse et al., 1993, 1994). 
Theorem 9 Let X be a non-empty universe of discourse, k : X × X → [0, 1] a kernel in the sense of Definition 1 and k(x, x) = 1 for all x ∈ X ; then there is a family of membership functions µ i : X → [0, 1], i ∈ I, I non-empty and a t-norm T , such that ↔ ∀x, y ∈ X : k(x, y) = inf T (µi (x), µi (y)). i∈I (7) Proof. Let us set I := X , µx0 (x) = k(x, x0 ) and let us choose Tcos as t-norm. For convenience let us denote ↔ h(x, y) = inf T cos (µx0 (x), µx0 (y)), x0 ∈X which is equivalent to ↔ h(x, y) = inf T cos (k(x0 , x), k(x0 , y)). x0 ∈X According to Theorem 8, k is Tcos -transitive, that is, ↔ ∀x0 , x, y ∈ X : T cos (k(x0 , x), k(x0 , y)) ≤ k(x, y). This implies that h(x, y) ≤ k(x, y) for all x, y ∈ X . Now let us consider the other inequality. Due to the adjunction property (3), we obtain → Tcos (k(x, y), k(x0 , y)) ≤ k(x, x0 ) ⇔ k(x, y) ≤ T cos (k(x0 , y), k(x, x0 )) and → Tcos (k(x, y), k(x0 , x)) ≤ k(y, x0 ) ⇔ k(x, y) ≤ T cos (k(x0 , x), k(y, x0 )), from which it follows that → → ∀x, y, x0 ∈ X : k(x, y) ≤ min{ T cos (k(x0 , y), k(x, x0 )), T cos (k(x0 , x), k(y, x0 ))}. Hence by Definition 4, ∀x, y ∈ X : k(x, y) ≤ h(x, y) which ends the proof. For an arbitrary choice of fuzzy membership functions, there is no necessity that the resulting relation (7) implies positive-semidefiniteness and, therefore, a kernel. For an example of a Tcos equivalence which is not a kernel see Appendix A.4. Theorem 9 guarantees only the existence of a representation of the form (5) but it does not tell us how to construct the membership functions µ i . In the following section, we provide examples of fuzzy equivalence relations which yield kernels for any choice of membership functions. 2608 G ENERATING K ERNELS BY F UZZY R ELATIONS 4. Constructing Kernels by Fuzzy Equivalence Relations In the Boolean case, positive-definiteness and equivalence are synonymous, that is, a Boolean relation R : X × X → {0, 1} is positive-definite if and only if R is the indicator function of an equivalence relation ∼ that is, R(x, y) = 1 if x ∼ y and R(x, y) = 0 if x ∼ y. For a proof, see Appendix A.2. This = = =, relationship can be used to obtain an extension to fuzzy relations as given by the next theorem whose proof can be found in the Appendix A.3. Theorem 10 Let X be a non-empty universe of discourse, µ i : X → [0, 1], i ∈ I, I non-empty; then the fuzzy equivalence relation EM : X × X → [0, 1] given by ↔ EM (x, y) = inf T M (µi (x), µi (y)) i∈I is positive-semidefinite. In the following, the most prominent representatives of Archimedean t-norms, the Product TP and the Łukasiewicz t-norm TL , are used to construct positive-semidefinite fuzzy similarity relations. Though the first part can also be derived from a result due to Yaglom (1957) that characterizes isotropic stationary kernels by its spectral representation, here we prefer to present a direct, elementary proof. Compare also Bochner (1955) and Genton (2001). Theorem 11 Let X be a non-empty universe of discourse, ν : X → [0, 1] and let h : [0, 1] → [0, 1] be an isomorphism of the unit interval that can be expanded in the manner of Equation (1), that is h(x) = ∑k ck xk with ck ≥ 0; then the fuzzy equivalence relations EL,h , EP,h : X × X → [0, 1] given by ↔ EL,h (x, y) = h T L h−1 (ν(x)) , h−1 (ν(y)) and ↔ EP,h (x, y) = h T P h−1 (ν(x)) , h−1 (ν(y)) (8) (9) are positive-semidefinite. Proof. 
To prove the positive-definiteness of the two-placed functions E L,h and EP,h given by equations (8) and (9) respectively, we have to show that n n ∑ i, j=1 EL,h (xi , xi ) ci c j ≥ 0, ∑ i, j=1 EP,h (xi , x j ) ci c j ≥ 0 for any n ∈ N and any choice of x1 , . . . , xn ∈ X , respectively. According to an elementary result from Linear Algebra this is equivalent to the assertion that the determinants (1 ≤ m ≤ n) Dm = det (E(xi , x j ))i, j∈{1,...,m} of the minors of the matrix (E(xi , x j ))i, j satisfy ∀m ∈ {1, . . . , n} : Dm ≥ 0, where E denotes either EL,h or EP,h . Recall that the determinant of a matrix is invariant with respect to renaming the indices, that is, if σ : {1, . . . , n} → {1, . . . , n} is a permutation then det [(ai j )i, j ] = det (aσ(i)σ( j) )i, j . 2609 M OSER For convenience, let denote µi = h−1 (ν(xi )). Then, without loss of generality, we may assume that the values µi are ordered monotonically decreasing, that is, µi ≥ µ j for i < j. ↔ → (10) → Case TL : Note that T L (a, b) = min{ T L (a, b), T L (b, a)} = 1 − |a − b|. Then we have to show that for all dimensions n ∈ N, the determinant of E (n) = (1 − |µi − µ j |)i, j∈{1,...,n} is non-negative, that is Due to the assumption (10), we have det[E (n) ] ≥ 0. 1 − |µi − µ j | = 1 − (µi − µ j ) if i ≤ j, 1 − (µ j − µi ) else which yields . . . 1 − (µ1 − µn−1 ) 1 − (µ1 − µn ) . . . 1 − (µ2 − µn−1 ) 1 − (µ2 − µn ) . . . 1 − (µ3 − µn−1 ) 1 − (µ3 − µn ) (n) E = . . . .. . . . . . 1 − (µ1 − µn−1 ) 1 − (µ2 − µn−1 ) . . . 1 1 − (µn−1 − µn ) 1 − (µ1 − µn ) 1 − (µ2 − µn ) . . . 1 − (µn−1 − µn ) 1 1 − (µ1 − µ2 ) 1 1 − (µ2 − µ3 ) . . . 1 1 − (µ1 − µ2 ) 1 − (µ1 − µ3 ) . . . Now let us apply determinant-invariant elementary column operations to simplify this matrix by subtracting the column with index i − 1 from the column with index i, i ≥ 2. This yields 1 µ2 − µ1 ... µn−1 − µn−2 µn − µn−1 1 − (µ1 − µ2 ) −(µ2 − µ1 ) . . . µn−1 − µn−2 µn − µn−1 1 − (µ1 − µ3 ) −(µ2 − µ1 ) . . . µn−1 − µn−2 µn − µn−1 ˜ E (n) = . . . . . .. . . . . . . . . . 1 − (µ1 − µn−1 ) −(µ2 − µ1 ) . . . −(µn−2 − µn−1 ) µn − µn−1 1 − (µ1 − µn ) −(µ2 − µ1 ) . . . −(µn−2 − µn−1 ) −(µn−1 − µn ) Therefore, α = n ∏(µi−1 − µi ) ≥ 0 (11) i=2 ˜ ˆ det[E (n) ] = det[E (n) ] = α det[En ], where . . . −1 −1 . . . −1 −1 . . . −1 −1 (n) ˆ E = . . . .. . . . . . 1 − (µ1 − µn−1 ) +1 . . . +1 −1 1 − (µ1 − µn ) +1 . . . +1 +1 1 1 − (µ1 − µ2 ) 1 − (µ1 − µ3 ) . . . 2610 −1 +1 +1 . . . (12) G ENERATING K ERNELS BY F UZZY R ELATIONS Let us apply Laplacian determinant expansion by minors to the first column of matrix (12), that is n det[A] = ∑ (−1)i+ j ai j det[Ai j ] i=1 where A = (ai j ) is an n × n-matrix, j arbitrarily chosen from {1, . . . , n} and Ai j is the matrix corresponding to the cofactor ai j obtained by canceling out the i-th row and the j-th column from A (see, ˆ for example, Muir, 1960). For n = 1, we get the trivial case det[ E (1) ] = 1. Note that the first and (n) ˆ the last rows of the matrices Ei,1 for 1 < i < n only differ by their signum, consequently the minors ˆ (n) det[Ei,1 ] for 1 < i < n, n ≥ 2, are vanishing, that is, det[Ai,1 ] = 0, for 1 < i < n. Therefore, according to the Laplacian expansion, we get (n) (n) ˆ ˆ ˆ det[E (n) ] = 1 · det[E1,1 ] + (−1)n (1 − (µ1 − µn )) · det[E1,n ]. (13) Observe that (n) ˆ det[E1,1 ] = 2n−2 (n) ˆ det[E1,n ] = (−1)n−1 2n−2 . Consequently, Equation (13) simplifies to ˆ det[E (n) ] = 2n−2 1 + (−1)n (−1)n−1 2n−2 (1 − (µ1 − µn )) = 2n−2 (1 − (1 − (µ1 − µn ))) = 2n−2 (µ1 − µn ) ≥ 0 which together with (11) proves the first case. 
↔ → → Case TP : First of all, let us compute T P (a, b) = min{ T P (a, b), T L (b, a)}. Hence, min{ b , a } if a, b > 0, a b 0 ↔ if a = 0 and b > 0 , T P (a, b) = 0 if b = 0 and a > 0 , 1 if a = 0 and b = 0 . Again, without loss of generality, let us suppose that the values µ i , i ∈ {1, . . . , n} are ordered monotonically decreasing, that is µ1 ≥ µ2 ≥ . . . ≥ µn . Before checking the general case, let us consider the special case of vanishing µ-values. For this, let us assume for the moment that µi = > 0 if i < i0 , 0 else ↔ ↔ which implies that T P (µi , µ j ) = 0 for i < i0 and j ≥ i0 and T P (µi , µ j ) = 1 for i ≥ i0 and j ≥ i0 . This leads to a decomposition of the matrix ↔ E (n) = T P (µi , µ j ) 2611 ij M OSER such that det[E (n) ] = det[E (i0 −1) ] · det[In−i0 −1 ] where Ik denotes the k × k-matrix with constant entries 1, hence det[In−i0 −1 ] ∈ {0, 1}. Therefore, we may assume that µ1 ≥ µ2 ≥ . . . ≥ µn > 0. Then we have to show that for all dimensions n ∈ N, the determinant of µi µ j , µ j µi E (n) = min i, j∈{1,...,n} is non-negative, that is det[E (n) ] ≥ 0. Consider 1 µ2 µ1 µ3 µ (n) E = .1 . . µn−1 µ1 µn µ1 µ2 µ1 1 µ3 µ2 . . . µn−1 µ2 µn µ2 ... ... ... .. . ... ... Now, multiply the i-th column by −µi+1 /µi and add 1 ≤ i < n, then we get 1 0 ... 2 µ2 ∗ 1 − ... µ1 ∗ ∗ ... . ˜ .. E (n) = . . . . . . ∗ ∗ ... 1− ∗ ∗ ... µn−1 µ1 µn−1 µ2 µn−1 µ3 µn µ1 µn µ2 µn µ3 . . . . µn µn−1 1 . . . 1 µn µn−1 (14) it to the (i + 1)-th column of matrix (14), 0 0 0 0 0 . . . 0 . . . µn−1 µn−2 2 ∗ 0 1− µn µn−1 2 (15) where ∗ is a placeholder for any real value. By this, the determinant of the matrix in Equation (15) readily turns out to be n−1 µi+1 ˜ det[E (n) ] = det[E (n) ] = ∏ 1 − µi i=1 2 ≥0 which together with Theorem (2) ends the proof. Note that relations (8) and (9) are T -transitive with respect to the corresponding isomorphic Archimedean t-norms, TL,h (x, y) = h(TL (h−1 (x), h−1 (x))) and TP,h (x, y) = h(TP (h−1 (x), h−1 (x))), respectively. 2612 G ENERATING K ERNELS BY F UZZY R ELATIONS Corollary 12 Let X be a non-empty universe of discourse, µ i : X → [0, 1], λi ∈ ]0, 1] with ∑i λi = 1 ˜ ˜ where i ∈ {1, . . . , n}, n ∈ N, then the fuzzy equivalence relations EL , EP : X × X → [0, 1] given by n ↔ ˜ EL (x, y) = ∑ λi T L (µi (x), µi (y)) (16) i=1 and n ↔ ˜ EP (x, y) = ∏ T P (µi (x), µi (y)) λi (17) i=1 are TL - and TP -equivalences, respectively, and kernels. Proof. First of all, let us check the TL -transitivity of formula (16). This can readily be shown by ↔ means of the definition of TL and the TL -transitivity of T L due to the following inequalities: n TL n ↔ i=1 n n ↔ ↔ n ↔ ↔ ∑ λi T L (µi (x), µi (y)) + ∑ λi T L (µi (y), µi (z)) − 1 , 0 i=1 i=1 n max = i=1 i=1 n = i=1 ∑ λi T L (µi (x), µi (y)) + ∑ λi T L (µi (y), µi (z)) − 1, 0 max max ↔ ∑ λi T L (µi (x), µi (y)), ∑ λi T L (µi (y), µi (yz) n ↔ ↔ ∑ λi TL T L (µi (x), µi (y)), ∑ λi T L (µi (y), µi (z)) , 0 i=1 i=1 n max ↔ ∑ λi T L (µi (x), µi (z)), 0 ≤ ≤ = i=1 ↔ λi T L (µi (x), µi (z)). ↔ This, together with the TP -transitivity of T P , proves that the formulas given by (16) and (17) are TL and TP -equivalences, respectively. Expanding the factors of formula (17) yields 1 if µi (x) = µi (y) = 0, λi ↔ λi λi (18) T P (µi (x), µi (y)) = min(µiλi(x),µiλi(y)) else max(µi (x),µi (y)) which by comparing case TP of the proof of Theorem 11 shows that the left-hand side of Equation (18) is positive-semidefinite. 
It is interesting to observe that both formulas (16) and (17) can be expressed in the form $f(\lVert \tau(x) - \tau(y) \rVert_1)$, where $f: I \to [0,1]$, with $I$ some interval, is a strictly decreasing function, $\tau: \mathcal{X} \to I^n$, $\tau(x) = (\tau_1(x), \dots, \tau_n(x))$, and $\lVert \tau(x) \rVert_1 = \sum_{i=1}^{n} |\tau_i(x)|$. Indeed, for Equation (16) let us define

$$f_L: [0,1] \to [0,1], \quad f_L(a) = 1 - a, \qquad \tau_L: \mathcal{X} \to [0,1]^n, \quad \tau_L(x) = (\lambda_1 \mu_1(x), \dots, \lambda_n \mu_n(x)),$$

and for Equation (17) and positive membership functions $\mu_i$, that is $\mu_i(x) > 0$ for all $x \in \mathcal{X}$, let us define

$$f_P: [0,\infty[\, \to [0,1], \quad f_P(a) = e^{-a}, \qquad \tau_P: \mathcal{X} \to\, ]-\infty, 1]^n, \quad \tau_P(x) = (\lambda_1 \ln(\mu_1(x)), \dots, \lambda_n \ln(\mu_n(x))).$$

Therefore, we get

$$\tilde{E}_L(x,y) = 1 - \lVert \tau_L(x) - \tau_L(y) \rVert_1 \tag{19}$$

and

$$\tilde{E}_P(x,y) = e^{-\lVert \tau_P(x) - \tau_P(y) \rVert_1}. \tag{20}$$

While formulas (19) and (20) provide a geometrical interpretation by means of the norm $\lVert \cdot \rVert_1$, the corresponding formulas (16) and (17) yield a semantical model of the assertion "IF $x$ is equal to $y$ with respect to feature $\mu_1$ AND ... AND $x$ is equal to $y$ with respect to feature $\mu_n$ THEN $x$ is equal to $y$" as an aggregation of biimplications in terms of fuzzy logic. While in the former case the aggregation has some compensatory effect, the latter is just a conjunction in terms of the product triangular norm. For details on aggregation operators see, for example, Saminger et al. (2002) and Calvo et al. (2002).

The formulas (16) and (17) coincide in the following special case. If the membership functions $\mu_i$ are indicator functions of sets $A_i \subseteq \mathcal{X}$ which form a partition of $\mathcal{X}$, then the kernels (16) and (17) reduce to the indicator function characterizing the Boolean equivalence relation induced by this partition $\{A_1, \dots, A_n\}$, as illustrated by the sketch at the end of this section. For general membership functions, the formulas (16) and (17) therefore provide kernels which can be interpreted as being induced by a family of fuzzy sets and, in particular, by fuzzy partitions, that is, families of fuzzy sets fulfilling some criteria which extend the axioms for a Boolean partition in a many-valued logical sense. For definitions and further details on fuzzy partitions see, for example, De Baets and Mesiar (1998), Demirci (2003) and Höppner and Klawonn (2003).

It is a frequently used paradigm that the decision boundaries for a classification problem lie between clusters rather than intersecting them. Due to this cluster hypothesis, the problem of designing kernels based on fuzzy partitions is closely related to the problem of learning kernels from unlabeled data. For further details on semi-supervised learning see, for example, Seeger (2002), Chapelle et al. (2003) and Huang et al. (2006). It is left to future research to explore this relationship to the problem of learning from labeled and unlabeled data and to related concepts like covariance kernels.
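Returning to the crisp special case noted above, here is a small sketch (illustration only; the two-block partition below is an invented example) confirming that indicator memberships of a partition reproduce the 0/1 Boolean equivalence relation under both (16) and (17):

```python
# Sketch of the crisp special case (invented two-block partition): the
# kernels (16) and (17) reduce to the indicator of the induced Boolean
# equivalence relation.
import numpy as np

labels = np.array([0, 0, 1, 1, 0, 1])          # block index of each point
M = (np.arange(2)[:, None] == labels[None, :]).astype(float)  # mu_i = 1_{A_i}
lam = np.array([0.5, 0.5])

eq = (labels[:, None] == labels[None, :]).astype(float)  # Boolean relation

def biimpl_P(a, b):
    return np.where((a == 0) & (b == 0), 1.0,
                    np.minimum(a, b) / np.maximum(np.maximum(a, b), 1e-300))

EL = sum(l * (1.0 - np.abs(m[:, None] - m[None, :])) for l, m in zip(lam, M))
EP = np.prod([biimpl_P(m[:, None], m[None, :]) ** l for l, m in zip(lam, M)],
             axis=0)

print(np.allclose(EL, eq), np.allclose(EP, eq))   # True True
```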
5. Conclusion

In this paper, we have presented a novel view on kernels from a fuzzy logical point of view. In particular, the similarity-measure aspect of a kernel is addressed and investigated by means of the so-called $T$-transitivity, which is characteristic for fuzzy equivalence relations. As a consequence, we derived that a large class of kernels can be represented in a way that is commonly used for representing fuzzy rule bases. In addition to this proof of the existence of such a representation, constructive examples are presented.

The idea of this research is to look for a combination of knowledge-based strategies with kernel-based methods in order to facilitate a more flexible design process for kernels, one which also allows prior knowledge to be incorporated. Further research aims at analyzing the behavior of kernels constructed in this way when applied in the various kernel methods, like support vector machines, kernel principal component analysis and others. In particular, it is intended to focus on the problem of learning kernels from unlabeled data, where the fuzzy partitions are induced by appropriate clustering principles.

Acknowledgments

Bernhard Moser gratefully acknowledges partial support by the Austrian Government, the State of Upper Austria, and the Johannes Kepler University Linz in the framework of the Kplus Competence Center Program. Furthermore, special thanks go to the anonymous reviewers, who gave helpful suggestions, and to Felix Kossak for careful proof-reading.

Appendix A.

For the sake of completeness, the following sections provide proofs regarding Theorem 8, the characterization of kernels in the Boolean case, and the construction of kernels by means of the minimum t-norm $T_M$. Furthermore, in Section A.4 an example of a non-positive-semidefinite $T_{\cos}$-equivalence is given.

A.1 Proof of Theorem 8

Let us start with the analysis of 3-dimensional matrices.

Lemma 13 Let $M = (m_{ij})_{ij} \in [0,1]^{3 \times 3}$ be a $3 \times 3$ symmetric matrix with $m_{ii} = 1$, $i = 1,2,3$; then $M$ is positive semidefinite iff for all $i, j, k \in \{1,2,3\}$ there holds

$$m_{ij}\, m_{jk} - \sqrt{1 - m_{ij}^2}\, \sqrt{1 - m_{jk}^2} \;\le\; m_{ik}.$$

Proof. For simplicity, let $a = m_{12}$, $b = m_{13}$ and $c = m_{23}$. Then the determinant of $M$, $\det(M)$, is a function of the variables $a, b, c$ given by

$$D(a,b,c) = 1 + 2abc - a^2 - b^2 - c^2.$$

For any choice of $a, b$, the quadratic equation $D(a,b,c) = 0$ can be solved for $c$, yielding two solutions $c_1 = c_1(a,b)$ and $c_2 = c_2(a,b)$ as functions of $a$ and $b$:

$$c_1(a,b) = ab - \sqrt{1-a^2}\,\sqrt{1-b^2}, \qquad c_2(a,b) = ab + \sqrt{1-a^2}\,\sqrt{1-b^2}.$$

Obviously, for all $|a| \le 1$ and $|b| \le 1$, the values $c_1(a,b)$ and $c_2(a,b)$ are real. By substituting $a = \cos(\alpha)$ and $b = \cos(\beta)$ with $\alpha, \beta \in [0, \frac{\pi}{2}]$, it becomes readily clear that

$$c_1(a,b) = c_1(\cos(\alpha), \cos(\beta)) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta) = \cos(\alpha + \beta) \in [-1, 1]$$
and, analogously,

$$c_2(a,b) = c_2(\cos(\alpha), \cos(\beta)) = \cos(\alpha)\cos(\beta) + \sin(\alpha)\sin(\beta) = \cos(\alpha - \beta) \in [-1, 1].$$

As for all $a, b \in [-1,1]$ the determinant function $D_{a,b}(c) := D(a,b,c)$ is quadratic in $c$ with negative coefficient for $c^2$, there is a uniquely determined maximum at $c_0(a,b) = ab$. Note that for all $a, b \in [-1,1]$ we have $c_1(a,b) \le c_0(a,b) \le c_2(a,b)$ and

$$D(a, b, c_0(a,b)) = 1 + 2ab(ab) - a^2 - b^2 - (ab)^2 = (1 - a^2)(1 - b^2) \ge 0.$$

Therefore, $D(a,b,c) \ge 0$ if and only if $c \in [c_1(a,b), c_2(a,b)]$.

Recall from linear algebra that the determinant does not change when the indices are renamed. Therefore, without loss of generality, we may assume that $a \ge b \ge c$. For convenience, let $Q = \{(x,y,z) \in [0,1]^3 \mid x \ge y \ge z\}$. Then, obviously, for any choice of $a, b \in [0,1]$ there holds $(a, b, c_1(a,b)) \in Q$. Elementary algebra shows that $(a, b, c_2(a,b)) \in Q$ is only the case for $a = b = 1$. As for $a = b = 1$ the two solutions coincide, that is, $c_1(1,1) = c_2(1,1) = 1$, it follows that for any choice of $(a,b,c) \in Q$ there holds $D(a,b,c) \ge 0$ if and only if

$$c_1(a,b) = ab - \sqrt{1-a^2}\,\sqrt{1-b^2} \le c. \tag{21}$$

For $(a,b,c) \in Q$, the complementary inequality $c \le c_2(a,b)$ is trivially satisfied, which together with (21) proves the lemma. $\Box$

Now Theorem 8 immediately follows from Definition 1, Lemma 13 and the characterizing inequality (21).

A.2 Characterization of Kernels in the Boolean Case

The following lemma and proposition can also be found as an exercise in Schölkopf and Smola (2002).

Lemma 14 Let $\sim$ be an equivalence relation on $\mathcal{X}$ and let $k: \mathcal{X} \times \mathcal{X} \to \{0,1\}$ be induced by $\sim$ via $k(x,y) = 1$ if and only if $x \sim y$; then $k$ is a kernel.

Proof. By the definition of positive semidefiniteness, let us consider an arbitrary sequence of elements $x_1, \dots, x_n$. Then there are at most $n$ equivalence classes $Q_1, \dots, Q_m$, $m \le n$, on the set of indices $\{1,\dots,n\}$, where $\bigcup_{p=1}^{m} Q_p = \{1,\dots,n\}$ and $Q_p \cap Q_q = \emptyset$ for $p \ne q$. Note that $k(x_i, x_j) = 0$ if the indices $i, j$ belong to different equivalence classes. Then, for any choice of reals $c_1, \dots, c_n$, we obtain

$$\sum_{i,j} c_i c_j\, k(x_i, x_j) = \sum_{p=1}^{m} \sum_{i,j \in Q_p} c_i c_j\, k(x_i, x_j) = \sum_{p=1}^{m} \sum_{i,j \in Q_p} c_i c_j = \sum_{p=1}^{m} \Big(\sum_{i \in Q_p} c_i\Big)^2 \ge 0. \qquad \Box$$

Proposition 15 $k: \mathcal{X} \times \mathcal{X} \to \{0,1\}$ with $k(x,x) = 1$ for all $x \in \mathcal{X}$ is a kernel if and only if it is induced by an equivalence relation.

Proof. It only remains to be shown that if $k$ is a kernel, then it is the indicator function of an equivalence relation, that is, it is induced by an equivalence relation. If $k$ is a kernel then, according to Lemma 13, for all $x, y, z \in \mathcal{X}$ it has to satisfy $T_{\cos}(k(x,y), k(y,z)) \le k(x,z)$, which implies

$$k(x,y) = 1,\ k(y,z) = 1 \;\Longrightarrow\; k(x,z) = 1.$$

Obviously, we have $k(x,x) = 1$ and $k(x,y) = k(y,x)$ due to the reflexivity and symmetry assumptions on $k$, respectively. $\Box$
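Lemma 13 also lends itself to a direct numerical probe. The following sketch (illustration only, not part of the proof) samples random ordered triples $a \ge b \ge c$ and confirms that positive semidefiniteness of the corresponding $3 \times 3$ matrix coincides with the characterizing inequality (21):

```python
# Monte-Carlo probe of Lemma 13 (illustration only): for ordered triples
# a >= b >= c in [0, 1], positive semidefiniteness of the 3x3 matrix is
# equivalent to c >= a*b - sqrt(1 - a^2) * sqrt(1 - b^2), i.e. to (21).
import numpy as np

rng = np.random.default_rng(0)
for _ in range(10000):
    a, b, c = np.sort(rng.uniform(0.0, 1.0, 3))[::-1]   # a >= b >= c
    M = np.array([[1.0, a, b],
                  [a, 1.0, c],
                  [b, c, 1.0]])
    gap = c - (a * b - np.sqrt(1.0 - a * a) * np.sqrt(1.0 - b * b))
    if abs(gap) < 1e-4:
        continue                 # skip numerically borderline triples
    psd = np.linalg.eigvalsh(M).min() >= -1e-10
    assert psd == (gap > 0.0)
print("Lemma 13 criterion confirmed on random triples")
```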
A.3 Constructing Kernels by $T_M$

For convenience, let us recall the basic notion of an $\alpha$-cut from fuzzy set theory.

Definition 16 Let $\mathcal{X}$ be a non-empty set and $\mu: \mathcal{X} \to [0,1]$; then $[\mu]_\alpha = \{x \in \mathcal{X} \mid \mu(x) \ge \alpha\}$ is called the $\alpha$-cut of the membership function $\mu$.

Lemma 17 $k: \mathcal{X} \times \mathcal{X} \to [0,1]$ is a $T_M$-equivalence if and only if all $\alpha$-cuts of $k$ are Boolean equivalence relations.

Proof. (i) Let us assume that $k$ is a $T_M$-equivalence. Let $\alpha \in [0,1]$; then, by definition, $[k]_\alpha = \{(x,y) \in \mathcal{X} \times \mathcal{X} \mid k(x,y) \ge \alpha\}$. In order to show that $[k]_\alpha$ is a Boolean equivalence, the axioms of reflexivity, symmetry and transitivity have to be checked. Reflexivity and symmetry are trivially satisfied, as for all $x, y \in \mathcal{X}$ there holds, by assumption, $k(x,x) = 1$ and $k(x,y) = k(y,x)$. In order to show transitivity, let us consider $(x,y), (y,z) \in [k]_\alpha$, that is, $k(x,y) \ge \alpha$ and $k(y,z) \ge \alpha$; then, by the $T_M$-transitivity assumption, it follows that $\alpha \le \min(k(x,y), k(y,z)) \le k(x,z)$, hence $(x,z) \in [k]_\alpha$.

(ii) Suppose now that all $\alpha$-cuts of $k$ are Boolean equivalence relations. Then, in particular, $[k]_\alpha$ with $\alpha = 1$ is reflexive, hence $k(x,x) = 1$ for all $x \in \mathcal{X}$. The symmetry of $k$ follows from the fact that for all $\alpha \in [0,1]$ and pairs $(x,y) \in [k]_\alpha$ we have, by assumption, $(y,x) \in [k]_\alpha$. In order to show the $T_M$-transitivity property, let us consider arbitrarily chosen elements $x, y, z \in \mathcal{X}$ and let $\alpha = \min(k(x,y), k(y,z))$; then, by the transitivity assumption on $[k]_\alpha$, it follows that $(x,z) \in [k]_\alpha$, consequently $k(x,z) \ge \alpha = \min(k(x,y), k(y,z))$. $\Box$

Proposition 18 If $k: \mathcal{X} \times \mathcal{X} \to [0,1]$ is a $T_M$-equivalence, then it is positive semidefinite.

Proof. Choose arbitrary elements $x_1, \dots, x_n \in \mathcal{X}$, consider the set of values which are taken by all combinations $k(x_i, x_j)$, $i, j \in \{1,\dots,n\}$, and order them increasingly, that is,

$$\{k(x_i, x_j) \mid i, j \in \{1,\dots,n\}\} = \{\alpha_1, \dots, \alpha_m\}, \qquad 0 \le \alpha_1 \le \dots \le \alpha_m \le 1.$$

Observe that for all pairs $(x_i, x_j)$, $i, j \in \{1,\dots,n\}$, there holds

$$k(x_i, x_j) = \sum_{v=2}^{m} (\alpha_v - \alpha_{v-1})\, \mathbf{1}_{[k]_{\alpha_v}}(x_i, x_j) + \alpha_1\, \mathbf{1}_{[k]_{\alpha_1}}(x_i, x_j),$$

showing that, on the set $\{x_1,\dots,x_n\} \times \{x_1,\dots,x_n\}$, the function $k$ is a linear combination of indicator functions of Boolean equivalences (which are positive semidefinite by Proposition 15) with non-negative coefficients; consequently, it has to be positive semidefinite. $\Box$

A.4 Example of a Non-Positive-Semidefinite $T_{\cos}$-Equivalence

For dimensions $n > 3$, the $T_{\cos}$-transitivity is no longer sufficient to guarantee positive semidefiniteness. Consider, for example, $A_n = (a^{(n)}_{ij})_{ij}$, where

$$a^{(n)}_{ij} = \begin{cases} \lambda & \text{if } \min(i,j) = 1,\ \max(i,j) > 1, \\ 1 & \text{if } i = j, \\ 0 & \text{else.} \end{cases} \tag{22}$$

Choose $\lambda = 1/\sqrt{2}$; then $T_{\cos}(\lambda, \lambda) = 0$, hence we have $T_{\cos}(a^{(n)}_{ij}, a^{(n)}_{jk}) \le a^{(n)}_{ik}$ for all indices $i, j, k \in \{1,\dots,n\}$. As $\det(A_n) < 0$ for $n > 3$, the matrix $A_n$ cannot be positive semidefinite, even though the $T_{\cos}$-transitivity conditions are satisfied.
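In concrete numbers (an illustrative sketch, not taken from the paper): for $n = 4$ and $\lambda = 1/\sqrt{2}$, the matrix (22) satisfies every instance of the $T_{\cos}$-transitivity condition, yet $\det(A_4) = 1 - 3\lambda^2 = -1/2 < 0$:

```python
# The counterexample (22) in numbers (illustration only): n = 4 and
# lambda = 1/sqrt(2) give a T_cos-transitive matrix with det A_4 = -1/2.
import numpy as np

lam, n = 1.0 / np.sqrt(2.0), 4
A = np.eye(n)
A[0, 1:] = lam                      # first row and column carry lambda
A[1:, 0] = lam

print(np.linalg.det(A))             # approx -0.5, so A_4 is not psd
print(np.linalg.eigvalsh(A).min())  # negative smallest eigenvalue

def t_cos(a, b):                    # T_cos(a,b) = max(ab - sqrt(1-a^2)sqrt(1-b^2), 0)
    return max(a * b - np.sqrt(1 - a * a) * np.sqrt(1 - b * b), 0.0)

# T_cos-transitivity holds for every index triple (i, j, k).
ok = all(t_cos(A[i, j], A[j, k]) <= A[i, k] + 1e-12
         for i in range(n) for j in range(n) for k in range(n))
print(ok)                           # True
```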
References

S. Bochner. Harmonic Analysis and the Theory of Probability. University of California Press, Los Angeles, California, 1955.

U. Bodenhofer. A note on approximate equality versus the Poincaré paradox. Fuzzy Sets and Systems, 133(2):155–160, 2003.

D. Boixader and J. Jacas. T-indistinguishability operators and approximate reasoning via CRI. In D. Dubois, E. P. Klement, and H. Prade, editors, Fuzzy Sets, Logics and Reasoning about Knowledge, volume 15 of Applied Logic Series, pages 255–268. Kluwer Academic Publishers, Dordrecht, 1999.

C. H. FitzGerald, C. A. Micchelli, and A. Pinkus. Functions that preserve families of positive semidefinite matrices. Linear Alg. and Appl., 221:83–102, 1995.

T. Calvo, G. Mayor, and R. Mesiar, editors. Aggregation Operators, volume 97 of Studies in Fuzziness and Soft Computing. Physica-Verlag, Heidelberg, 2002.

O. Chapelle, J. Weston, and B. Schölkopf. Cluster kernels for semi-supervised learning. In Advances in Neural Information Processing Systems, volume 15, 2003.

B. De Baets and R. Mesiar. T-partitions. Fuzzy Sets and Systems, 97:211–223, 1998.

M. Demirci. On many-valued partitions and many-valued equivalence relations. Internat. J. Uncertain. Fuzziness Knowledge-Based Systems, 11(2):235–253, 2003.

D. Dubois and H. Prade. A review of fuzzy set aggregation connectives. Inform. Sci., 36:85–121, 1985.

M. G. Genton. Classes of kernels for machine learning: A statistics perspective. Journal of Machine Learning Research, 2:299–312, 2001.

S. Gottwald. Fuzzy set theory with t-norms and Φ-operators. In A. Di Nola and A. G. S. Ventre, editors, The Mathematics of Fuzzy Systems, volume 88 of Interdisciplinary Systems Research, pages 143–195. Verlag TÜV Rheinland, Köln, 1986.

S. Gottwald. Fuzzy Sets and Fuzzy Logic. Vieweg, Braunschweig, 1993.

U. Höhle. Fuzzy equalities and indistinguishability. In Proc. 1st European Congress on Fuzzy and Intelligent Technologies, volume 1, pages 358–363, Aachen, 1993.

U. Höhle. The Poincaré paradox and non-classical logics. In D. Dubois, E. P. Klement, and H. Prade, editors, Fuzzy Sets, Logics and Reasoning about Knowledge, volume 15 of Applied Logic Series, pages 7–16. Kluwer Academic Publishers, Dordrecht, 1999.

F. Höppner and F. Klawonn. Improved fuzzy partitions for fuzzy regression models. Internat. J. Approx. Reason., 32:85–102, 2003.

F. Höppner, F. Klawonn, and P. Eklund. Learning indistinguishability from data. Soft Computing, 6(1):6–13, 2002.

J. Jacas. On the generators of T-indistinguishability operators. Stochastica, 12:49–63, 1988.

I. T. Jolliffe. Principal Component Analysis. Springer Verlag, New York, 1986.

E. P. Klement, R. Mesiar, and E. Pap. Triangular Norms, volume 8 of Trends in Logic. Kluwer Academic Publishers, Dordrecht, 2000.

R. Kruse, J. Gebhardt, and F. Klawonn. Fuzzy-Systeme. B. G. Teubner, Stuttgart, 1993.

R. Kruse, J. Gebhardt, and F. Klawonn. Foundations of Fuzzy Systems. John Wiley & Sons, New York, 1994.

C. H. Ling. Representation of associative functions. Publ. Math. Debrecen, 12:189–212, 1965.

B. Moser. On the T-transitivity of kernels. Fuzzy Sets and Systems, 157:1787–1796, 2006.

B. Moser. A New Approach for Representing Control Surfaces by Fuzzy Rule Bases. PhD thesis, Johannes Kepler Universität Linz, October 1995.

T. Muir. A Treatise on the Theory of Determinants. Dover, New York, 1960.

H. Poincaré. La Science et l'Hypothèse. Flammarion, Paris, 1902.

H. Poincaré. La Valeur de la Science. Flammarion, Paris, 1904.

S. Saminger, R. Mesiar, and U. Bodenhofer. Domination of aggregation operators and preservation of transitivity. Internat. J. Uncertain. Fuzziness Knowledge-Based Systems, 10(Suppl.):11–35, 2002.

B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, 2002.

B. Schölkopf, A. J. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.

B. Schweizer and A. Sklar. Associative functions and statistical triangle inequalities. Publ. Math. Debrecen, 8:169–186, 1961.

B. Schweizer and A. Sklar. Probabilistic Metric Spaces. North-Holland, Amsterdam, 1983.

M. Seeger. Covariance kernels from Bayesian generative models. In Advances in Neural Information Processing Systems, volume 14, pages 905–912, 2002.

T. M. Huang, V. Kecman, and I. Kopriva. Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised, and Unsupervised Learning. Springer-Verlag, Berlin, 2006.

E. Trillas and L. Valverde. An inquiry into indistinguishability operators. In H. J. Skala, S. Termini, and E. Trillas, editors, Aspects of Vagueness, pages 231–256. Reidel, Dordrecht, 1984.

E. Trillas, S. Cubillo, and E. Castiñeira. Menger and Ovchinnikov on indistinguishabilities revisited. Internat. J. Uncertain. Fuzziness Knowledge-Based Systems, 7(3):213–218, 1999.

L. Valverde. On the structure of F-indistinguishability operators. Fuzzy Sets and Systems, 17(3):313–328, 1985.

A. M. Yaglom. Some classes of random fields in n-dimensional space, related to stationary random processes. Theory of Probability and its Applications, 2:273–320, 1957.

L. A. Zadeh. Similarity relations and fuzzy orderings. Inform. Sci., 3:177–200, 1971.