jmlr jmlr2007 jmlr2007-72 knowledge-graph by maker-knowledge-mining

72 jmlr-2007-Relational Dependency Networks


Source: pdf

Author: Jennifer Neville, David Jensen

Abstract: Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational data sets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world data sets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions. Keywords: relational learning, probabilistic relational models, knowledge discovery, graphical models, dependency networks, pseudolikelihood estimation

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. [sent-8, score-0.864]

2 We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. [sent-9, score-0.694]

3 We present learned RDNs for a number of real-world data sets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. [sent-11, score-0.595]

4 In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions. [sent-12, score-0.415]

5 Keywords: relational learning, probabilistic relational models, knowledge discovery, graphical models, dependency networks, pseudolikelihood estimation 1. [sent-13, score-0.879]

6 Instances in propositional data record the characteristics of homogeneous and statistically independent objects; instances in relational data record the characteristics of heterogeneous objects and the relations among those objects. [sent-15, score-0.381]

7 The presence of autocorrelation provides a strong motivation for using relational techniques for learning and inference. [sent-18, score-0.656]

8 Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets (Jensen and Neville, 2002). [sent-19, score-0.381]

9 More formally, we define relational autocorrelation with respect to an attributed graph G = (V, E), where each node v ∈ V represents an object and each edge e ∈ E represents a binary relation. [sent-21, score-0.71]
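As a concrete illustration of this definition, the sketch below measures relational autocorrelation as the correlation of an attribute's values across the linked pairs in an attributed graph; the NetworkX-based representation and the correlation-over-edges estimator are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: relational autocorrelation of attribute `attr` over an attributed graph G = (V, E).
# The correlation-over-linked-pairs estimator used here is an illustrative assumption.
import networkx as nx
import numpy as np

def relational_autocorrelation(G: nx.Graph, attr: str) -> float:
    """Correlation of an attribute's values across the pairs of objects joined by an edge."""
    pairs = [(G.nodes[u][attr], G.nodes[v][attr]) for u, v in G.edges()]
    left, right = zip(*pairs)
    # Symmetrize so each undirected edge contributes both (u, v) and (v, u).
    x = np.array(left + right, dtype=float)
    y = np.array(right + left, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Tiny citation-style example: linked papers tend to share topics.
G = nx.Graph()
G.add_nodes_from([(0, {"topic": 1}), (1, {"topic": 1}), (2, {"topic": 0}), (3, {"topic": 0})])
G.add_edges_from([(0, 1), (2, 3), (1, 2)])
print(relational_autocorrelation(G, "topic"))  # high when linked objects share values
```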

10 Recent analyses of relational data sets have reported autocorrelation in the following variables: • Topics of hyperlinked web pages (Chakrabarti et al. [sent-29, score-0.656]

11 When relational data exhibit autocorrelation there is a unique opportunity to improve model performance because inferences about one object can inform inferences about related objects. [sent-37, score-0.656]

12 Indeed, recent work in relational domains has shown that collective inference over an entire data set results in more accurate predictions than conditional inference for each instance independently (e. [sent-38, score-0.575]

13 , 1998; Neville and Jensen, 2000; Lu and Getoor, 2003), and that the gains over conditional models increase as autocorrelation increases (Jensen et al. [sent-41, score-0.392]

14 Joint relational models are able to exploit autocorrelation by estimating a joint probability distribution over an entire relational data set and collectively inferring the labels of related instances. [sent-43, score-1.076]

15 Recent research has produced several novel types of graphical models for estimating joint probability distributions for relational data that consist of non-independent and heterogeneous instances (e. [sent-44, score-0.42]

16 We will refer to these models as probabilistic relational models (PRMs). [sent-49, score-0.39]

17 , 2001) use the term probabilistic relational model to refer to a specific model that is now often called a relational Bayesian network [Koller, personal communication]. [sent-55, score-0.66]

18 , 2001), can model autocorrelation dependencies if they are structured in a manner that respects the acyclicity constraint of the model. [sent-60, score-0.489]

19 While domain knowledge can sometimes be used to structure the autocorrelation dependencies in an acyclic manner, often an acyclic ordering is unknown or does not exist. [sent-61, score-0.449]

20 The acyclicity constraint of directed PRMs precludes the learning of arbitrary autocorrelation dependencies and thus severely limits the applicability of these models in relational domains. [sent-68, score-0.895]

21 In this paper, we outline relational dependency networks (RDNs), an extension of dependency networks (Heckerman et al. [sent-77, score-0.432]

22 RDNs can represent and reason with the cyclic dependencies required to express and exploit autocorrelation during collective inference. [sent-79, score-0.522]

23 In this regard, they share certain advantages of RMNs and other undirected models of relational data (Chakrabarti et al. [sent-80, score-0.389]

24 We use the term relational Bayesian network to refer to Bayesian networks that have been upgraded to model relational databases. [sent-93, score-0.66]

25 Of particular note, all the real-world data sets exhibit multiple autocorrelation dependencies that were automatically discovered by the RDN learning algorithm. [sent-109, score-0.449]

26 In addition, whereas the need for approximate inference is a disadvantage of DNs for propositional data, due to the complexity of relational model graphs in practice, all PRMs use approximate inference. [sent-205, score-0.415]

27 Relational dependency networks extend DNs to work with relational data in much the same way that RBNs extend Bayesian networks and RMNs extend Markov networks. [sent-206, score-0.381]

28 Probabilistic Relational Models: PRMs represent a joint probability distribution over the attributes of a relational data set. [sent-213, score-0.503]

29 In contrast, there are three graphs associated with models of relational data: the data graph GD , the model graph GM , and the inference graph GI . [sent-215, score-0.607]

30 First, the relational data set is represented as a typed, attributed data graph G D = (VD , ED ). [sent-218, score-0.384]

31 Dependencies among attributes of the same object are represented by arcs within a rectangle; arcs that cross rectangle boundaries represent dependencies among attributes of related objects, with edge labels indicating the underlying relations. [sent-287, score-0.399]

32 , how much of the relational neighborhood around an item can influence the probability distribution of an item’s attributes). [sent-293, score-0.393]

33 Two common approaches to limiting search in the space of relational dependencies are: (1) exhaustive search of all dependencies within a fixed-distance neighborhood in GD (e. [sent-295, score-0.576]

34 For clarity, we omit cyclic autocorrelation dependencies in this example. [sent-306, score-0.483]

35 RDN Representation: Relational dependency networks encode probabilistic relationships in a similar manner to DNs, extending the representation to a relational setting. [sent-317, score-0.381]

36 A consistent RDN specifies a joint probability distribution p(x) over the attribute values of a relational data set from which each CPD ∈ P can be derived using the rules of probability. [sent-325, score-0.442]

37 However, these techniques exploit the requirement that the CPDs factor the full distribution—a requirement that imposes acyclicity constraints on the model and precludes the learning of arbitrary autocorrelation dependencies. [sent-358, score-0.366]

38 On the other hand, it is possible for RMN techniques to learn cyclic autocorrelation dependencies in principle. [sent-359, score-0.483]

39 When modeling the joint distribution of relational data, the number of states is exponential in the number of attributes and the number of instances. [sent-365, score-0.503]

40 The pseudolikelihood for data graph $G_D$ is computed as a product over the item types $t$, the attributes $X_i^t$ of each type, and the items (nodes $v$ and edges $e$) of that type: $PL(G_D; \theta) = \prod_{t \in T} \prod_{X_i^t \in \mathbf{X}^t} \left[ \prod_{v: T(v)=t} p(x_{vi}^t \mid \mathrm{pa}_{x_{vi}^t}; \theta) \prod_{e: T(e)=t} p(x_{ei}^t \mid \mathrm{pa}_{x_{ei}^t}; \theta) \right]$. [sent-371, score-0.439]
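Read procedurally, the product above is just a sum of log conditional probabilities, one term per attribute of each node and edge. The sketch below assumes a simple dictionary representation of the data graph and callable CPDs; both interfaces are placeholders, not the paper's implementation.

```python
# Sketch: log-pseudolikelihood of a relational data graph as a sum over per-variable
# log conditional probabilities. Data-structure and CPD interfaces are assumptions.
import math
from typing import Callable, Dict, List, Tuple

AttrValues = Dict[str, int]          # attribute -> value for one item
ParentValues = Dict[str, list]       # parent attribute -> values on related items

def log_pseudolikelihood(
    items: Dict[str, List[Tuple[AttrValues, ParentValues]]],   # item type -> its nodes/edges
    cpds: Dict[Tuple[str, str], Callable[[int, ParentValues], float]],
) -> float:
    logpl = 0.0
    for t, instances in items.items():          # product over item types t ...
        for attrs, parents in instances:        # ... over items (nodes and edges) of type t ...
            for attr, value in attrs.items():   # ... and over attributes X_i^t
                logpl += math.log(cpds[(t, attr)](value, parents))
    return logpl
```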

41 The RDN learning algorithm is similar to the DN learning algorithm, except we use a relational probability estimation algorithm to learn the set of conditional models, maximizing pseudolikelihood for each variable separately. [sent-419, score-0.534]

42 The algorithm input consists of: (1) G_D: a relational data graph, (2) R: a conditional relational learner, and (3) Q_t: a set of queries that specify the relational neighborhood considered in R for each type T. [sent-420, score-0.443]

43 This assumes the complexity of the relational learner R is O(|PAX | · N), which is true for the two relational learners considered in this paper. [sent-445, score-0.66]

44 Learn RDN (G_D, R, Q_t): P ← ∅; for each t ∈ T, for each X_k^t ∈ X^t, use R to learn a CPD for X_k^t given the attributes in the relational neighborhood defined by Q_t, and add that CPD to P. [sent-446, score-0.443]
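In Python-like form, the Learn-RDN loop might look like the sketch below; `learner_factory` stands in for the conditional relational learner R (an RPT or RBC) and `queries` for Q_t, both hypothetical interfaces rather than an actual API.

```python
# Sketch of the Learn-RDN loop: one conditional model per (type, attribute), each
# learned independently to maximize its own term of the pseudolikelihood.
# `learner_factory` and `queries[t].apply(...)` are hypothetical placeholder APIs.
def learn_rdn(data_graph, learner_factory, queries, types_and_attrs):
    cpds = {}                                            # the set P of learned CPDs
    for t, attributes in types_and_attrs.items():        # for each item type t in T
        subgraphs = queries[t].apply(data_graph)         # relational neighborhoods defined by Q_t
        for x_k in attributes:                           # for each attribute X_k^t of type t
            learner = learner_factory()                  # e.g., an RPT or RBC learner
            cpds[(t, x_k)] = learner.fit(subgraphs, target=x_k)
    return cpds
```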

45 The query specifies match criteria for a target item (paper) and its local relational neighborhood (authors and references). [sent-464, score-0.393]

46 , people that have authored a paper are categorized as authors) and specifies the relevant relational context for the target item type in the model. [sent-470, score-0.393]

47 In principle, any conditional relational learner can be used as a subcomponent to learn the individual CPDs provided that it can closely approximate CPDs consistent with the joint distribution. [sent-483, score-0.426]

48 RBCs treat heterogeneous relational subgraphs as a homogeneous set of attribute multisets. [sent-488, score-0.382]
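A minimal sketch of that treatment: each value contributed by a related item is scored as an independent naive-Bayes factor, so a multiset of neighbor values simply adds per-value log likelihoods. The count-based, Laplace-smoothed CPDs (with a binary-attribute smoothing denominator) are assumptions for illustration, not the RBC's exact parameterization.

```python
# Sketch: RBC-style log score of a class label, where every value in a related-item
# multiset contributes an independent naive-Bayes factor. Counts are assumed to come
# from training data; binary attributes are assumed in the smoothing denominator.
import math
from collections import Counter

def rbc_log_score(label, own_attrs, related_multisets, counts, label_counts, alpha=1.0):
    total = sum(label_counts.values())
    logp = math.log(label_counts[label] / total)                     # class prior
    def log_cond(attr, value):                                       # smoothed p(value | label)
        return math.log((counts.get((attr, value, label), 0) + alpha)
                        / (label_counts[label] + 2 * alpha))
    for attr, value in own_attrs.items():                            # the item's own attributes
        logp += log_cond(attr, value)
    for attr, values in related_multisets.items():                   # multisets from related items
        for value, multiplicity in Counter(values).items():
            logp += multiplicity * log_cond(attr, value)
    return logp
```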

49 For a given item type t ∈ T , the query scope specifies the set of item types T R that form the relevant relational neighborhood for t. [sent-495, score-0.456]

50 RPTs also treat heterogeneous relational subgraphs as a set of attribute multisets, but instead of modeling the multisets as independent values drawn from a multinomial, the RPT algorithm uses aggregation functions to map a set of values into a single feature value. [sent-505, score-0.382]
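The kinds of aggregations meant here might look like the sketch below, which maps a multiset of related-item values to a few scalar features; the particular aggregators (count, mode, mean, proportion) are illustrative, not the RPT algorithm's exact feature space.

```python
# Sketch: aggregation features of the sort an RPT-style learner might construct from
# a multiset of attribute values on related items. The chosen aggregators are examples.
import numpy as np

def aggregate_features(values):
    """Map a multiset of (discrete, non-negative) attribute values to scalar features."""
    v = np.asarray(values, dtype=int)
    if v.size == 0:
        return {"count": 0, "mode": None, "mean": None, "proportion_eq_1": None}
    return {
        "count": int(v.size),
        "mode": int(np.bincount(v).argmax()),
        "mean": float(v.mean()),
        "proportion_eq_1": float((v == 1).mean()),
    }

# E.g., the topics of a paper's references: [1, 1, 0, 1] -> mode 1, proportion 0.75.
print(aggregate_features([1, 1, 0, 1]))
```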

51 The RPT algorithm automatically constructs and searches over aggregated relational features to model the distribution of the target variable X on items of type t. [sent-508, score-0.371]

52 The RPT learning algorithm adjusts for biases towards particular features due to degree disparity and autocorrelation in relational data (Jensen and Neville, 2002, 2003). [sent-544, score-0.656]

53 Experiments: The experiments in this section demonstrate the utility of RDNs as a joint model of relational data. [sent-615, score-0.39]

54 First, we use synthetic data to assess the impact of training-set size and autocorrelation on RDN learning and inference, showing that accurate models can be learned with reasonable data set sizes and that the model is robust to varying levels of autocorrelation. [sent-616, score-0.398]

55 If the pseudolikelihood given the learned parameters approaches the pseudolikelihood given the true parameters, then we can conclude that parameter estimation is successful. [sent-660, score-0.378]

56 We do not compare directly to RBNs because their acyclicity constraint prevents them from representing the autocorrelation dependencies in this domain. [sent-709, score-0.489]

57 Although RBNs and conditional models cannot represent the autocorrelation directly, they can exploit the autocorrelation indirectly by using the observed attributes of related instances. [sent-711, score-0.831]

58 If there is a correlation between the words on a webpage and its topic, and the topics of hyperlinked pages are autocorrelated, then the models can exploit autocorrelation dependencies by modeling the contents of a webpage’s neighboring pages. [sent-716, score-0.479]

59 , RDNs) are a low-variance means of reducing bias through direct modeling of the autocorrelation dependencies (Jensen et al. [sent-719, score-0.449]

60 Models that exploit autocorrelation dependencies indirectly, by modeling the observed attributes of related instances, experience a dramatic increase in variance as the number of observed attributes increases. [sent-721, score-0.675]

61 RDN_RPT^ceil performance is indistinguishable from that of RDN_RPT except when autocorrelation is high and there are no labels to seed inference. [sent-730, score-0.388]

62 , X2) are the only information available to constrain the system during inference, so the model cannot fully exploit the autocorrelation dependencies. [sent-733, score-0.411]

63 Since the RBC is low-variance and there are only four attributes in our data sets, it is not surprising that the RBC is able to exploit autocorrelation to improve performance. [sent-749, score-0.439]

64 Empirical Data Experiments: We learned RDNs for five real-world relational data sets. [sent-765, score-0.372]

65 In addition to dependencies among attribute values, relational learners may also learn dependencies between the structure of relations (edges in GD ) and attribute values. [sent-814, score-0.68]

66 Exploiting these types of autocorrelation dependencies has been shown to significantly improve classification accuracy of RMNs compared to RBNs, which cannot model cyclic dependencies (Taskar et al. [sent-842, score-0.606]

67 However, to exploit autocorrelation, RMNs must be instantiated with the appropriate clique templates—to date there is no RMN algorithm for learning autocorrelation dependencies. [sent-844, score-0.383]

68 Movie receipts and genre each exhibit a number of autocorrelation dependencies (through actors, directors, and studios), which illustrates the group structure of the Hollywood movie industry. [sent-869, score-0.476]

69 All of the attributes exhibit autocorrelation in the WebKB data. [sent-898, score-0.439]

70 Prediction: We evaluated RDNs on prediction tasks in order to assess (1) whether autocorrelation dependencies among instances can be used to improve model accuracy, and (2) whether the RDNs, using Gibbs sampling, can effectively infer labels for a network of instances. [sent-901, score-0.449]
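A minimal sketch of the Gibbs-sampling collective inference this evaluation relies on: each unlabeled item is repeatedly resampled from its CPD given the current labels of its relational neighbors, and post-burn-in samples are counted to estimate marginals. The `cpd(item, labels)` interface is a placeholder, not the paper's implementation.

```python
# Sketch: Gibbs-style collective inference over the rolled-out inference graph.
# `cpd(item, labels)` is assumed to return {label_value: probability} for `item`
# given the current labels of its relational neighbors; it is a placeholder interface.
import random

def gibbs_collective_inference(unlabeled_items, cpd, label_values, n_iters=2000, burn_in=200):
    labels = {item: random.choice(label_values) for item in unlabeled_items}
    counts = {item: {v: 0 for v in label_values} for item in unlabeled_items}
    for it in range(n_iters):
        for item in unlabeled_items:                      # resample each variable in turn
            probs = cpd(item, labels)                     # p(label | current neighbor labels)
            labels[item] = random.choices(label_values,
                                          weights=[probs[v] for v in label_values])[0]
            if it >= burn_in:
                counts[item][labels[item]] += 1           # accumulate samples for marginals
    n_samples = n_iters - burn_in
    return {item: {v: c / n_samples for v, c in cv.items()} for item, cv in counts.items()}
```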

71 As mentioned previously, the RPT can sometimes use the observed attributes of related items to effectively reason with autocorrelation dependencies. [sent-927, score-0.48]

72 This indicates that autocorrelation dependencies are identified by the RDN but the model is unable to exploit those dependencies during Gibbs sampling. [sent-937, score-0.572]

73 This may indicate that RMN inference will produce biased (marginal) probability estimates when run in a “loopy” relational network, due to overfitting the clique weights. [sent-971, score-0.472]

74 Related Work: There are three types of statistical relational models relevant to RDNs: probabilistic relational models, probabilistic logic models, and collective inference models. [sent-976, score-0.858]

75 Probabilistic Relational Models: Probabilistic relational models are one class of models for density estimation in relational data sets. [sent-979, score-0.72]

76 Examples of PRMs include relational Bayesian networks and relational Markov networks. [sent-980, score-0.66]

77 In addition, instead of exhaustive search of the space of relational dependencies, the structure learning algorithm uses greedy iterative-deepening, expanding the search in directions where the dependencies improve the likelihood. [sent-997, score-0.453]

78 Also, because reasoning with relational models requires more space and computational resources, efficient learning techniques make relational modeling both practical and feasible. [sent-1003, score-0.69]

79 As mentioned in Section 1, the acyclicity requirement precludes the learning of arbitrary autocorrelation dependencies and limits the applicability of these models in relational domains. [sent-1005, score-0.849]

80 The PRM representation, which ties parameters across items of the same type, makes it difficult for RBNs to represent autocorrelation dependencies. [sent-1006, score-0.367]

81 Because autocorrelation is a dependency among the values of the same variable on linked items of the same type, the CPDs that represent autocorrelation will produce cycles in the rolled-out inference graph. [sent-1007, score-0.874]

82 For example, when modeling the autocorrelation among the topics of two hyperlinked web pages P1 and P2 , the CPD for the topic of P1 will have the topic of P2 as a parent and vice versa. [sent-1008, score-0.38]

83 In general, unless there is causal knowledge of how to structure the order of autocorrelation dependencies (e. [sent-1009, score-0.449]

84 RMNs use relational clique templates CT to specify the ways in which cliques are defined in UM . [sent-1016, score-0.424]

85 Because the user supplies a set of relational dependencies to consider (i. [sent-1024, score-0.453]

86 This is particularly important for reasoning in relational data sets where autocorrelation dependencies are nearly ubiquitous and often cannot be structured in an acyclic manner. [sent-1028, score-0.779]

87 In large cyclic relational inference graphs, the cost of inference is prohibitively expensive—in particular, without approximations to increase efficiency, feature selection is intractable. [sent-1031, score-0.534]

88 In this sense, they share the same strengths and weaknesses as RMNs—they are capable of representing cyclic autocorrelation relationships but suffer from decreased efficiency if full joint estimation is used during learning. [sent-1046, score-0.42]

89 Collective Inference: Collective inference models exploit autocorrelation dependencies in a network of objects to improve predictions. [sent-1053, score-0.615]

90 Joint relational models, such as those discussed above, are able to exploit autocorrelation to improve predictions by estimating joint probability distributions over the entire graph and collectively inferring the labels of related instances. [sent-1054, score-0.77]

91 In this work we have demonstrated that autocorrelation is the reason behind improved performance in collective inference (see Jensen et al. [sent-1063, score-0.45]

92 Discussion and Future Work: In this paper, we presented relational dependency networks, a new form of probabilistic relational model. [sent-1066, score-0.711]

93 Autocorrelation is a nearly ubiquitous phenomenon in relational data sets and the resulting dependencies are often cyclic in nature. [sent-1072, score-0.487]

94 The real and synthetic data experiments in this paper show that collective inference with RDNs can offer significant improvement over conditional approaches when autocorrelation is present in the data. [sent-1074, score-0.486]

95 Furthermore, our experiments show that inference with RDNs is comparable, or superior, to RMN inference over a range of conditions, which indicates that pseudolikelihood estimation can be used effectively to learn an approximation of the full joint distribution. [sent-1076, score-0.398]

96 We also presented learned RDNs for a number of real-world relational domains, demonstrating another strength of RDNs—their understandable and intuitive knowledge representation. [sent-1077, score-0.407]

97 Understandable models also aid analysts’ assessment of the utility of the additional relational information, potentially reducing the cost of information gathering and storage and the need for data transfer among organizations—increasing the practicality and feasibility of relational modeling. [sent-1080, score-0.69]

98 Future work will compare RDNs to Markov logic networks in order to evaluate the performance tradeoffs for using pseudolikelihood approximations with different relational representations. [sent-1081, score-0.542]

99 In relational and network applications, the collective inference process introduces an additional source of error—both through the use of approximate inference algorithms and through variation in the availability of test set information. [sent-1085, score-0.539]

100 Linkage and autocorrelation cause feature selection bias in relational learning. [sent-1243, score-0.656]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rdn', 0.476), ('relational', 0.33), ('autocorrelation', 0.326), ('rdns', 0.268), ('rmn', 0.183), ('gd', 0.172), ('pseudolikelihood', 0.168), ('rdnrbc', 0.168), ('rdnrpt', 0.168), ('cpds', 0.13), ('dependencies', 0.123), ('attributes', 0.113), ('ensen', 0.099), ('eville', 0.099), ('prms', 0.099), ('ependency', 0.094), ('etworks', 0.094), ('rpt', 0.094), ('neville', 0.093), ('inference', 0.085), ('cpd', 0.084), ('elational', 0.084), ('filedon', 0.079), ('actedin', 0.074), ('rbc', 0.074), ('gibbs', 0.073), ('authoredby', 0.069), ('rmns', 0.067), ('records', 0.066), ('dns', 0.064), ('item', 0.063), ('gm', 0.061), ('joint', 0.06), ('clique', 0.057), ('locatedat', 0.054), ('madeby', 0.054), ('graph', 0.054), ('attribute', 0.052), ('dependency', 0.051), ('objects', 0.051), ('cora', 0.05), ('ceiling', 0.05), ('nasd', 0.05), ('directed', 0.046), ('rbns', 0.046), ('erence', 0.045), ('rbcs', 0.045), ('rolled', 0.045), ('logic', 0.044), ('jensen', 0.042), ('learned', 0.042), ('markov', 0.041), ('items', 0.041), ('vi', 0.041), ('acyclicity', 0.04), ('ref', 0.04), ('rpts', 0.04), ('collective', 0.039), ('imdb', 0.038), ('prm', 0.038), ('rbn', 0.038), ('templates', 0.037), ('conditional', 0.036), ('auc', 0.036), ('broker', 0.035), ('pai', 0.035), ('rmnsel', 0.035), ('understandable', 0.035), ('taskar', 0.034), ('heckerman', 0.034), ('cyclic', 0.034), ('vd', 0.034), ('cites', 0.034), ('dn', 0.033), ('parents', 0.032), ('seed', 0.032), ('xv', 0.031), ('year', 0.031), ('webkb', 0.03), ('autocorrelated', 0.03), ('belongsto', 0.03), ('ceil', 0.03), ('interactswith', 0.03), ('models', 0.03), ('um', 0.029), ('undirected', 0.029), ('pl', 0.029), ('getoor', 0.029), ('gene', 0.028), ('movie', 0.027), ('topic', 0.027), ('papers', 0.027), ('chains', 0.025), ('domingos', 0.025), ('brokers', 0.025), ('gi', 0.025), ('aper', 0.025), ('arcs', 0.025), ('horp', 0.025), ('mlns', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 72 jmlr-2007-Relational Dependency Networks

Author: Jennifer Neville, David Jensen

Abstract: Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational data sets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world data sets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions. Keywords: relational learning, probabilistic relational models, knowledge discovery, graphical models, dependency networks, pseudolikelihood estimation

2 0.25427321 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

Author: Sofus A. Macskassy, Foster Provost

Abstract: paper1 This is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well—well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes—that is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection. Keywords: relational learning, network learning, collective inference, collective classification, networked data, probabilistic relational models, network analysis, network data

3 0.080000632 1 jmlr-2007-"Ideal Parent" Structure Learning for Continuous Variable Bayesian Networks

Author: Gal Elidan, Iftach Nachman, Nir Friedman

Abstract: Bayesian networks in general, and continuous variable networks in particular, have become increasingly popular in recent years, largely due to advances in methods that facilitate automatic learning from data. Yet, despite these advances, the key task of learning the structure of such models remains a computationally intensive procedure, which limits most applications to parameter learning. This problem is even more acute when learning networks in the presence of missing values or hidden variables, a scenario that is part of many real-life problems. In this work we present a general method for speeding structure search for continuous variable networks with common parametric distributions. We efficiently evaluate the approximate merit of candidate structure modifications and apply time consuming (exact) computations only to the most promising ones, thereby achieving significant improvement in the running time of the search algorithm. Our method also naturally and efficiently facilitates the addition of useful new hidden variables into the network structure, a task that is typically considered both conceptually difficult and computationally prohibitive. We demonstrate our method on synthetic and real-life data sets, both for learning structure on fully and partially observable data, and for introducing new hidden variables during structure search. Keywords: Bayesian networks, structure learning, continuous variables, hidden variables

4 0.057893518 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

Author: Charles Sutton, Andrew McCallum, Khashayar Rohanimanesh

Abstract: In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data. In addition to maximum conditional likelihood, we present two alternative approaches for training DCRFs: marginal likelihood training, for when we are primarily interested in predicting only a subset of the variables, and cascaded training, for when we have a distinct data set for each state variable, as in transfer learning. We evaluate marginal training and cascaded training on both synthetic data and real-world text data, finding that marginal training can improve accuracy when uncertainty exists over the latent variables, and that for transfer learning, a DCRF trained in a cascaded fashion performs better than a linear-chain CRF that predicts the final task directly. Keywords: conditional random fields, graphical models, sequence labeling

5 0.050655816 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection

Author: Wei Pan, Xiaotong Shen

Abstract: Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. Keywords: BIC, EM, mixture model, penalized likelihood, soft-thresholding, shrinkage

6 0.044493161 43 jmlr-2007-Integrating Naïve Bayes and FOIL

7 0.040013555 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption

8 0.038558785 11 jmlr-2007-Anytime Learning of Decision Trees

9 0.037065282 47 jmlr-2007-Learning Horn Expressions with LOGAN-H

10 0.034551613 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization

11 0.033609219 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

12 0.032381259 31 jmlr-2007-Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm

13 0.031333286 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

14 0.030381359 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time

15 0.029466657 79 jmlr-2007-Structure and Majority Classes in Decision Tree Learning

16 0.029437702 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification

17 0.028039483 56 jmlr-2007-Multi-Task Learning for Classification with Dirichlet Process Priors

18 0.025126498 70 jmlr-2007-Ranking the Best Instances

19 0.024587367 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs

20 0.02405557 22 jmlr-2007-Compression-Based Averaging of Selective Naive Bayes Classifiers     (Special Topic on Model Selection)


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.176), (1, 0.227), (2, 0.013), (3, 0.059), (4, 0.018), (5, 0.097), (6, 0.116), (7, -0.177), (8, 0.065), (9, -0.403), (10, 0.3), (11, 0.051), (12, 0.214), (13, -0.109), (14, -0.135), (15, -0.158), (16, 0.093), (17, 0.15), (18, -0.011), (19, -0.059), (20, -0.065), (21, -0.026), (22, 0.134), (23, 0.024), (24, -0.105), (25, -0.059), (26, -0.029), (27, 0.066), (28, -0.049), (29, 0.01), (30, 0.025), (31, -0.061), (32, -0.001), (33, -0.131), (34, 0.049), (35, -0.081), (36, -0.003), (37, 0.025), (38, 0.037), (39, -0.075), (40, -0.046), (41, 0.016), (42, 0.062), (43, -0.023), (44, 0.066), (45, 0.03), (46, -0.029), (47, 0.015), (48, 0.011), (49, -0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95579326 72 jmlr-2007-Relational Dependency Networks

Author: Jennifer Neville, David Jensen

Abstract: Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational data sets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world data sets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions. Keywords: relational learning, probabilistic relational models, knowledge discovery, graphical models, dependency networks, pseudolikelihood estimation

2 0.85611647 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

Author: Sofus A. Macskassy, Foster Provost

Abstract: paper1 This is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well—well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes—that is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection. Keywords: relational learning, network learning, collective inference, collective classification, networked data, probabilistic relational models, network analysis, network data

3 0.28542313 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

Author: Charles Sutton, Andrew McCallum, Khashayar Rohanimanesh

Abstract: In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data. In addition to maximum conditional likelihood, we present two alternative approaches for training DCRFs: marginal likelihood training, for when we are primarily interested in predicting only a subset of the variables, and cascaded training, for when we have a distinct data set for each state variable, as in transfer learning. We evaluate marginal training and cascaded training on both synthetic data and real-world text data, finding that marginal training can improve accuracy when uncertainty exists over the latent variables, and that for transfer learning, a DCRF trained in a cascaded fashion performs better than a linear-chain CRF that predicts the final task directly. Keywords: conditional random fields, graphical models, sequence labeling

4 0.23207533 1 jmlr-2007-"Ideal Parent" Structure Learning for Continuous Variable Bayesian Networks

Author: Gal Elidan, Iftach Nachman, Nir Friedman

Abstract: Bayesian networks in general, and continuous variable networks in particular, have become increasingly popular in recent years, largely due to advances in methods that facilitate automatic learning from data. Yet, despite these advances, the key task of learning the structure of such models remains a computationally intensive procedure, which limits most applications to parameter learning. This problem is even more acute when learning networks in the presence of missing values or hidden variables, a scenario that is part of many real-life problems. In this work we present a general method for speeding structure search for continuous variable networks with common parametric distributions. We efficiently evaluate the approximate merit of candidate structure modifications and apply time consuming (exact) computations only to the most promising ones, thereby achieving significant improvement in the running time of the search algorithm. Our method also naturally and efficiently facilitates the addition of useful new hidden variables into the network structure, a task that is typically considered both conceptually difficult and computationally prohibitive. We demonstrate our method on synthetic and real-life data sets, both for learning structure on fully and partially observable data, and for introducing new hidden variables during structure search. Keywords: Bayesian networks, structure learning, continuous variables, hidden variables

5 0.19024292 43 jmlr-2007-Integrating Naïve Bayes and FOIL

Author: Niels Landwehr, Kristian Kersting, Luc De Raedt

Abstract: A novel relational learning approach that tightly integrates the naïve Bayes learning scheme with the inductive logic programming rule-learner FOIL is presented. In contrast to previous combinations that have employed naïve Bayes only for post-processing the rule sets, the presented approach employs the naïve Bayes criterion to guide its search directly. The proposed technique is implemented in the nFOIL and tFOIL systems, which employ standard naïve Bayes and tree augmented naïve Bayes models respectively. We show that these integrated approaches to probabilistic model and rule learning outperform post-processing approaches. They also yield significantly more accurate models than simple rule learning and are competitive with more sophisticated ILP systems. Keywords: rule learning, naïve Bayes, statistical relational learning, inductive logic programming

6 0.1827623 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection

7 0.17157143 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization

8 0.16562822 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption

9 0.14599623 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification

10 0.14369255 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition

11 0.14084624 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method

12 0.13440183 11 jmlr-2007-Anytime Learning of Decision Trees

13 0.1334511 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

14 0.12956221 47 jmlr-2007-Learning Horn Expressions with LOGAN-H

15 0.12808312 57 jmlr-2007-Multi-class Protein Classification Using Adaptive Codes

16 0.12436379 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

17 0.12360132 16 jmlr-2007-Boosted Classification Trees and Class Probability Quantile Estimation

18 0.12319085 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

19 0.11985482 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs

20 0.1183733 59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.011), (8, 0.021), (10, 0.018), (11, 0.013), (12, 0.024), (14, 0.017), (15, 0.033), (18, 0.085), (22, 0.012), (28, 0.036), (40, 0.025), (45, 0.044), (48, 0.03), (60, 0.028), (77, 0.01), (80, 0.03), (85, 0.05), (89, 0.359), (98, 0.073)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.70377684 72 jmlr-2007-Relational Dependency Networks

Author: Jennifer Neville, David Jensen

Abstract: Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational data sets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world data sets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions. Keywords: relational learning, probabilistic relational models, knowledge discovery, graphical models, dependency networks, pseudolikelihood estimation

2 0.65555167 86 jmlr-2007-Truncating the Loop Series Expansion for Belief Propagation

Author: Vicenç Gómez, Joris M. Mooij, Hilbert J. Kappen

Abstract: Recently, Chertkov and Chernyak (2006b) derived an exact expression for the partition sum (normalization constant) corresponding to a graphical model, which is an expansion around the belief propagation (BP) solution. By adding correction terms to the BP free energy, one for each “generalized loop” in the factor graph, the exact partition sum is obtained. However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. In this article we introduce truncated loop series BP (TLSBP), a particular way of truncating the loop series of Chertkov & Chernyak by considering generalized loops as compositions of simple loops. We analyze the performance of TLSBP in different scenarios, including the Ising model on square grids and regular random graphs, and on PROMEDAS, a large probabilistic medical diagnostic system. We show that TLSBP often improves upon the accuracy of the BP solution, at the expense of increased computation time. We also show that the performance of TLSBP strongly depends on the degree of interaction between the variables. For weak interactions, truncating the series leads to significant improvements, whereas for strong interactions it can be ineffective, even if a high number of terms is considered. Keywords: belief propagation, loop calculus, approximate inference, partition function, Ising grid, random regular graphs, medical diagnosis

3 0.36479402 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

Author: Sofus A. Macskassy, Foster Provost

Abstract: paper1 This is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well—well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes—that is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection. Keywords: relational learning, network learning, collective inference, collective classification, networked data, probabilistic relational models, network analysis, network data

4 0.35567167 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs

Author: Joris M. Mooij, Hilbert J. Kappen

Abstract: We propose a method to improve approximate inference methods by correcting for the influence of loops in the graphical model. The method is a generalization and alternative implementation of a recent idea from Montanari and Rizzo (2005). It is applicable to arbitrary factor graphs, provided that the size of the Markov blankets is not too large. It consists of two steps: (i) an approximate inference method, for example, belief propagation, is used to approximate cavity distributions for each variable (i.e., probability distributions on the Markov blanket of a variable for a modified graphical model in which the factors involving that variable have been removed); (ii) all cavity distributions are improved by a message-passing algorithm that cancels out approximation errors by imposing certain consistency constraints. This loop correction (LC) method usually gives significantly better results than the original, uncorrected, approximate inference algorithm that is used to estimate the effect of loops. Indeed, we often observe that the loop-corrected error is approximately the square of the error of the uncorrected approximate inference method. In this article, we compare different variants of the loop correction method with other approximate inference methods on a variety of graphical models, including “real world” networks, and conclude that the LC method generally obtains the most accurate results. Keywords: loop corrections, approximate inference, graphical models, factor graphs, belief propagation

5 0.29846036 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection

Author: Wei Pan, Xiaotong Shen

Abstract: Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. Keywords: BIC, EM, mixture model, penalized likelihood, soft-thresholding, shrinkage

6 0.29377651 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data

7 0.29211915 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation

8 0.28942597 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

9 0.28879201 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition

10 0.28637952 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time

11 0.28593716 39 jmlr-2007-Handling Missing Values when Applying Classification Models

12 0.28582451 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification

13 0.28392595 77 jmlr-2007-Stagewise Lasso

14 0.28351691 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

15 0.28245264 22 jmlr-2007-Compression-Based Averaging of Selective Naive Bayes Classifiers     (Special Topic on Model Selection)

16 0.2823118 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method

17 0.28140432 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

18 0.28127456 1 jmlr-2007-"Ideal Parent" Structure Learning for Continuous Variable Bayesian Networks

19 0.27947184 68 jmlr-2007-Preventing Over-Fitting during Model Selection via Bayesian Regularisation of the Hyper-Parameters     (Special Topic on Model Selection)

20 0.27712947 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm