nips nips2003 nips2003-118 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ben Taskar, Ming-fai Wong, Pieter Abbeel, Daphne Koller
Abstract: Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. [sent-4, score-0.393]
2 This paper focuses on predicting the existence and the type of links between entities in such domains. [sent-5, score-0.664]
3 We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. [sent-7, score-0.938]
4 We apply this method to two new relational datasets, one involving university webpages, and the other a social network. [sent-9, score-0.422]
5 We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation. [sent-10, score-1.091]
6 For example, in a data set consisting of a set of hyperlinked university webpages, we might want to predict not just which page belongs to a professor and which to a student, but also which professor is which student’s advisor. [sent-14, score-0.477]
7 In some cases, the existence of a relationship will be predicted by the presence of a hyperlink between the pages, and we will have only to decide whether the link reflects an advisor-advisee relationship. [sent-15, score-0.583]
8 In other cases, we might have to infer the very existence of a link from indirect evidence, such as a large number of co-authored papers. [sent-16, score-0.488]
9 In a very different application, we might want to predict links representing participation of individuals in certain terrorist activities. [sent-17, score-0.393]
10 One possible approach to this task is to consider the presence and/or type of the link using only attributes of the potentially linked entities and of the link itself. [sent-18, score-1.254]
11 For example, in our university example, we might try to predict and classify the link using the words on the two webpages, and the anchor words on the link (if present). [sent-19, score-1.076]
12 However, this flat approach completely ignores a rich source of information that is unique to this task — the graph structure of the link graph. [sent-21, score-0.505]
13 For example, a strong predictor of an advisor-advisee link between a professor and a student is the fact that they jointly participate in several projects. [sent-22, score-0.681]
14 In general, the link graph typically reflects common patterns of interactions between the entities in the domain. [sent-23, score-0.696]
15 We use this framework to define a single probabilistic model over the entire link graph, including both object labels (when relevant) and links between objects. [sent-27, score-0.908]
16 The model parameters are trained discriminatively, to maximize the probability of the (object and) link labels given the known attributes (e.g., the words on the pages). [sent-28, score-0.657]
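In symbols, this discriminative objective is ordinary conditional maximum likelihood; the following is a reconstruction from the description above, with notation assumed:

```latex
\theta^{*} \;=\; \arg\max_{\theta}\, \log P_{\theta}\left(\mathbf{y} \mid \mathbf{x}\right),
```

where y collects the object and link labels and x the observed attributes.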
17 The learned model is then applied, using probabilistic inference, to predict and classify links using any observed attributes and links. [sent-31, score-0.67]
18 2 Link Prediction A relational domain is described by a relational schema, which specifies a set of object types and attributes for them. [sent-32, score-0.799]
19 In our web example, we have a Webpage type, where each page has a binary-valued attribute for each word in the dictionary, denoting whether the page contains the word. [sent-33, score-0.497]
20 To address the link prediction problem, we need to make links first-class citizens in our model. [sent-37, score-0.794]
21 Following [5], we introduce into our schema object types that correspond to links between entities. [sent-38, score-0.405]
22 Each link object is associated with a tuple of entity objects (o1, . . . , ok) that participate in the link. [sent-39, score-0.748]
23 For example, a Hyperlink link object would be associated with a pair of entities — the linking page, and the linked-to page, which are part of the link definition. [sent-43, score-1.025]
24 We note that link objects may also have other attributes; e.g., a hyperlink may be associated with its anchor text. [sent-44, score-0.47]
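As a concrete illustration of links as first-class objects, here is a minimal sketch in Python; the class and field names are hypothetical, since the paper specifies its schema in a relational language rather than code.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Webpage:
    url: str
    words: Dict[str, bool] = field(default_factory=dict)  # binary word attributes
    label: str = "unknown"                                # e.g. student, professor

@dataclass
class Hyperlink:
    source: Webpage        # the linking page (part of the link definition)
    target: Webpage        # the linked-to page
    anchor_text: str = ""  # link objects may carry their own attributes
    exists: bool = False   # the Exists attribute to be predicted
```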
25 As our goal is to predict link existence, we must consider links that exist and links that do not. [sent-47, score-1.146]
26 Each potential link is associated with a tuple of entity objects, but it may or may not actually exist. [sent-49, score-0.673]
27 We denote this event using a binary existence attribute Exists, which is true if the link between the associated entities exists and false otherwise. [sent-50, score-0.667]
28 In our example, our model may contain a potential link for each pair of webpages, and the value of its Exists variable determines whether that hyperlink is actually present. [sent-51, score-0.424]
29 The link prediction task now reduces to the problem of predicting the existence attributes of these link objects. [sent-53, score-1.202]
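A toy sketch of this reduction, assuming pages are identified by strings; the paper itself prunes the quadratic candidate set rather than enumerating all pairs.

```python
from itertools import permutations
from typing import Dict, List, Tuple

def potential_links(page_ids: List[str]) -> Dict[Tuple[str, str], bool]:
    """Attach an Exists variable (initially False) to every ordered pair
    of pages; link prediction reduces to jointly inferring these booleans."""
    return {(a, b): False for a, b in permutations(page_ids, 2)}

print(len(potential_links(["p1", "p2", "p3"])))  # three pages -> 6 potential links
```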
30 An instantiation I specifies the set of entities of each entity type and the values of all attributes for all of the entities. [sent-54, score-0.664]
31 For example, an instantiation of the hypertext schema is a collection of webpages, specifying their labels, the words they contain, and which links between them exist. [sent-55, score-0.531]
32 In the link prediction task, we might observe all of the attributes for all of the objects, except for the existence attributes for the links. [sent-57, score-0.821]
33 3 Relational Markov Networks We begin with a brief review of the framework of undirected graphical models or Markov Networks [13], and their extension to relational domains presented in [14]. [sent-59, score-0.377]
34 A relational Markov network (RMN) [14] specifies the cliques and potentials between attributes of related entities at a template level, so a single model provides a coherent distribution for any collection of instances from the schema. [sent-66, score-0.903]
35 RMNs specify the cliques using the notion of a relational clique template, which specifies tuples of variables in the instantiation using a relational query language. [sent-67, score-0.943]
36 For example, if we want to define cliques between the class labels of linked pages, we might define a clique template that applies to all pairs page1, page2 and link of types Webpage, Webpage, and Hyperlink, respectively, such that link points from page1 to page2. [sent-69, score-1.311]
37 Given a particular instantiation I of the schema, the RMN M produces an unrolled Markov network over the attributes of entities in I, in the obvious way. [sent-73, score-0.474]
38 The cliques in the unrolled network are determined by the clique templates C. [sent-74, score-0.426]
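For reference, the unrolled network defines the usual template-tied log-linear distribution; this is a reconstruction from the description above, with notation assumed to follow [14]:

```latex
P(\mathcal{I}) \;=\; \frac{1}{Z} \prod_{C \in \mathcal{C}} \; \prod_{c \in C(\mathcal{I})} \phi_{C}(\mathcal{I}.c),
```

where C(I) is the set of groundings of clique template C in instantiation I, φ_C is the potential shared by all groundings of C, and Z normalizes.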
39 4 Subgraph Templates in a Link Graph The structure of link graphs has been widely used to infer the importance of documents in scientific publications [4] and hypertext (PageRank [12], Hubs and Authorities [8]). [sent-85, score-0.476]
40 In our experiments, we found that the combination of a relational language with a probabilistic graphical model provides a very flexible framework for modeling complex patterns common in relational graphs. [sent-88, score-0.736]
41 For example, if we have evidence indicating an advisor-advisee relationship, our probability that X is a faculty member increases, and thereby our belief that X participates in a teaching assistant link with some entity Z decreases. [sent-96, score-0.789]
42 We also found it useful to consider richer subgraph templates over the link graph. [sent-97, score-0.569]
43 Another useful type of subgraph template involves transitivity patterns, where the presence of an A-B link and of a B-C link increases (or decreases) the likelihood of an A-C link. [sent-102, score-1.145]
44 Figure 1: (a) Relation prediction with entity labels given. [sent-126, score-0.485]
45 Their approach easily captures the dependence of link existence on attributes of entities. [sent-135, score-0.634]
46 For example, for the transitivity pattern, we might consider simply directing the correlation edges between link existence variables arbitrarily. [sent-137, score-0.563]
47 However, it is not clear how we would then parameterize a link existence variable for a link that is involved in multiple triangles. [sent-138, score-0.912]
48 Owned pages, which are owned by an entity but are not the main page for that entity, were manually assigned to that entity. [sent-143, score-0.557]
49 We established a set of candidate links between entities based on evidence of a relation between them. [sent-145, score-0.639]
50 One type of evidence for a relation is a hyperlink from an entity page or one of its owned pages to the page of another entity. [sent-146, score-1.05]
51 A second type of evidence is a virtual link: We assigned a number of aliases to each page using the page title, the anchor text of incoming links, and email addresses of the entity involved. [sent-147, score-0.842]
52 Mentioning an alias of a page on another page constitutes a virtual link. [sent-148, score-0.466]
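A minimal sketch of virtual-link detection under an assumed page representation (id mapped to text plus an alias set); the paper's actual alias extraction and matching are more involved.

```python
from typing import Dict, List, Set, Tuple

def virtual_links(pages: Dict[str, Dict]) -> List[Tuple[str, str]]:
    """Report (src, dst) whenever src's text mentions one of dst's aliases.
    `pages` maps page id -> {"text": str, "aliases": Set[str]}."""
    links = []
    for src, sp in pages.items():
        for dst, dp in pages.items():
            if src != dst and any(alias in sp["text"] for alias in dp["aliases"]):
                links.append((src, dst))
    return links

pages = {
    "home/jane": {"text": "Advised by John Doe (jdoe@cs).", "aliases": {"Jane Roe"}},
    "home/john": {"text": "My students: Jane Roe.", "aliases": {"John Doe", "jdoe@cs"}},
}
print(virtual_links(pages))  # [('home/jane', 'home/john'), ('home/john', 'home/jane')]
```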
53 The observed attributes for each page are the words on the page itself and the “metawords” on the page — the words in the title, section headings, and anchors to the page from other pages. [sent-150, score-1.181]
54 For links, the observed attributes are the anchor text, text just before the link (hyperlink or virtual link), and the heading of the section in which the link appears. [sent-151, score-1.08]
55 We tried two settings for our experiments: with page categories observed (in the test data) and page categories unobserved. [sent-153, score-0.651]
56 Given the page labels, we can rule out many impossible relations; the resulting label breakdown among the candidate links is: none (38%), member (34%), part-of (4%), advisor (11%), teach (9%), TA (5%). [sent-157, score-0.846]
57 Link-Flat is our baseline model, predicting links one at a time using multinomial logistic regression. [sent-160, score-0.402]
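A stand-in for such a baseline (not the paper's implementation): scikit-learn's multinomial logistic regression on synthetic per-link feature vectors; the real features are described in the next sentence.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.random((200, 10))          # 200 candidate links, 10 features each
y_train = rng.integers(0, 6, size=200)   # 6 labels: none, member, part-of, ...
X_test = rng.random((20, 10))

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)        # one label distribution per link, in isolation
```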
58 The features used by this model are the labels of the two linked pages and the words on the links going from one page and its owned pages to the other page. [sent-164, score-0.819]
59 The relational models try to improve upon the baseline model by modeling the interactions between relations and predicting relations jointly. [sent-166, score-0.561]
60 The Section model introduces cliques over relations whose links appear consecutively in a section on a page. [sent-167, score-0.584]
61 Specifically, we introduce cliques over sets of three candidate links that form a triangle in the link graph. [sent-173, score-0.952]
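A sketch of how such triangles might be enumerated, ignoring link direction and relation types (simplifications not made by the paper):

```python
from itertools import combinations
from typing import Iterable, List, Tuple

def triangle_cliques(candidate_links: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return entity triples whose three pairwise candidate links all exist;
    the Triad model places one clique over each such triple of link variables."""
    edges = {frozenset(link) for link in candidate_links}
    nodes = sorted({n for e in edges for n in e})
    return [
        (a, b, c)
        for a, b, c in combinations(nodes, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
    ]

print(triangle_cliques([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]))  # [('A', 'B', 'C')]
```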
62 As an example of the interesting inferences made by the models, we found a student-professor pair that was misclassified by the Flat model as none (there is only a single hyperlink from the student’s page to the advisor’s) but correctly identified by both the Section and Triad models. [sent-180, score-0.388]
63 The Section model utilizes a paragraph on the student’s webpage describing his research, with a section of links to his research groups and the link to his advisor. [sent-181, score-0.836]
64 When the labels of pages are not known during relation prediction, we cannot rule out possible relations for candidate links based on the labels of participating entities. [sent-188, score-0.696]
65 Thus, we have many more candidate links that do not correspond to any of our relation types (e. [sent-189, score-0.461]
66 This makes the existence of relations a very low probability event, with the following breakdown among the potential relations: none (71%), member (16%), part-of (2%), advisor (5%), teach (4%), TA (2%). [sent-192, score-0.374]
67 In addition, when we construct a Markov network in which page labels are not observed, the network is much larger and denser, making the (approximate) inference task much harder. [sent-193, score-0.448]
68 Thus, in addition to models that try to predict page entity and relation labels simultaneously, we also tried a two-phase approach, where we first predict page categories, and then use the predicted labels as features for the model that predicts relations. [sent-194, score-1.195]
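A schematic of the two-phase pipeline; the model objects and the feature builder are hypothetical stand-ins with an assumed scikit-learn-style predict interface.

```python
def phased_prediction(pages, candidate_links, entity_model, link_model, make_link_features):
    """Phase 1: predict page categories. Phase 2: classify each candidate
    link using features that include the two predicted endpoint labels."""
    page_labels = entity_model.predict(pages)
    link_feats = [make_link_features(link, page_labels) for link in candidate_links]
    return link_model.predict(link_feats)
```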
69 The Neighbors model is a relational model that exploits another type of similarity template: pages with similar URLs often belong to the same category or related categories. [sent-198, score-0.391]
70 Given the page categories, we can now apply the different models for link classification. [sent-221, score-0.687]
71 Thus, the Phased (Flat/Flat) model uses the Entity-Flat model to classify the page labels, and then the Link-Flat model to classify the candidate links using the resulting entity labels. [sent-222, score-0.964]
72 The Phased (Neighbors/Section) model uses the Neighbors model to classify the entity labels and then the Section model to classify the links. [sent-224, score-0.442]
73 We also tried two models that predict page and relation labels simultaneously. [sent-225, score-0.531]
74 The Joint + Neighbors model is simply the union of the Neighbors model for page categories and the Flat model for relation labels given the page categories. [sent-226, score-0.695]
75 The Joint + Neighbors + Section model additionally introduces the cliques that appeared in the Section model between links that appear consecutively in a section on a page. [sent-227, score-0.511]
76 We train the joint models to predict both page and relation labels simultaneously. [sent-228, score-0.528]
77 We focused on the task of predicting the “friendship” links between students from their personal information and a subset of their links. [sent-239, score-0.603]
78 We selected students living in sixteen different residences or dorms and restricted the data to the friendship links only within each residence, eliminating inter-residence links from the data to generate independent training/test splits. [sent-240, score-0.877]
79 Each residence has about 15–25 students and an average student lists about 25% of his or her house-mates as friends. [sent-241, score-0.384]
80 Predicting links between two students from just personal information alone is a very difficult task, so we tried a more realistic setting, where some proportion of the links is observed in the test data, and can be used as evidence for predicting the remaining links. [sent-243, score-1.038]
81 We used the following proportions of observed links in the test data: 10%, 25%, and 50%. [sent-244, score-0.368]
82 Using just the observed portion of links, we constructed the following flat features: for each student, the proportion of students in the residence that list him/her and the proportion of students he/she lists; for each pair of students, the proportion of other students they have as common friends. [sent-246, score-0.591]
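A sketch of these flat features under an assumed representation of the observed links; the exact normalization used in the paper is assumed, not quoted.

```python
from typing import Dict, List, Set, Tuple

def flat_link_features(observed: Set[Tuple[str, str]], students: List[str]):
    """observed holds directed pairs (a, b): 'a lists b as a friend'.
    Returns per-student listing/listed-by proportions and, per ordered
    pair, the proportion of other students who are common friends."""
    n = len(students)
    lists = {s: {b for a, b in observed if a == s} for s in students}
    listed_by = {s: {a for a, b in observed if b == s} for s in students}
    lists_prop = {s: len(lists[s]) / max(n - 1, 1) for s in students}
    listed_prop = {s: len(listed_by[s]) / max(n - 1, 1) for s in students}
    friends = {s: lists[s] | listed_by[s] for s in students}
    common = {
        (a, b): len((friends[a] & friends[b]) - {a, b}) / max(n - 2, 1)
        for a in students for b in students if a != b
    }
    return lists_prop, listed_prop, common
```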
83 These features capture some of the relational structure and dependencies between links: Students who list (or are listed by) many friends in the observed portion of the links tend to have links in the unobserved portion as well. [sent-248, score-1.115]
84 More importantly, having friends in common increases the likelihood of a link between a pair of students. [sent-249, score-0.499]
85 The Compatibility model uses a type of similarity template, introducing cliques between each pair of links emanating from each person. [sent-252, score-0.56]
86 7 Discussion and Conclusions In this paper, we consider the problem of link prediction in relational domains. [sent-265, score-0.777]
87 We focus on the task of collective link classification, where we are simultaneously trying to predict and classify an entire set of links in a link graph. [sent-266, score-1.354]
88 We show that the use of a probabilistic model over link graphs allows us to represent and exploit interesting subgraph patterns in the link graph. [sent-267, score-1.036]
89 Similarity templates relate the classification of links or objects that share a certain graph-based property (e.g., links emanating from the same object). [sent-269, score-0.444]
90 Transitivity templates relate triples of objects and links organized in a triangle. [sent-272, score-0.444]
91 Relational Markov networks are not the only method one might consider applying to the link prediction and classification task. [sent-274, score-0.465]
92 We could, for example, build a link predictor that considers other links in the graph by converting graph features into flat features [11], as we did in the social network data. [sent-275, score-1.076]
93 Generally, however, these methods have been applied to the problem of predicting or classifying a single link at a time. [sent-278, score-0.497]
94 It is not clear how well they would extend to the task of simultaneously predicting an entire link graph. [sent-279, score-0.527]
95 Furthermore, as we discussed, the PRM framework cannot represent (in any natural way) the type of subgraph patterns that seem prevalent in link graph data. [sent-282, score-0.674]
96 This problem is especially acute in the link prediction problem, where the presence of all potential links leads to densely connected Markov networks with many short loops. [sent-286, score-0.794]
97 This problem can be addressed with heuristics that focus the search on links that are plausible (as we did in a very simple way in the webpage experiments). [sent-287, score-0.412]
98 Our results use a set of relational patterns that we have discovered to be useful in the domains that we have considered. [sent-289, score-0.42]
99 Finally, we believe that the problem of modeling link graphs has numerous other applications, including: analyzing communities of people and hierarchical structure of organizations, identifying people or objects that play certain key roles, predicting current and future interactions, and more. [sent-293, score-0.615]
100 Probabilistic models of text and link structure for hypertext classification. [sent-344, score-0.506]
wordName wordTfidf (topN-words)
[('link', 0.424), ('links', 0.329), ('relational', 0.312), ('entity', 0.249), ('page', 0.233), ('student', 0.167), ('cliques', 0.152), ('entities', 0.148), ('attributes', 0.146), ('flat', 0.143), ('students', 0.129), ('phased', 0.12), ('triad', 0.12), ('social', 0.11), ('rmn', 0.104), ('taskar', 0.104), ('clique', 0.096), ('template', 0.096), ('hyperlink', 0.095), ('professor', 0.09), ('labels', 0.087), ('relation', 0.085), ('webpage', 0.083), ('webpages', 0.078), ('schools', 0.078), ('subgraph', 0.076), ('breakeven', 0.075), ('friends', 0.075), ('owned', 0.075), ('transitivity', 0.075), ('predicting', 0.073), ('relations', 0.073), ('patterns', 0.073), ('instantiation', 0.071), ('templates', 0.069), ('existence', 0.064), ('predict', 0.064), ('advisor', 0.06), ('getoor', 0.06), ('neigh', 0.06), ('residence', 0.06), ('residences', 0.06), ('unrolled', 0.06), ('none', 0.06), ('categories', 0.057), ('classify', 0.053), ('compatibility', 0.052), ('abbeel', 0.052), ('hypertext', 0.052), ('graph', 0.051), ('type', 0.05), ('vc', 0.049), ('network', 0.049), ('anchor', 0.047), ('schema', 0.047), ('candidate', 0.047), ('objects', 0.046), ('neighbors', 0.045), ('teach', 0.045), ('faculty', 0.044), ('personal', 0.042), ('member', 0.042), ('markov', 0.041), ('prediction', 0.041), ('observed', 0.039), ('scientist', 0.039), ('rmns', 0.039), ('sta', 0.039), ('probabilistic', 0.039), ('koller', 0.038), ('people', 0.036), ('ber', 0.036), ('ta', 0.036), ('classi', 0.035), ('domains', 0.035), ('proportion', 0.035), ('ave', 0.033), ('words', 0.032), ('linked', 0.032), ('tried', 0.032), ('attribute', 0.031), ('features', 0.031), ('models', 0.03), ('breakdown', 0.03), ('consecutively', 0.03), ('della', 0.03), ('friendship', 0.03), ('pietra', 0.03), ('prm', 0.03), ('prms', 0.03), ('staff', 0.03), ('taught', 0.03), ('urls', 0.03), ('evidence', 0.03), ('collective', 0.03), ('task', 0.03), ('joint', 0.029), ('object', 0.029), ('similarity', 0.029), ('lists', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 118 nips-2003-Link Prediction in Relational Data
Author: Ben Taskar, Ming-fai Wong, Pieter Abbeel, Daphne Koller
Abstract: Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation. 1
2 0.1111274 124 nips-2003-Max-Margin Markov Networks
Author: Ben Taskar, Carlos Guestrin, Daphne Koller
Abstract: In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M3 ) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches. 1
3 0.11051514 62 nips-2003-Envelope-based Planning in Relational MDPs
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelope-based Planning (REBP), tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and judiciously expand its envelope as resources permit. 1
4 0.082713515 99 nips-2003-Kernels for Structured Natural Language Data
Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda
Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1
5 0.071431197 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
6 0.069793433 108 nips-2003-Learning a Distance Metric from Relative Comparisons
7 0.054401662 121 nips-2003-Log-Linear Models for Label Ranking
8 0.048465125 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
9 0.048327453 30 nips-2003-Approximability of Probability Distributions
10 0.047932476 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
11 0.047097143 164 nips-2003-Ranking on Data Manifolds
12 0.046987724 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
13 0.044828035 189 nips-2003-Tree-structured Approximations by Expectation Propagation
14 0.044226136 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
15 0.043157466 152 nips-2003-Pairwise Clustering and Graphical Models
16 0.042686876 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
17 0.04186032 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
18 0.041422576 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
19 0.040967245 113 nips-2003-Learning with Local and Global Consistency
20 0.040034335 73 nips-2003-Feature Selection in Clustering Problems
topicId topicWeight
[(0, -0.14), (1, -0.002), (2, -0.019), (3, 0.01), (4, -0.039), (5, -0.082), (6, -0.069), (7, -0.01), (8, 0.055), (9, 0.018), (10, 0.092), (11, -0.019), (12, -0.006), (13, -0.01), (14, 0.089), (15, 0.019), (16, -0.088), (17, -0.061), (18, -0.021), (19, 0.005), (20, -0.015), (21, 0.027), (22, -0.038), (23, 0.033), (24, -0.123), (25, -0.126), (26, 0.009), (27, -0.108), (28, 0.005), (29, -0.051), (30, 0.08), (31, -0.064), (32, -0.018), (33, 0.094), (34, 0.019), (35, -0.066), (36, 0.083), (37, 0.056), (38, -0.167), (39, -0.02), (40, 0.063), (41, -0.027), (42, 0.042), (43, -0.12), (44, 0.057), (45, 0.279), (46, 0.193), (47, -0.203), (48, 0.055), (49, 0.085)]
simIndex simValue paperId paperTitle
same-paper 1 0.96374571 118 nips-2003-Link Prediction in Relational Data
Author: Ben Taskar, Ming-fai Wong, Pieter Abbeel, Daphne Koller
Abstract: Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation. 1
2 0.49180284 62 nips-2003-Envelope-based Planning in Relational MDPs
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelope-based Planning (REBP), tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and judiciously expand its envelope as resources permit. 1
3 0.46723938 99 nips-2003-Kernels for Structured Natural Language Data
Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda
Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1
4 0.43922326 124 nips-2003-Max-Margin Markov Networks
Author: Ben Taskar, Carlos Guestrin, Daphne Koller
Abstract: In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M3 ) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches. 1
5 0.37229061 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
6 0.35698563 30 nips-2003-Approximability of Probability Distributions
7 0.3278023 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process
8 0.32284236 108 nips-2003-Learning a Distance Metric from Relative Comparisons
9 0.32159895 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
10 0.30949974 181 nips-2003-Statistical Debugging of Sampled Programs
11 0.26895538 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering
12 0.26794553 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
13 0.26759353 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
14 0.26564038 71 nips-2003-Fast Embedding of Sparse Similarity Graphs
15 0.26059178 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity
16 0.25386775 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
17 0.2535862 45 nips-2003-Circuit Optimization Predicts Dynamic Networks for Chemosensory Orientation in Nematode C. elegans
18 0.25163025 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
19 0.24818631 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
20 0.24657224 121 nips-2003-Log-Linear Models for Label Ranking
topicId topicWeight
[(0, 0.045), (11, 0.017), (16, 0.38), (30, 0.011), (35, 0.036), (53, 0.062), (59, 0.01), (71, 0.086), (76, 0.05), (85, 0.106), (91, 0.092), (99, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.82363874 118 nips-2003-Link Prediction in Relational Data
Author: Ben Taskar, Ming-fai Wong, Pieter Abbeel, Daphne Koller
Abstract: Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation. 1
2 0.68595886 164 nips-2003-Ranking on Data Manifolds
Author: Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, Bernhard Schölkopf
Abstract: The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method. 1
3 0.44320935 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin
Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1
4 0.44130349 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
Author: Kevin P. Murphy, Antonio Torralba, William T. Freeman
Abstract: Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification. 1
5 0.43408412 124 nips-2003-Max-Margin Markov Networks
Author: Ben Taskar, Carlos Guestrin, Daphne Koller
Abstract: In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M3 ) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches. 1
6 0.43390965 3 nips-2003-AUC Optimization vs. Error Rate Minimization
7 0.42904809 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
8 0.42772749 41 nips-2003-Boosting versus Covering
9 0.4261905 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
10 0.42400315 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
11 0.4224638 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
12 0.42180094 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
13 0.41946623 189 nips-2003-Tree-structured Approximations by Expectation Propagation
14 0.41919833 145 nips-2003-Online Classification on a Budget
15 0.41833115 117 nips-2003-Linear Response for Approximate Inference
16 0.41824719 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
17 0.41820642 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
18 0.41806069 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
19 0.41668928 78 nips-2003-Gaussian Processes in Reinforcement Learning
20 0.41649121 100 nips-2003-Laplace Propagation