nips nips2001 nips2001-24 knowledge-graph by maker-knowledge-mining

24 nips-2001-Active Information Retrieval


Source: pdf

Author: Tommi Jaakkola, Hava T. Siegelmann

Abstract: In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system responds to the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information that can be presented to the user per iteration. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. [sent-4, score-1.225]

2 The users may further refine their query as needed. [sent-5, score-0.61]

3 We propose an alternative active learning approach, where the system responds to the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. [sent-7, score-1.155]

4 The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. [sent-8, score-0.695]

5 The system's initiated queries are subject to resource constraints pertaining to the amount of information that can be presented to the user per iteration. [sent-10, score-0.607]

6 1 Introduction An IR system consists of a collection of documents and an engine that retrieves documents described by users' queries. [sent-11, score-0.381]

7 In large systems, such as the Web, queries are typically too vague, and hence an iterative process, in which users gradually refine their queries, has to take place. [sent-12, score-0.216]

8 Since much of the dissatisfaction of IR users stems from long, tedious, repetitive search sessions, our research is targeted at shortening the search session. [sent-13, score-0.186]

9 We propose a new search paradigm of active information retrieval in which the user initiates only one query, and the subsequent iterative process is led by the engine/system. [sent-14, score-0.72]

10 Our approach is related but not identical to the interactive search processes called relevance feedback. [sent-16, score-0.238]

11 In relevance feedback, the system has to deduce a set of "features" (words, phrases, etc. [sent-18, score-0.205]

12 ) that characterize the set of selected relevant documents, and use these features in formulating a new query (e.g. [sent-19, score-0.562]

13 In contrast, we cast the problem as a problem of estimation, where the goal is to recover the unknown document weights or relevance assessments. [sent-22, score-0.226]

14 Our system also relates to the Scatter/Gather algorithm of browsing information systems [2], where the system initially scatters the document collection into a fixed number k of clusters whose summaries are presented to the user. [sent-23, score-0.415]

15 The user selects clusters to form a new sub-collection, to be scattered again into k clusters, and so forth, until single documents are enumerated. [sent-24, score-0.735]

16 In our approach, documents are not discarded but rather their weighting is updated appropriately. [sent-25, score-0.173]

17 Overlapping clusters were recently proposed to better match real-life grouping and allow natural summarizing and viewing [4]. [sent-27, score-0.231]

18 This short paper focuses on the underlying methodology of the active learning approach. [sent-28, score-0.115]

19 2 Active search Let X be the set of documents (elements) in the database and C = {C_1, . [sent-29, score-0.274]

20 , C_m } a set of available clusters of documents for which appropriate summaries can be generated. [sent-32, score-0.427]

21 The set of clusters typically includes individual documents and may come from a flat, hierarchical, or overlapping clustering method. [sent-33, score-0.469]

22 The clustering need not be static, however, and could be easily defined dynamically in the search process. [sent-34, score-0.1]

23 Given the set of available clusters, we may choose a query set, a limited set of clusters that are presented to the user for selection at each iteration of the search process. [sent-35, score-1.349]

24 The user is expected to choose the best matching cluster in this set or, alternatively, annotate the clusters with relevant/irrelevant labels (select the relevant ones). [sent-36, score-0.998]

25 The following sections address three primary issues: the user model, how to incorporate the information from user selections, and how to optimize the query set presented to the user. [sent-39, score-1.466]

26 All the algorithms should scale linearly with the database size (and the size of the query set). [sent-40, score-0.582]

27 3 Contrastive selection model We start with a contrastive selection model where the user is expected to choose only the best matching cluster in the query set. [sent-41, score-1.415]

28 In case of multiple selections, we will interpret the marked clusters as a redefined cluster of the query set. [sent-42, score-0.944]

29 While this interpretation will result in sub-optimal choices for the query set assuming the user consistently selects multiple clusters, the interpretation nevertheless obviates the need for modeling user's selection biases in this regard. [sent-43, score-1.125]

30 An empty selection, on the other hand, suggests that the clusters outside the query set are deemed more likely. [sent-44, score-0.807]

31 To capture the ranking implied by the user selections, we define weights (a distribution) {θ_x}, Σ_{x∈X} θ_x = 1, over the underlying documents. [sent-46, score-0.628]

32 We assume that the user behavior is (probabilistically) consistent with one such weighting θ*. [sent-47, score-0.501]

33 The goal of a retrieval algorithm is therefore to recover this underlying weighting through interactions with the user. [sent-48, score-0.672]

34 The resulting (approximation to) θ* can be used to correctly rank the documents or, for example, to display all the documents with sufficiently large weight (cf. [sent-49, score-0.292]

35 Naturally, θ* changes from one retrieval task to another and has to be inferred separately in each task. [sent-51, score-0.106]

36 We might estimate a user specific prior (model) over the document weights to reflect consistent biases that different users have across retrieval tasks. [sent-52, score-0.763]

37 We express our prior belief about the document weights in terms of a Dirichlet distribution: P(θ) = (1/Z) ∏_{x∈X} θ_x^{α_x − 1}, where Z = [∏_{x∈X} Γ(α_x)] / Γ(Σ_{x∈X} α_x). [sent-53, score-0.098]
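
To make the prior concrete, here is a small numerical sketch (our own illustration, not the authors' code; the toy pseudo-counts alpha are invented) of the flat Dirichlet over document weights and its normalizer Z:

import numpy as np
from scipy.special import gammaln

# Hypothetical pseudo-counts alpha_x over a toy five-document database.
alpha = np.array([1.0, 1.0, 2.0, 0.5, 0.5])

# log Z = sum_x log Gamma(alpha_x) - log Gamma(sum_x alpha_x)
log_Z = gammaln(alpha).sum() - gammaln(alpha.sum())

# Sample a document weighting theta (with sum_x theta_x = 1) from the prior.
rng = np.random.default_rng(0)
theta = rng.dirichlet(alpha)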

38 1 Inference Suppose a flat Dirichlet distribution P(θ) over the document weights and a fixed query set S = {C_{s_1}, . [sent-55, score-0.645]

39 We evaluate here the posterior distribution P(θ|y) given the user response y. [sent-59, score-0.546]

40 The key is to transform P(θ) into a hierarchical form so as to explicate the portion of the distribution potentially affected by the user response. [sent-60, score-0.531]

41 The hierarchy, illustrated in Figure 1a), contains three levels: selection of S or X \ S; choices within the query set S (of most interest to us) and those under X \ S; selections within the clusters C_{s_i} in S. [sent-61, score-0.96]

42 For simplicity, the clusters are assumed to be either nested or disjoint, i. [sent-62, score-0.286]

43 , k for the cluster choices within the query set, whereas θ^{(3)}_x, x ∉ S, gives the document choices outside S. [sent-69, score-0.878]

44 Finally, θ^{(4,j)}_x for x ∈ C_{s_j} indicates the parameters associated with the cluster C_{s_j} ∈ S. [sent-70, score-0.222]

45 The original flat Dirichlet P(θ) can be written as a product P(θ^{(1)}) P(θ^{(2)}) P(θ^{(3)}) [∏_{j=1}^{k} P(θ^{(4,j)})] with the appropriate normalization constraints. [sent-71, score-0.141]

46 If clusters in S overlap, the expansion is carried out in terms of the disjoint subsets. [sent-72, score-0.255]

47 The parameters governing the Dirichlet component distributions are readily obtained by gathering the appropriate parameters α_x of the original Dirichlet (cf. [sent-73, score-0.152]

48 If the user selects cluster C_{s_y}, we will update P(θ^{(2)}), which reduces to adjusting the counts α^{(2)}_y ← α^{(2)}_y + 1. [sent-82, score-0.749]

49 The resulting new parameters give rise to the posterior distribution P(θ^{(2)}|y) and, by including the other components, to the overall posterior P(θ|y). [sent-83, score-0.171]

50 If the user selects "none of these items," only the first-level parameters θ^{(1)} will be updated. [sent-84, score-0.54]
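
A minimal sketch of this conjugate update, under our reading that a selection bumps only the within-S count while a rejection bumps only the first-level X \ S count (function and variable names are ours):

import numpy as np

def hierarchical_counts(alpha, query_clusters):
    # Gather the flat counts alpha_x into the hierarchical components:
    # level 2 has one count per cluster in S; level 1 splits S vs. X \ S.
    a2 = np.array([alpha[c].sum() for c in query_clusters])
    a1 = np.array([a2.sum(), alpha.sum() - a2.sum()])
    return a1, a2

def incorporate_response(a1, a2, y):
    # y = index of the selected cluster in S, or None for "none of these".
    if y is None:
        a1[1] += 1.0          # evidence for X \ S: update theta^(1) only
    else:
        a2[y] += 1.0          # alpha^(2)_y <- alpha^(2)_y + 1
    return a1, a2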

51 2 Query set optimization Our optimization criterion for choosing the query set S is the information that we stand to gain from querying the user with it. [sent-86, score-1.056]

52 To simplify, we expand the counts α^{(·)} in terms of the original (flat) counts α_x, and define for all clusters (whether or not they appear in the query set) the weights a_i = Σ_{x∈C_i} α_x and b_i = a_i ψ(a_i + 1) − a_i log a_i. [sent-89, score-0.883]

53 The mutual information criterion now depends only on a_S = Σ_{i=1}^{k} a_{s_i} = Σ_{x∈S} α_x, the overall weight of the query set, and b_S = Σ_{i=1}^{k} b_{s_i}, which provides an overall measure of how informative the individual clusters in S are. [sent-90, score-0.873]

54 I(y; θ^{(2)}) = b_S/a_S + log(a_S) − ψ(a_S + 1) (3). We can optimize the choice of S with a simple greedy method that successively finds the next best cluster index i to include in the information set. [sent-92, score-0.223]

55 This algorithm scales as O(km), where m is the number of clusters in our database and k is the maximal query set size in terms of the number of clusters. [sent-93, score-0.813]
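
The greedy construction can be sketched directly from the quantities a_i, b_i and the criterion in Eq. (3); the code below is our own reading, not the authors' implementation, with the double loop making the O(km) scaling explicit (psi is the digamma function):

import numpy as np
from scipy.special import digamma

def greedy_query_set(alpha, clusters, k):
    a = np.array([alpha[c].sum() for c in clusters])   # a_i = sum_{x in C_i} alpha_x
    b = a * digamma(a + 1.0) - a * np.log(a)           # b_i
    chosen, a_S, b_S = [], 0.0, 0.0
    for _ in range(k):                                 # k rounds ...
        best, best_val = None, -np.inf
        for i in range(len(clusters)):                 # ... over m clusters: O(km)
            if i in chosen:
                continue
            aS, bS = a_S + a[i], b_S + b[i]
            val = bS / aS + np.log(aS) - digamma(aS + 1.0)   # criterion (3)
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
        a_S += a[best]
        b_S += b[best]
    return chosen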

56 Note that this simple criterion excludes nested or overlapping clusters from S. [sent-94, score-0.348]

57 In a more general context, the bookkeeping problem associated with the overlapping clusters is analogous to that of the Kikuchi expansion in statistical physics (cf. [sent-95, score-0.287]

58 3 Projection back to a flat Dirichlet The hierarchical posterior is not a flat Dirichlet anymore. [sent-98, score-0.44]

59 To maintain simplicity, we project it back into a flat Dirichlet in the KL-divergence sense: P̂_{θ|y} = argmin_{Q_θ} KL(P_{θ|y} ‖ Q_θ), where P(θ|y) is the hierarchical posterior expressed in terms of the original flat variables θ_x, x ∈ X (but no longer a flat Dirichlet). [sent-99, score-0.609]

60 The transformation from hierarchical to flat variables is given by θ_x = θ^{(1)} θ^{(2)}_j θ^{(4,j)}_x for x ∈ C_{s_j}, j = 1, . [sent-100, score-0.198]

61 , k we get Eq. (4) (derivation omitted), where y denotes the user selection. [sent-107, score-0.474]

62 For x ∈ X \ S, E_{θ|y} log θ_x = ψ(α_x) − ψ(Σ_{z∈X} α_z). If we define r_x = E_{θ|y} log θ_x for all x ∈ X, then the counts β_x corresponding to the flat approximation Q_θ can be found by minimizing Eq. (5), where we have omitted any terms not depending on β_x. [sent-108, score-0.364]
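
One plausible numerical recipe for this projection (our derivation, not the paper's): differentiating the KL objective gives the stationarity condition ψ(β_x) − ψ(Σ_z β_z) = r_x, which can be solved with a standard inverse-digamma fixed point:

import numpy as np
from scipy.special import digamma, polygamma

def inv_digamma(y, iters=5):
    # Newton's method for psi^{-1}(y); polygamma(1, x) is psi'(x).
    x = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
    for _ in range(iters):
        x = x - (digamma(x) - y) / polygamma(1, x)
    return x

def project_to_flat_dirichlet(r, iters=100):
    # Solve psi(beta_x) - psi(sum_z beta_z) = r_x by fixed-point iteration,
    # where r_x = E_{theta|y} log theta_x from the hierarchical posterior.
    beta = np.ones_like(r)
    for _ in range(iters):
        beta = inv_digamma(r + digamma(beta.sum()))
    return beta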

63 It is therefore no longer obvious that the expected entropy of the projected posterior possesses any analogous guarantees; indeed, projections of this type typically increase the entropy. [sent-114, score-0.19]

64 > 0, E_y{H(Q_{θ|y})} ≤ H(P_θ) − ε(k − 1)/a_S + O(ε²), where k is the size of the query set and a_S = Σ_{z∈S} α_z. [sent-117, score-0.518]

65 Theorem 1 For any ε. While this result is not tight, it does demonstrate that the projection back into a flat Dirichlet still permits a semi-linear decrease in the entropy. [sent-118, score-0.213]

66 4 Annotation model The contrastive selection approach discussed above operates a priori in a single-topic mode. [sent-125, score-0.112]

67 The expectation that the user should select the best matching cluster in the query set also makes inefficient use of the query set. [sent-126, score-1.762]

68 We provide here an analogous development of the active learning approach under the assumption that the user classifies rather than contrasts the clusters. [sent-127, score-0.55]

69 While the parameters q_0 and q could easily be inferred from past searches, we assume here for simplicity that they are known to the search algorithm. [sent-129, score-0.2]

70 The user annotations of different clusters in the query set are independent of each other, even for overlapping clusters. [sent-130, score-1.364]

71 To ensure that we can infer the unknown relevance assignments from the observables (cluster annotations), we require identifiability: the annotation probabilities P(y_c = 1|r*), for all c ∈ C, should uniquely determine {r*_x}. [sent-131, score-0.294]

72 Equivalently, knowing the number of relevant documents in each cluster should enable us to recover the underlying relevance assignments. [sent-132, score-0.624]

73 This is a property of the cluster structure and holds trivially for any clustering with access to individual elements. [sent-133, score-0.231]

74 The search algorithm maintains a simple independent Bernoulli model over the unknown relevance assignments: P(r|θ) = ∏_{x∈X} θ_x^{r_x} (1 − θ_x)^{1−r_x}. [sent-134, score-0.238]

75 This gives rise to a marginal noisy-OR model over cluster annotations: P(y_c = 1|θ) = Σ_r P(y_c = 1|r) P(r|θ) = 1 − (1 − q_0) ∏_{x∈c} (1 − θ_x q) (8). The uncertainty about the relevance assignments {r_x} makes the system's beliefs about the cluster annotations dependent on each other. [sent-135, score-0.736]
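
The annotation probability of Eq. (8) is a one-liner; this sketch (our names) also exposes p_0 = P(y_c = 0|θ), which is reused below:

import numpy as np

def p_annotation(theta, cluster, q0, q):
    # Eq. (8): P(y_c = 1 | theta) = 1 - (1 - q0) * prod_{x in c} (1 - theta_x * q)
    p0 = (1.0 - q0) * np.prod(1.0 - theta[cluster] * q)   # P(y_c = 0 | theta)
    return 1.0 - p0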

76 The parameters (relevance probabilities) {θ_x} are, of course, specific to each search task. [sent-136, score-0.118]

77 1 Inference and projection Given ŷ_c ∈ {0, 1} for a single cluster c, we can evaluate the posterior P(r|ŷ_c, θ) over the relevance assignments. [sent-138, score-0.484]

78 Similarly to noisy-OR graphical models, this posterior can be (exponentially) costly to maintain and we instead sequentially project the posterior back into the set of independent Bernoulli distributions. [sent-139, score-0.225]

79 The m-projection preserves the posterior expectations θ_{x;y_c} = E_{r|y_c}{r_x} used for ranking the documents. [sent-141, score-0.133]

80 The projection yields simple element-wise updates for the parameters: for x ∈ c, Eq. (9), where p_0 = P(y_c = 0|θ) = (1 − q_0) ∏_{x∈c} (1 − θ_x q) is the only parameter that depends on the cluster as a whole. [sent-143, score-0.265]
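
Equation (9) itself is not reproduced in this digest; the element-wise posterior means below are our own derivation of E[r_x | y_c] from the noisy-OR model of Eq. (8), offered only as a plausible instantiation of the update:

import numpy as np

def m_project(theta, cluster, q0, q, y):
    # Posterior means E[r_x | y_c = y] for x in the cluster, derived from
    # Eq. (8); documents outside the cluster are left unchanged.
    th = theta.copy()
    tc = theta[cluster]
    p0 = (1.0 - q0) * np.prod(1.0 - tc * q)   # P(y_c = 0 | theta)
    if y == 0:
        th[cluster] = tc * (1.0 - q) / (1.0 - tc * q)
    else:
        th[cluster] = tc * (1.0 - p0 * (1.0 - q) / (1.0 - tc * q)) / (1.0 - p0)
    return th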

81 We use a lower bound: I(y_c; r|θ) ≥ E_{y_c}{ Σ_{x∈c} KL(θ_{x;y_c} ‖ θ_x) } =: I_p(y_c; r|θ) (10), where θ_{x;y_c}, x ∈ X, are the parameters of the m-projected posterior and KL(θ_{x;y_c} ‖ θ_x) is the KL-divergence between two Bernoulli distributions with mean parameters θ_{x;y_c} and θ_x, respectively. [sent-146, score-0.126]
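
Putting the pieces together, the lower bound of Eq. (10) can be evaluated cluster by cluster, reusing p_annotation and m_project from the sketches above (again our own construction):

import numpy as np

def bernoulli_kl(p, r):
    # KL divergence between Bernoulli(p) and Bernoulli(r), element-wise.
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    r = np.clip(r, 1e-12, 1.0 - 1e-12)
    return p * np.log(p / r) + (1.0 - p) * np.log((1.0 - p) / (1.0 - r))

def info_lower_bound(theta, cluster, q0, q):
    # I_p(y_c; r | theta): expectation over y_c of the summed Bernoulli
    # KL-divergences between the m-projected posterior and the prior.
    p1 = p_annotation(theta, cluster, q0, q)
    kl0 = bernoulli_kl(m_project(theta, cluster, q0, q, 0)[cluster], theta[cluster]).sum()
    kl1 = bernoulli_kl(m_project(theta, cluster, q0, q, 1)[cluster], theta[cluster]).sum()
    return (1.0 - p1) * kl0 + p1 * kl1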

82 In other words, the search terminates only if we are already fully certain about the underlying relevance assignments. [sent-148, score-0.277]

83 The best k clusters to query are those maximizing the joint information criterion; finding the optimal query set under this criterion (even with the m-projections) involves O(nk2^k) operations. [sent-149, score-1.297]

84 We select the clusters sequentially while maintaining an explicit dependence on the hypothetical outcome (classification) of only the previous cluster choice. [sent-150, score-0.508]

85 More precisely, we combine the cluster selection with conditional projections: for k > 1, c_k = argmax_c I_p(y_c, y_{c_{k−1}}; r|θ^{k−1}), θ^k. [sent-151, score-0.257]

86 The mutual information terms do not, however, decompose additively with the elements in the clusters. [sent-154, score-0.094]

87 The desired O(kn) scaling of the selection algorithm requires a cached spline reconstruction. [sent-155, score-0.126]
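
A sequential greedy selection along these lines might look as follows; collapsing the belief mixture by moment matching before each new branch is our simplification of the conditional projections, and no caching is attempted, so this sketch does not achieve the O(kn) scaling:

import numpy as np

def greedy_annotation_query_set(theta, clusters, q0, q, k):
    chosen = []
    beliefs = [(1.0, theta)]                  # (weight, Bernoulli belief)
    for _ in range(k):
        # Score each unused cluster by its expected information gain,
        # averaged over the hypothetical outcomes of the previous choice.
        scores = [-np.inf if i in chosen else
                  sum(w * info_lower_bound(th, c, q0, q) for w, th in beliefs)
                  for i, c in enumerate(clusters)]
        best = int(np.argmax(scores))
        chosen.append(best)
        # Keep an explicit dependence only on the previous choice: merge
        # the mixture by moment matching, then branch on y_{c_k}.
        th_bar = sum(w * th for w, th in beliefs)
        p1 = p_annotation(th_bar, clusters[best], q0, q)
        beliefs = [(1.0 - p1, m_project(th_bar, clusters[best], q0, q, 0)),
                   (p1, m_project(th_bar, clusters[best], q0, q, 1))]
    return chosen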

88 3 Sanity check results Figure 1b) gives the mean number of iterations of the query process as a function of the database size. [sent-157, score-0.582]

89 The mutual information terms for select fixed values of p_0 can be cached additively relative to the cluster structure. [sent-171, score-0.439]

90 The user responses were selected on the basis of the same parameters and a randomly chosen (single) underlying element of interest. [sent-176, score-0.54]

91 The search is terminated when the sought-after element in the database has the highest rank according to {θ_x}, x ∈ X. [sent-177, score-0.159]

92 The randomized cluster structures were relatively balanced and hierarchical. [sent-178, score-0.195]

93 Results for random choice of the clusters in the query are far outside the figure. [sent-180, score-0.778]

94 Figure 1c), on the other hand, demonstrates that increasing the query set size appropriately reduces the interaction time. [sent-181, score-0.518]

95 Note that since all the clusters in the query set have to be chosen prior to getting feedback from any of the clusters, doubling the query set size cannot, even in theory, cut the retrieval time in half. [sent-182, score-1.486]

96 5 Discussion The active learning approach proposed here provides the basic methodology for optimally querying the user at multiple levels of abstraction. [sent-183, score-0.584]

97 For example, we can encourage the user to provide confidence rated selections/annotations among the presented clusters. [sent-185, score-0.474]

98 Both user models can be adapted to handle such selections. [sent-186, score-0.474]

99 Analyzing the fundamental trade-offs between the size of the query set (resource constraints) and the expected completion time of the retrieval process will also be addressed in later work. [sent-187, score-0.651]

100 Tukey, Scatter/Gather: A Cluster-Based Approach to Browsing Large Document Collections, In Proceedings of the Fifteenth Annual International ACM SIGIR Conference, Copenhagen, Denmark, June 1992. [sent-201, score-0.195]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('query', 0.518), ('user', 0.474), ('clusters', 0.231), ('cluster', 0.195), ('relevance', 0.174), ('bx', 0.154), ('dirichlet', 0.15), ('documents', 0.146), ('flat', 0.141), ('ye', 0.125), ('selections', 0.117), ('qo', 0.109), ('retrieval', 0.106), ('ax', 0.098), ('annotations', 0.085), ('xex', 0.085), ('yc', 0.085), ('active', 0.076), ('posterior', 0.072), ('document', 0.072), ('database', 0.064), ('search', 0.064), ('annotation', 0.064), ('cached', 0.064), ('rib', 0.064), ('selection', 0.062), ('queries', 0.062), ('users', 0.058), ('hierarchical', 0.057), ('mutual', 0.057), ('entropy', 0.056), ('overlapping', 0.056), ('po', 0.056), ('assignments', 0.056), ('bernoulli', 0.056), ('contrastive', 0.05), ('summaries', 0.05), ('ey', 0.047), ('rx', 0.047), ('feedback', 0.046), ('pe', 0.044), ('relevant', 0.044), ('projection', 0.043), ('csj', 0.042), ('doubling', 0.042), ('hava', 0.042), ('initiated', 0.042), ('ox', 0.042), ('counts', 0.041), ('ly', 0.04), ('sj', 0.04), ('kl', 0.039), ('selects', 0.039), ('underlying', 0.039), ('bs', 0.037), ('ranking', 0.037), ('additively', 0.037), ('xes', 0.037), ('clustering', 0.036), ('projections', 0.035), ('bw', 0.034), ('querying', 0.034), ('refine', 0.034), ('siegelmann', 0.034), ('choices', 0.032), ('system', 0.031), ('nested', 0.031), ('terminated', 0.031), ('criterion', 0.03), ('select', 0.03), ('ip', 0.03), ('omitted', 0.029), ('iy', 0.029), ('fiat', 0.029), ('deemed', 0.029), ('resource', 0.029), ('outside', 0.029), ('back', 0.029), ('responds', 0.028), ('successively', 0.028), ('outcome', 0.028), ('maintain', 0.028), ('parameters', 0.027), ('specific', 0.027), ('expected', 0.027), ('pr', 0.027), ('weighting', 0.027), ('matching', 0.027), ('tommi', 0.027), ('recover', 0.026), ('weights', 0.026), ('ek', 0.026), ('qr', 0.026), ('ranked', 0.026), ('implied', 0.026), ('define', 0.026), ('theoretically', 0.025), ('vc', 0.024), ('sequentially', 0.024), ('disjoint', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 24 nips-2001-Active Information Retrieval

Author: Tommi Jaakkola, Hava T. Siegelmann

Abstract: In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system responds to the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information that can be presented to the user per iteration. 1

2 0.21734412 184 nips-2001-The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank

Author: Matthew Richardson, Pedro Domingos

Abstract: The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page to page, following all outlinks from a page with equal probability. We propose to improve PageRank by using a more intelligent surfer, one that is guided by a probabilistic model of the relevance of a page to a query. Efficient execution of our algorithm at query time is made possible by precomputing at crawl time (and thus once for all queries) the necessary terms. Experiments on two large subsets of the Web indicate that our algorithm significantly outperforms PageRank in the (human-rated) quality of the pages returned, while remaining efficient enough to be used in today’s large search engines. 1

3 0.18734559 140 nips-2001-Optimising Synchronisation Times for Mobile Devices

Author: Neil D. Lawrence, Antony I. T. Rowstron, Christopher M. Bishop, Michael J. Taylor

Abstract: With the increasing number of users of mobile computing devices (e.g. personal digital assistants) and the advent of third generation mobile phones, wireless communications are becoming increasingly important. Many applications rely on the device maintaining a replica of a data-structure which is stored on a server, for example news databases, calendars and e-mail. ill this paper we explore the question of the optimal strategy for synchronising such replicas. We utilise probabilistic models to represent how the data-structures evolve and to model user behaviour. We then formulate objective functions which can be minimised with respect to the synchronisation timings. We demonstrate, using two real world data-sets, that a user can obtain more up-to-date information using our approach. 1

4 0.18506859 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss

Abstract: Despite many empirical successes of spectral clustering methodsalgorithms that cluster points using eigenvectors of matrices derived from the data- there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1

5 0.17346162 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

Author: Carlotta Domeniconi, Dimitrios Gunopulos

Abstract: The nearest neighbor technique is a simple and appealing method to address classification problems. It relies on t he assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. We propose a technique that computes a locally flexible metric by means of Support Vector Machines (SVMs). The maximum margin boundary found by the SVM is used to determine the most discriminant direction over the query's neighborhood. Such direction provides a local weighting scheme for input features. We present experimental evidence of classification performance improvement over the SVM algorithm alone and over a variety of adaptive learning schemes, by using both simulated and real data sets. 1

6 0.162614 113 nips-2001-Learning a Gaussian Process Prior for Automatically Generating Music Playlists

7 0.16071217 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

8 0.15422989 51 nips-2001-Cobot: A Social Reinforcement Learning Agent

9 0.12759559 171 nips-2001-Spectral Relaxation for K-means Clustering

10 0.1229295 107 nips-2001-Latent Dirichlet Allocation

11 0.11070713 78 nips-2001-Fragment Completion in Humans and Machines

12 0.075474486 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

13 0.069643639 61 nips-2001-Distribution of Mutual Information

14 0.06758105 58 nips-2001-Covariance Kernels from Bayesian Generative Models

15 0.065669976 185 nips-2001-The Method of Quantum Clustering

16 0.062426582 30 nips-2001-Agglomerative Multivariate Information Bottleneck

17 0.059473373 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation

18 0.057267427 95 nips-2001-Infinite Mixtures of Gaussian Process Experts

19 0.057080511 53 nips-2001-Constructing Distributed Representations Using Additive Clustering

20 0.055304818 147 nips-2001-Pranking with Ranking


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.173), (1, 0.014), (2, 0.002), (3, -0.18), (4, 0.034), (5, -0.143), (6, -0.099), (7, -0.008), (8, -0.127), (9, -0.01), (10, 0.354), (11, -0.131), (12, -0.258), (13, 0.047), (14, 0.052), (15, 0.152), (16, 0.201), (17, 0.119), (18, -0.17), (19, 0.001), (20, 0.197), (21, 0.023), (22, -0.02), (23, -0.05), (24, -0.2), (25, 0.074), (26, -0.061), (27, -0.169), (28, -0.072), (29, 0.036), (30, -0.011), (31, 0.03), (32, 0.04), (33, -0.007), (34, -0.021), (35, -0.005), (36, 0.056), (37, -0.02), (38, -0.038), (39, -0.001), (40, 0.05), (41, 0.019), (42, 0.019), (43, -0.089), (44, -0.046), (45, -0.021), (46, 0.059), (47, 0.03), (48, -0.047), (49, -0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97848612 24 nips-2001-Active Information Retrieval

Author: Tommi Jaakkola, Hava T. Siegelmann

Abstract: In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system responds to the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information that can be presented to the user per iteration. 1

2 0.66218406 140 nips-2001-Optimising Synchronisation Times for Mobile Devices

Author: Neil D. Lawrence, Antony I. T. Rowstron, Christopher M. Bishop, Michael J. Taylor

Abstract: With the increasing number of users of mobile computing devices (e.g. personal digital assistants) and the advent of third generation mobile phones, wireless communications are becoming increasingly important. Many applications rely on the device maintaining a replica of a data-structure which is stored on a server, for example news databases, calendars and e-mail. ill this paper we explore the question of the optimal strategy for synchronising such replicas. We utilise probabilistic models to represent how the data-structures evolve and to model user behaviour. We then formulate objective functions which can be minimised with respect to the synchronisation timings. We demonstrate, using two real world data-sets, that a user can obtain more up-to-date information using our approach. 1

3 0.66036582 184 nips-2001-The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank

Author: Matthew Richardson, Pedro Domingos

Abstract: The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page to page, following all outlinks from a page with equal probability. We propose to improve PageRank by using a more intelligent surfer, one that is guided by a probabilistic model of the relevance of a page to a query. Efficient execution of our algorithm at query time is made possible by precomputing at crawl time (and thus once for all queries) the necessary terms. Experiments on two large subsets of the Web indicate that our algorithm significantly outperforms PageRank in the (human-rated) quality of the pages returned, while remaining efficient enough to be used in today’s large search engines. 1

4 0.54762185 51 nips-2001-Cobot: A Social Reinforcement Learning Agent

Author: Charles Lee Isbell Jr., Christian R. Shelton

Abstract: We report on the use of reinforcement learning with Cobot, a software agent residing in the well-known online community LambdaMOO. Our initial work on Cobot (Isbell et al.2000) provided him with the ability to collect social statistics and report them to users. Here we describe an application of RL allowing Cobot to take proactive actions in this complex social environment, and adapt behavior from multiple sources of human reward. After 5 months of training, and 3171 reward and punishment events from 254 different LambdaMOO users, Cobot learned nontrivial preferences for a number of users, modifing his behavior based on his current state. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment. 1

5 0.51687646 113 nips-2001-Learning a Gaussian Process Prior for Automatically Generating Music Playlists

Author: John C. Platt, Christopher J. C. Burges, Steven Swenson, Christopher Weare, Alice Zheng

Abstract: This paper presents AutoDJ: a system for automatically generating music playlists based on one or more seed songs selected by a user. AutoDJ uses Gaussian Process Regression to learn a user preference function over songs. This function takes music metadata as inputs. This paper further introduces Kernel Meta-Training, which is a method of learning a Gaussian Process kernel from a distribution of functions that generates the learned function. For playlist generation, AutoDJ learns a kernel from a large set of albums. This learned kernel is shown to be more effective at predicting users’ playlists than a reasonable hand-designed kernel.

6 0.3948991 171 nips-2001-Spectral Relaxation for K-means Clustering

7 0.38936713 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

8 0.38159525 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

9 0.32711843 53 nips-2001-Constructing Distributed Representations Using Additive Clustering

10 0.32711357 78 nips-2001-Fragment Completion in Humans and Machines

11 0.32563618 107 nips-2001-Latent Dirichlet Allocation

12 0.30838129 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

13 0.29313463 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines

14 0.26661626 185 nips-2001-The Method of Quantum Clustering

15 0.26105225 169 nips-2001-Small-World Phenomena and the Dynamics of Information

16 0.21558464 30 nips-2001-Agglomerative Multivariate Information Bottleneck

17 0.21237941 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation

18 0.19751352 35 nips-2001-Analysis of Sparse Bayesian Learning

19 0.17449833 114 nips-2001-Learning from Infinite Data in Finite Time

20 0.17233096 95 nips-2001-Infinite Mixtures of Gaussian Process Experts


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.255), (14, 0.021), (17, 0.058), (19, 0.042), (27, 0.096), (30, 0.05), (38, 0.018), (59, 0.033), (72, 0.06), (79, 0.066), (91, 0.193)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88389498 24 nips-2001-Active Information Retrieval

Author: Tommi Jaakkola, Hava T. Siegelmann

Abstract: In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system responds to the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information that can be presented to the user per iteration. 1

2 0.67689025 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

Author: Jens Kohlmorgen, Steven Lemm

Abstract: We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, our method processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changing number of states and an on-line variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incoming data in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream. The usefulness of the algorithm is demonstrated by an application to a switching dynamical system. 1

3 0.67033082 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

Author: Roni Khardon, Dan Roth, Rocco A. Servedio

Abstract: We study online learning in Boolean domains using kernels which capture feature expansions equivalent to using conjunctions over basic features. We demonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability of the resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over an exponential number of conjunctions; however we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple functions. We also consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. While known upper bounds imply that Winnow can learn DNF formulae with a polynomial mistake bound in this setting, we prove that it is computationally hard to simulate Winnow’s behavior for learning DNF over such a feature set, and thus that such kernel functions for Winnow are not efficiently computable.

4 0.6694383 183 nips-2001-The Infinite Hidden Markov Model

Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.

5 0.66285497 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

Author: Aaron C. Courville, David S. Touretzky

Abstract: The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1 , 2, 3], which other models are unable to explain. 1

6 0.6623168 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

7 0.66123384 55 nips-2001-Convergence of Optimistic and Incremental Q-Learning

8 0.66121322 89 nips-2001-Grouping with Bias

9 0.66086781 144 nips-2001-Partially labeled classification with Markov random walks

10 0.66023338 13 nips-2001-A Natural Policy Gradient

11 0.65946287 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

12 0.65793419 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation

13 0.65682054 36 nips-2001-Approximate Dynamic Programming via Linear Programming

14 0.65672028 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

15 0.6544221 68 nips-2001-Entropy and Inference, Revisited

16 0.6533792 95 nips-2001-Infinite Mixtures of Gaussian Process Experts

17 0.65198791 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

18 0.65154427 3 nips-2001-ACh, Uncertainty, and Cortical Inference

19 0.64959979 169 nips-2001-Small-World Phenomena and the Dynamics of Information

20 0.649544 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds