nips nips2008 nips2008-169 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Deepak Agarwal, Bee-chung Chen, Pradheep Elango, Nitin Motgi, Seung-taek Park, Raghu Ramakrishnan, Scott Roy, Joe Zachariah
Abstract: We describe a new content publishing system that selects articles to serve to a user, choosing from an editorially programmed pool that is frequently refreshed. It is now deployed on a major Yahoo! portal, and selects articles to serve to hundreds of millions of user visits per day, significantly increasing the number of user clicks over the original manual approach, in which editors periodically selected articles to display. Some of the challenges we face include a dynamic content pool, short article lifetimes, non-stationary click-through rates, and extremely high traffic volumes. The fundamental problem we must solve is to quickly identify which items are popular (perhaps within different user segments), and to exploit them while they remain current. We must also explore the underlying pool constantly to identify promising alternatives, quickly discarding poor performers. Our approach is based on tracking per article performance in near real time through online models. We describe the characteristics and constraints of our application setting, discuss our design choices, and show the importance and effectiveness of coupling online models with a randomization procedure. We discuss the challenges encountered in a production online content-publishing environment and highlight issues that deserve careful attention. Our analysis of this application also suggests a number of future research avenues. 1
Reference: text
sentIndex sentText sentNum sentScore
1 701 First Avenue, Sunnyvale, CA 94089. Abstract: We describe a new content publishing system that selects articles to serve to a user, choosing from an editorially programmed pool that is frequently refreshed. [sent-3, score-0.783]
2 portal, and selects articles to serve to hundreds of millions of user visits per day, significantly increasing the number of user clicks over the original manual approach, in which editors periodically selected articles to display. [sent-5, score-1.568]
3 Some of the challenges we face include a dynamic content pool, short article lifetimes, non-stationary click-through rates, and extremely high traffic volumes. [sent-6, score-0.632]
4 Our approach is based on tracking per article performance in near real time through online models. [sent-9, score-0.51]
5 Manual programming of content ensures high quality and maintains the editorial “voice” (the typical mix of content) that users associate with the site. [sent-18, score-0.363]
6 On the other hand, it is expensive to scale as the number of articles and the number of site pages we wish to program grow. [sent-19, score-0.484]
7 The usual machine-learning approach to ranking articles shown to users uses feature-based models, trained using "offline data" (data collected in the past). [sent-23, score-0.589]
8 After a significant feature-engineering effort, looking at user demographics, past activities on the site, various article categories, keywords and entities in articles, etc. [sent-24, score-0.614]
9 Our content pool is small but changing rapidly; article lifetimes are short; and there is wide variability in article performance sharing a common set of feature values. [sent-26, score-1.074]
10 Slot F1, which accounts for a large fraction of clicks, is prominent, and an article displayed on F1 receives many more clicks than when it is displayed at F2, F3 or F4. [sent-34, score-0.515]
11 The pool of available articles is created by trained human editors, and refreshed continually. [sent-35, score-0.533]
12 At any point in time, there are 16 live articles in the pool. [sent-36, score-0.503]
13 A few new articles programmed by editors get pushed into the system periodically (every few hours) and replace some old articles. [sent-37, score-0.548]
14 (e.g., breaking news) and eliminate irrelevant and fading stories, and ensure that the pool of articles is consistent with the "voice" of the site (i.e., the typical mix of content). [sent-40, score-0.563]
15 There is no personalization in the editorially programmed system; at a given time, the same articles are seen by all users visiting the page. [sent-43, score-0.711]
16 We consider how to choose the best set of four articles to display on the module to a given user. [sent-44, score-0.502]
17 We focus on choosing articles to maximize overall click-through rate (CTR), which is the total number of clicks divided by the total number of views for a time interval. [sent-46, score-0.591]
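As a concrete illustration of the CTR definition above, the following sketch aggregates a batch of view and click events into per-article CTRs for one interval. The `(article_id, kind)` event layout is a hypothetical representation for illustration, not the system's actual log format.

```python
from collections import defaultdict

def interval_ctr(events):
    """Per-article CTR for one time interval: total clicks / total views.

    `events` is an iterable of (article_id, kind) pairs with kind in
    {"view", "click"}; the field names are assumptions, not the paper's.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for article_id, kind in events:
        if kind == "view":
            views[article_id] += 1
        elif kind == "click":
            clicks[article_id] += 1
    # CTR = total clicks divided by total views over the interval
    return {a: clicks[a] / views[a] for a in views}

events = ([("a1", "view")] * 200 + [("a1", "click")] * 10
          + [("a2", "view")] * 100 + [("a2", "click")] * 2)
print(interval_ctr(events))  # {'a1': 0.05, 'a2': 0.02}
```

In production this aggregation runs continuously over event streams; batching per 5-minute interval, as here, matches the granularity used elsewhere in the paper.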
18 We found that fast reaction to user feedback through dynamic models based on clicks and page views is crucial for good performance. [sent-50, score-0.453]
19 We discuss an alternate and commonly pursued approach of ranking articles based on offline feature-driven models in Section 2. [sent-51, score-0.502]
20 • Scalability: The portal receives thousands of page views per second and serves hundreds of millions of user visits per day. [sent-53, score-0.386]
21 Data collection, model training and article scoring (using the model) are subject to tight latency requirements. [sent-54, score-0.406]
22 For instance, we only get a few milliseconds to decide on the appropriate content to show to a user visiting the portal. [sent-55, score-0.374]
23 1 Events (users’ clicks and page views) are collected from a large number of front-end web servers and continuously transferred to data collection clusters, which support event buffering to handle the time lag between the user viewing a page and then clicking articles on the page. [sent-57, score-0.948]
24 (a) shows the article’s CTR decay when shown continuously in a bucket at position F1; (b) shows the article’s CTR in the random bucket. [sent-63, score-0.347]
25 2 Machine Learning Challenges A serving scheme is an automated or manual algorithm that decides which article to show at different positions of our module for a given user. [sent-65, score-0.782]
26 Prior to our system, articles were chosen by human editors; we refer to this as the editorial serving scheme. [sent-66, score-0.812]
27 We tried the usual approach of building offline models based on retrospective data collected while using the editorial serving scheme. [sent-69, score-0.521]
28 For articles, we used features based on URL, article category (e. [sent-71, score-0.406]
29 The reasons include a wide variability in CTR for articles having the same set of feature values, dramatic changes in article CTR over time, and the fact that retrospective data collected from non-randomized serving schemes are confounded with factors that are hard to adjust for (see Section 5). [sent-75, score-1.258]
30 Also, our initial studies revealed high variability in CTRs for articles sharing some common features (e. [sent-76, score-0.454]
31 Non-stationary CTRs: The CTR of an article is strongly dependent on the serving scheme used (especially, how much F1 exposure it receives) and it may change dramatically over time. [sent-81, score-0.726]
32 In order to ensure webpage stability, we consider serving schemes that don’t alter the choice of what to show a user in a given slot until a better choice is identified. [sent-83, score-0.47]
33 Figure 1 (a) shows the CTR curve of a typical article subject to such a serving scheme. [sent-84, score-0.646]
34 The decay is due to users getting exposed to an article more than once. [sent-85, score-0.54]
35 Exposure to an article happens in different ways and to different degrees. [sent-86, score-0.406]
36 A user may get exposed to an article when he/she sees a descriptive link, or clicks on it and reads the article. [sent-87, score-0.723]
37 A user may also click multiple “see also” links associated with each article which may perhaps be a stronger form of exposure. [sent-88, score-0.659]
38 A user may be looking at the Weather module when visiting the portal or he may have looked at the article title in the link and not liked it. [sent-91, score-0.773]
39 For instance, not showing an article to a user after one page view containing the link may be suboptimal since he may have overlooked the link and may click on it later. [sent-93, score-0.701]
40 In fact, a large number of clicks on articles occur after the first page view, and their timing depends on how a user navigates the portal. [sent-94, score-0.813]
41 Instead of solving the problem by imposing serving constraints per user, we build a component in our dynamic model that tracks article CTR decay over time. [sent-95, score-0.78]
42 We still impose reasonable serving constraints to provide good user experience—we do not show the same article to a user x minutes (x = 60 worked well) after he/she first saw the article. [sent-96, score-1.083]
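The per-user serving constraint just described can be sketched as a simple eligibility filter: an article stays eligible until x = 60 minutes have passed since the user first saw it, after which it is no longer shown to that user. The `first_seen` map and its minute-based layout are illustrative assumptions, not the production data model.

```python
def eligible_articles(pool, first_seen, now, window_minutes=60):
    """Filter the candidate pool by the per-user serving constraint:
    an article remains eligible while fewer than `window_minutes`
    (x = 60 worked well in the paper) have elapsed since this user
    first saw it; afterwards it is dropped for this user.

    `first_seen` maps article_id -> minute of the user's first
    exposure (a hypothetical representation for illustration)."""
    return [a for a in pool
            if a not in first_seen or now - first_seen[a] < window_minutes]

# user first saw "a" 70 minutes ago and "b" 10 minutes ago
print(eligible_articles(["a", "b", "c"], {"a": 100, "b": 160}, now=170))
# -> ['b', 'c']
```

Note that unseen articles are always eligible, which is what allows a user who overlooked a link on the first page view to click it on a later visit within the window.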
43 In addition to decay, the CTR of an article also changes by time of day and day of week. [sent-97, score-0.49]
44 Figure 1 (b) shows the CTR of a typical article when served using a randomized serving scheme (articles served in a round-robin fashion to a randomly chosen user population). [sent-98, score-0.962]
45 It is evident that CTRs of articles vary dramatically over time; this clearly shows the need to adjust for time effects (e. [sent-100, score-0.454]
46 g., diurnal patterns, decay) to obtain an adjusted article score when ranking articles. [sent-102, score-0.428]
47 In our current study, we fitted a global time of day curve at 5 minute resolution to data obtained from randomized serving scheme through a periodic (weekly) adaptive regression spline. [sent-103, score-0.326]
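A minimal sketch of how such a time-of-day adjustment could be applied, assuming the weekly adaptive regression spline has already been fitted on random-bucket data and reduced to a per-5-minute lookup table (the spline fitting itself is omitted; the multipliers below are made-up values):

```python
def adjusted_score(raw_ctr, diurnal_curve, minute_of_day):
    """Divide out the global time-of-day effect so that article scores
    observed at different hours are comparable. `diurnal_curve` holds
    one multiplier per 5-minute interval (288 per day, mean ~1.0),
    treated here as a precomputed lookup table."""
    return raw_ctr / diurnal_curve[minute_of_day // 5]

# hypothetical curve: 3am traffic clicks at half the average rate,
# noon at 1.5x the average rate
curve = [1.0] * 288
curve[36] = 0.5    # minute 180 (3:00am) falls in interval 36
curve[144] = 1.5   # minute 720 (12:00pm) falls in interval 144
```

Under this adjustment, a raw 1% CTR at 3am and a raw 3% CTR at noon both map to the same adjusted score of 2%, so the ranking is not dominated by when the article happened to be served.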
48 However, there are still interactions that occur at an article level which were difficult to estimate offline through article features. [sent-104, score-0.812]
49 Per-article online models that put more weight on recent observations provide an effective self adaptive mechanism to automatically account for deviations from the global trend when an article is pushed into the system. [sent-105, score-0.511]
50 Strong Serving Bias: A model built using data generated from a serving scheme is biased by that scheme. [sent-106, score-0.307]
51 For example, if a serving scheme decides not to show article A to any user, any model built using this data would not learn the popularity of A from users’ feedback. [sent-107, score-0.713]
52 Moreover, every non-randomized serving scheme introduces confounding factors in the data; adjusting such factors to obtain unbiased article scores is often difficult. [sent-110, score-0.71]
53 In fact, early experiments that updated models using data from editorial bucket to serve in our experimental buckets performed poorly. [sent-111, score-0.443]
54 Interaction with the editorial team: The project involved considerable interaction with human editors who have been manually and successfully placing articles on the portal for many years. [sent-113, score-0.686]
55 Understanding how that experience can be leveraged in conjunction with automated serving schemes was a major challenge, both technically and culturally (in that editors had to learn what ML algorithms could do, and we had to learn all the subtle considerations in what to show). [sent-114, score-0.319]
56 The result is our framework, wherein editors control the pool and set policies via constraints on what can be served, and the serving algorithm chooses what to show on a given user visit. [sent-115, score-0.583]
57 3 Experimental Setup and Framework Experimental Setup: We created several mutually exclusive buckets of roughly equal sizes from a fraction of live traffic, and served traffic in each bucket using one of our candidate serving schemes. [sent-116, score-0.593]
58 We also created a control bucket that ran the editorial serving scheme. [sent-118, score-0.601]
59 In addition, we created a separate bucket called the random bucket, which serves articles per visit in a round-robin fashion. [sent-119, score-0.717]
60 For instance, a breaking news story is shown immediately at the F1 position; a user visiting the portal after 60 minutes should not see the same article he saw during his earlier visits; and so on. [sent-124, score-0.776]
61 • Online models to track CTR: We build online models to track article CTR for various user segments separately in each bucket. [sent-126, score-0.88]
62 Articles that are currently the best are shown in the serving bucket; others are explored in the random bucket until their scores are better than the current best articles; at this point they get promoted to the serving bucket. [sent-127, score-0.723]
63 In our serving bucket, we serve the same article at the F1 position in a given 5-minute window (except for overrides by business rules). [sent-128, score-0.749]
64 Separate online models are tracked for articles at the F1 position in each bucket. [sent-129, score-0.594]
65 Articles promoted to the F1 position in the serving bucket are subsequently scored by their model in the serving bucket; articles not playing at the F1 position in the serving bucket are of course scored by their model in the random bucket. [sent-130, score-1.73]
66 • Explore/Exploit Strategy: The random bucket is used for two purposes: (a) It provides a simple explore-exploit strategy that does random exploration with a small probability P , and serves best articles ranked by their estimated scores from online models with a large probability 1 − P . [sent-131, score-0.802]
67 , diurnal CTR pattern) without the need to build elaborate statistical models that adjust for serving bias in other non-random buckets. [sent-134, score-0.315]
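The explore/exploit strategy above can be sketched as an epsilon-greedy rule. This is a simplification of the bucket-based setup described in the text, where exploration happens in a dedicated random bucket serving round-robin rather than inline per request; the function and parameter names are illustrative.

```python
import random

def choose_article(candidates, scores, p_explore=0.05, rng=random):
    """Epsilon-greedy serving: with small probability P serve a
    uniformly random article (exploration, playing the role of the
    random bucket); otherwise serve the article with the best current
    online-model score (exploitation)."""
    if rng.random() < p_explore:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: scores[a])

articles = ["a1", "a2", "a3"]
model_scores = {"a1": 0.010, "a2": 0.025, "a3": 0.018}
print(choose_article(articles, model_scores, p_explore=0.0))  # a2
```

Because the random bucket serves round-robin, every live article keeps accumulating unbiased observations even while a different article dominates the serving bucket, which is what lets a newly improving article get promoted.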
68 ) 4 Online Models Tracking article CTR in an online fashion is a well studied area in time series with several methods available in the literature [3][7]; but the application to content optimization has not been carefully studied. [sent-139, score-0.618]
69 1 Estimated Most Popular: EMP This model tracks the log-odds of CTR per article at the F1 position over time but does not use any user features. [sent-142, score-0.649]
70 The subscript t in our notation refers to the tth interval after the article is first displayed in the bucket. [sent-143, score-0.406]
71 In our scenario, we get roughly 300 − 400 observations at the F1 position per article in a 5-minute interval in the random bucket, hence the above transformation is appropriate for EMP and SS with few tens of user segments. [sent-149, score-0.649]
72 Model parameters are initialized by observing values at t = 1 for an article in random bucket, actual tracking begins at t = 2. [sent-154, score-0.432]
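The paper's exact EMP dynamics are not reproduced here; as an illustrative stand-in, the following tracker maintains exponentially discounted click and view counts per article at F1, so that recent 5-minute intervals dominate the log-odds estimate, with initialization from the first interval as described above. The discount factor and smoothing constants are assumptions.

```python
import math

class LogOddsTracker:
    """Illustrative stand-in for the EMP model: tracks the log-odds of
    an article's CTR over 5-minute intervals, discounting old intervals
    so recent observations dominate."""

    def __init__(self, clicks, views, discount=0.9):
        # initialize from the first interval's observations (t = 1);
        # small pseudo-counts avoid log(0) for articles with no clicks yet
        self.clicks = clicks + 0.5
        self.views = views + 1.0
        self.discount = discount

    def update(self, clicks, views):
        # down-weight the accumulated past, then fold in the new
        # interval's observations (tracking from t = 2 onward)
        self.clicks = self.discount * self.clicks + clicks
        self.views = self.discount * self.views + views

    def ctr(self):
        return self.clicks / self.views

    def log_odds(self):
        p = self.ctr()
        return math.log(p / (1.0 - p))
```

With roughly 300-400 F1 observations per article per 5-minute interval, such a tracker reacts within a few intervals when an article's CTR starts to decay.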
73 In particular, user covariates are used to create disjoint subsets (segments), and a local EMP model is built to track item performance in each user segment. [sent-164, score-0.461]
74 For a small number of user segments, we fit a separate EMP model per user segment for a given item at the F1 position. [sent-165, score-0.446]
75 As the number of user segments grows, data sparseness may lead to high variance estimates in small segments, especially during early part of article lifetime. [sent-166, score-0.678]
76 To address this, we smooth article scores in segments at each time point through a Bayesian hierarchical model. [sent-167, score-0.47]
77 In particular, if $(a_{it}, Q_{it})$, $i = 1, \ldots, k$, are the predicted means and variances of the item score at F1 in $k$ different user segments at time $t$, we derive a new score as follows: $\tilde{a}_{it} = \frac{\tau}{\tau + Q_{it}} a_{it} + \frac{Q_{it}}{\tau + Q_{it}} \bar{a}_t$ (2), where $\bar{a}_t$ is the EMP model score for the item. [sent-168, score-0.292]
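Equation (2) amounts to precision-weighted shrinkage of each segment score toward the global EMP score. A sketch, with `segment_stats` as a hypothetical list of (mean, variance) pairs; the function name is illustrative:

```python
def shrink_segment_scores(segment_stats, global_score, tau):
    """Bayesian shrinkage of per-segment scores toward the overall EMP
    score, following Eq. (2): each segment estimate a_it with variance
    Q_it is replaced by a weighted average of itself and the global
    score, with weight tau / (tau + Q_it) on the segment estimate.
    Sparse segments (large Q_it) are pulled strongly toward the global
    score; well-estimated segments keep their own mean."""
    shrunk = []
    for a_it, q_it in segment_stats:
        w = tau / (tau + q_it)
        shrunk.append(w * a_it + (1.0 - w) * global_score)
    return shrunk

# two segments with the same raw mean 0.04 but different variances:
# the precise one (Q = 0) keeps 0.04, the noisy one (Q = tau) is
# shrunk halfway toward the global score 0.02, i.e. to about 0.03
print(shrink_segment_scores([(0.04, 0.0), (0.04, 1.0)], 0.02, tau=1.0))
```

This is what stabilizes segment scores early in an article's lifetime, when each segment has seen only a handful of views.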
78 5 Experiments In this section, we present the results of experiments that compare serving schemes based on our three online models (EMP, SS, and OLR) with the current editorial programming approach (which we refer to as ED). [sent-179, score-0.485]
79 We show that our online models significantly outperform ED, based on bucket tests of the four alternatives run concurrently on live traffic on the portal over a month. [sent-180, score-0.476]
80 Then, by offline analysis, we identified the reason personalization (based on user features in OLR or segmentation in SS) did not provide improvement: mainly, we did not have sufficiently diverse articles to exploit, although SS and OLR are more predictive models than EMP. [sent-181, score-0.782]
81 We used each of these schemes to serve traffic in a bucket (a random set of users) for one month; the four buckets ran concurrently. [sent-184, score-0.32]
82 We also obtained significant improvements relative to round-robin serving scheme in the random bucket but do not report it to avoid clutter. [sent-186, score-0.527]
83 (c) On the x-axis is the CTR of a polar article in a segment; on the y-axis is the CTR of the global best article (during the polar article's lifetime) in the same segment. [sent-212, score-0.964]
84 Analysis of Personalization: Unexpectedly, we found that personalization to user segments provided no additional CTR lift relative to EMP, despite the fact that user features were predictive of article CTR. [sent-216, score-1.045]
85 A closer look at the data revealed the main cause to be the current editorial content generation process, which is geared to create candidate articles that are expected to be popular for all users (not for different user segments). [sent-217, score-1.0]
86 In fact, there were articles that have more affinity to some user segments than others—we define these to be articles whose CTR in a given segment was at least twice the article’s overall CTR, and refer to them as polar articles. [sent-218, score-1.286]
87 However, whenever polar articles were in the pool of candidate articles, there was usually a non-polar one in the pool that was more popular than the polar ones across all segments. [sent-219, score-0.764]
88 Figure 2 (c) shows, for the gender segmentation, that polar articles almost always co-exist with an article whose overall CTR is greater than even the CTR in the segment of the polar article. [sent-221, score-1.087]
89 For the AgeXGender segmentation, the global-best article was the same as the segment-best article in about 65% of the intervals; the maximum expected CTR lift over global ranking was only about 1%. [sent-222, score-0.923]
90 Bucket Tests: It is common practice in existing literature to evaluate a new serving algorithm using predictive metrics obtained by running the algorithm on retrospective data (collected while using another serving scheme). [sent-225, score-0.588]
91 This finding underscores the need for random bucket data, effective techniques to correct the bias, or a rapid bucket testing infrastructure to compare serving schemes. [sent-231, score-0.77]
92 6 Related Work Google News personalization [13], which uses collaborative filtering to provide near real-time recommendation of news articles to users, is the most closely related prior work. [sent-232, score-0.574]
93 On the one hand, this allows us to build per-article models in near real-time; on the other, the editorially controlled mix of items means all articles are of high quality (making it hard to achieve lift by simply eliminating bad articles). [sent-234, score-0.656]
94 However, we note that many of the assumptions made in the classical multi-armed bandit and reinforcement learning literature are not satisfied in our setting (dynamic set of articles, short article lifetime, batch-learning, non-stationary CTR, lagged response). [sent-242, score-0.436]
95 In fact, short article lifetimes, dynamism of the content pool and the importance of learning article behaviour very quickly are the major challenges in our scenario. [sent-243, score-1.099]
96 7 Discussion In this paper, we described content optimization, the problem of selecting articles to present to a user who is intent on browsing for information. [sent-249, score-0.796]
97 Examples include recommending RSS feeds or articles from one or more RSS feeds, such as Google’s news aggregation, and segmentation and personalization are likely to be effective. [sent-252, score-0.618]
98 The variant that we addressed involves selecting from a small, homogeneous set of articles; segmentation may not be effective unless the pool of articles is chosen to be diverse, and there is a high premium in quickly estimating and tracking popularity per-article. [sent-253, score-0.604]
99 Our work suggests that offline feature-based models are not good enough to rank articles in a highly dynamic content publishing system where article pools are small, dynamic, and of high quality; lifetimes are short; and the utility metric being measured (e.g., CTR) is non-stationary. [sent-254, score-1.197]
100 In fact, the biased nature of data obtained from a non-randomized serving scheme also underscores the need to obtain some percentage of data from a randomized experimental design. [sent-257, score-0.306]
wordName wordTfidf (topN-words)
[('articles', 0.454), ('ctr', 0.422), ('article', 0.406), ('bucket', 0.243), ('serving', 0.24), ('user', 0.208), ('emp', 0.167), ('olr', 0.135), ('content', 0.134), ('editorial', 0.118), ('clicks', 0.109), ('retrospective', 0.108), ('ss', 0.105), ('lift', 0.09), ('users', 0.086), ('ine', 0.08), ('pool', 0.079), ('portal', 0.079), ('online', 0.078), ('polar', 0.076), ('traf', 0.072), ('personalization', 0.069), ('segments', 0.064), ('ctrs', 0.056), ('challenges', 0.053), ('news', 0.051), ('lifetimes', 0.049), ('live', 0.049), ('module', 0.048), ('decay', 0.048), ('gender', 0.045), ('click', 0.045), ('qit', 0.045), ('scheme', 0.044), ('page', 0.042), ('day', 0.042), ('dynamic', 0.039), ('programmed', 0.036), ('exposure', 0.036), ('editors', 0.035), ('position', 0.035), ('clickers', 0.034), ('editorially', 0.034), ('lifts', 0.034), ('age', 0.033), ('visiting', 0.032), ('served', 0.032), ('yahoo', 0.032), ('nt', 0.03), ('bandit', 0.03), ('site', 0.03), ('segment', 0.03), ('deployed', 0.029), ('pools', 0.029), ('buckets', 0.029), ('randomization', 0.029), ('visits', 0.029), ('collected', 0.028), ('views', 0.028), ('models', 0.027), ('ct', 0.027), ('voice', 0.027), ('lifetime', 0.027), ('tracking', 0.026), ('agarwal', 0.026), ('serve', 0.026), ('build', 0.026), ('mix', 0.025), ('segmentation', 0.024), ('ads', 0.024), ('periodically', 0.023), ('built', 0.023), ('clicking', 0.022), ('contentcategory', 0.022), ('diurnal', 0.022), ('elango', 0.022), ('entertainment', 0.022), ('infrastructure', 0.022), ('overrides', 0.022), ('servers', 0.022), ('slots', 0.022), ('topicality', 0.022), ('underscores', 0.022), ('manual', 0.022), ('automated', 0.022), ('track', 0.022), ('schemes', 0.022), ('constraints', 0.021), ('continuously', 0.021), ('quickly', 0.021), ('ranking', 0.021), ('visit', 0.02), ('conducting', 0.02), ('feeds', 0.02), ('ait', 0.02), ('business', 0.02), ('confounding', 0.02), ('portals', 0.02), ('publishing', 0.02), ('rss', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 169 nips-2008-Online Models for Content Optimization
2 0.091395058 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
3 0.0791751 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging
Author: Ofer Dekel
Abstract: We present cutoff averaging, a technique for converting any conservative online learning algorithm into a batch learning algorithm. Most online-to-batch conversion techniques work well with certain types of online learning algorithms and not with others, whereas cutoff averaging explicitly tries to adapt to the characteristics of the online algorithm being converted. An attractive property of our technique is that it preserves the efficiency of the original online algorithm, making it appropriate for large-scale learning problems. We provide a statistical analysis of our technique and back our theoretical claims with experimental results. 1
4 0.065047771 184 nips-2008-Predictive Indexing for Fast Search
Author: Sharad Goel, John Langford, Alexander L. Strehl
Abstract: We tackle the computational problem of query-conditioned search. Given a machine-learned scoring rule and a query distribution, we build a predictive index by precomputing lists of potential results sorted based on an expected score of the result over future queries. The predictive index datastructure supports an anytime algorithm for approximate retrieval of the top elements. The general approach is applicable to webpage ranking, internet advertisement, and approximate nearest neighbor search. It is particularly effective in settings where standard techniques (e.g., inverted indices) are intractable. We experimentally find substantial improvement over existing methods for internet advertisement and approximate nearest neighbors. 1
5 0.049501106 99 nips-2008-High-dimensional support union recovery in multivariate regression
Author: Guillaume R. Obozinski, Martin J. Wainwright, Michael I. Jordan
Abstract: We study the behavior of block 1 / 2 regularization for multivariate regression, where a K-dimensional response vector is regressed upon a fixed set of p covariates. The problem of support union recovery is to recover the subset of covariates that are active in at least one of the regression problems. Studying this problem under high-dimensional scaling (where the problem parameters as well as sample size n tend to infinity simultaneously), our main result is to show that exact recovery is possible once the order parameter given by θ 1 / 2 (n, p, s) : = n/[2ψ(B ∗ ) log(p − s)] exceeds a critical threshold. Here n is the sample size, p is the ambient dimension of the regression model, s is the size of the union of supports, and ψ(B ∗ ) is a sparsity-overlap function that measures a combination of the sparsities and overlaps of the K-regression coefficient vectors that constitute the model. This sparsity-overlap function reveals that block 1 / 2 regularization for multivariate regression never harms performance relative to a naive 1 -approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the regression vectors are suitably orthogonal relative to the design. We complement our theoretical results with simulations that demonstrate the sharpness of the result, even for relatively small problems. 1
6 0.047698159 140 nips-2008-Mortal Multi-Armed Bandits
7 0.046501461 168 nips-2008-Online Metric Learning and Fast Similarity Search
8 0.042509403 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
9 0.041004114 22 nips-2008-An Online Algorithm for Maximizing Submodular Functions
10 0.037230819 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing
11 0.036620442 214 nips-2008-Sparse Online Learning via Truncated Gradient
12 0.036113448 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
13 0.035307623 147 nips-2008-Multiscale Random Fields with Application to Contour Grouping
14 0.035280328 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions
15 0.033935014 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
16 0.03375303 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
17 0.033046115 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
18 0.031108622 73 nips-2008-Estimating Robust Query Models with Convex Optimization
19 0.030316502 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
20 0.030187702 101 nips-2008-Human Active Learning
topicId topicWeight
[(0, -0.099), (1, 0.018), (2, -0.005), (3, -0.033), (4, -0.048), (5, -0.02), (6, 0.003), (7, 0.025), (8, 0.044), (9, 0.02), (10, -0.058), (11, 0.022), (12, -0.05), (13, 0.001), (14, 0.072), (15, -0.036), (16, -0.012), (17, 0.02), (18, 0.043), (19, 0.048), (20, 0.034), (21, 0.024), (22, 0.032), (23, 0.014), (24, 0.015), (25, 0.016), (26, 0.032), (27, 0.047), (28, 0.027), (29, 0.051), (30, -0.038), (31, 0.02), (32, 0.013), (33, -0.014), (34, -0.026), (35, -0.03), (36, 0.076), (37, 0.005), (38, 0.148), (39, -0.112), (40, -0.005), (41, 0.07), (42, 0.062), (43, -0.003), (44, -0.091), (45, -0.055), (46, 0.041), (47, 0.044), (48, -0.025), (49, 0.119)]
simIndex simValue paperId paperTitle
same-paper 1 0.94227308 169 nips-2008-Online Models for Content Optimization
2 0.52635592 111 nips-2008-Kernel Change-point Analysis
Author: Zaïd Harchaoui, Eric Moulines, Francis R. Bach
Abstract: We introduce a kernel-based method for change-point analysis within a sequence of temporal observations. Change-point analysis of an unlabelled sample of observations consists in first testing whether a change in the distribution occurs within the sample and, second, if a change occurs, estimating the change-point instant after which the distribution of the observations switches to a different distribution. We propose a test statistic based upon the maximum kernel Fisher discriminant ratio as a measure of homogeneity between segments. We derive its limiting distribution under the null hypothesis (no change occurs), and establish its consistency under the alternative hypothesis (a change occurs). This allows us to build a statistical hypothesis testing procedure for the presence of a change-point, with a prescribed false-alarm probability and a detection probability tending to one in the large-sample setting. If a change actually occurs, the test statistic also yields an estimator of the change-point location. Promising experimental results in temporal segmentation of mental tasks from BCI data and pop-song indexation are presented.
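The estimation step — pick the split instant maximizing a kernel measure of inhomogeneity between the two segments — can be illustrated with a simpler statistic. As an assumption, the sketch below uses a biased maximum-mean-discrepancy (MMD) estimate in place of the paper's kernel Fisher discriminant ratio, on hypothetical 1-D data with a mean shift:

```python
import math
import random

def rbf(x, y, gamma=1.0):
    # Gaussian (RBF) kernel on scalars.
    return math.exp(-gamma * (x - y) ** 2)

def mmd2(xs, ys):
    # Biased estimate of the squared maximum mean discrepancy
    # between two one-dimensional samples.
    kxx = sum(rbf(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

def change_point(seq, margin=5):
    # Scan candidate split instants; return the split that maximizes
    # the between-segment statistic.
    best_t, best_stat = None, -1.0
    for t in range(margin, len(seq) - margin):
        stat = mmd2(seq[:t], seq[t:])
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t

rng = random.Random(1)
seq = ([rng.gauss(0.0, 1.0) for _ in range(40)] +
       [rng.gauss(3.0, 1.0) for _ in range(40)])
print(change_point(seq))  # close to the true change at t = 40
```

The paper additionally derives the null distribution of its statistic; this sketch only performs the estimation step, not the test.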
3 0.52371079 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning
Author: Duy Nguyen-tuong, Jan R. Peters, Matthias Seeger
Abstract: Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian process regression (GPR) with local GP models (LGP). The training data is partitioned into local regions, and an individual GP model is trained for each. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance-based measure for partitioning the data and for weighted prediction. The proposed method achieves online learning and prediction in real time. Comparisons with other non-parametric regression methods show that LGP has higher accuracy than LWPR and performance close to that of standard GPR and ν-SVR.
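The structure of the LGP idea — assign incoming points to the nearest local model, then answer queries with a distance-weighted combination of local predictions — can be sketched as follows. As a simplifying assumption, each "local model" here is a plain kernel-weighted average (Nadaraya-Watson) rather than a full GP, and the centers, kernel widths, and toy sin-curve data are hypothetical:

```python
import math

def rbf(a, b, gamma):
    return math.exp(-gamma * (a - b) ** 2)

class LocalModel:
    def __init__(self, center):
        self.center = center
        self.xs, self.ys = [], []

    def insert(self, x, y):
        self.xs.append(x)
        self.ys.append(y)

    def predict(self, x):
        # Kernel-weighted average of this region's targets
        # (standing in for a full GP posterior mean).
        w = [rbf(x, xi, gamma=8.0) for xi in self.xs]
        return sum(wi * yi for wi, yi in zip(w, self.ys)) / sum(w)

class LocalRegressor:
    def __init__(self, centers):
        self.models = [LocalModel(c) for c in centers]

    def insert(self, x, y):
        # Assign each incoming point to the nearest local model.
        nearest = min(self.models, key=lambda m: abs(m.center - x))
        nearest.insert(x, y)

    def predict(self, x):
        # Distance-weighted combination of the local predictions.
        active = [m for m in self.models if m.xs]
        w = [rbf(x, m.center, gamma=2.0) for m in active]
        return sum(wi * m.predict(x) for wi, m in zip(w, active)) / sum(w)

reg = LocalRegressor(centers=[0.0, 1.5, 3.0])
for i in range(61):
    x = i * 0.05  # training inputs span [0, 3]
    reg.insert(x, math.sin(x))
print(round(reg.predict(1.0), 3))  # close to sin(1.0)
```

Because each query touches only a few small local models, insertion and prediction stay cheap as data accumulates, which is the point of the paper's real-time setting.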
4 0.46942544 22 nips-2008-An Online Algorithm for Maximizing Submodular Functions
Author: Matthew Streeter, Daniel Golovin
Abstract: We present an algorithm for solving a broad class of online resource allocation problems. Our online algorithm can be applied in environments where abstract jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs (v, τ ), where τ is the time invested in activity v. Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time T , for some fixed deadline T > 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition. 1
5 0.46795326 168 nips-2008-Online Metric Learning and Fast Similarity Search
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon, Kristen Grauman
Abstract: Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms. To further boost the practicality of our approach, we develop an online locality-sensitive hashing scheme which leads to efficient updates to data structures used for fast approximate similarity search. We demonstrate our algorithm on multiple datasets and show that it outperforms relevant baselines.
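The online part of the method — update a learned Mahalanobis matrix one distance constraint at a time — can be sketched minimally. Note the paper uses LogDet regularization; purely for illustration, the sketch below takes a plain squared-loss gradient step per constraint (an assumption, and one that does not by itself guarantee the matrix stays positive definite):

```python
def mahalanobis_sq(A, z):
    # z^T A z for a 2x2 matrix A and a 2-vector z = x - y.
    return sum(z[i] * A[i][j] * z[j] for i in range(2) for j in range(2))

def online_metric_step(A, x, y, target, lr=0.05):
    """One gradient step on 0.5 * (d_A(x, y) - target)^2 with respect
    to A; the gradient is (d_A - target) * z z^T for z = x - y."""
    z = [x[0] - y[0], x[1] - y[1]]
    err = mahalanobis_sq(A, z) - target
    for i in range(2):
        for j in range(2):
            A[i][j] -= lr * err * z[i] * z[j]
    return A

# Start from the Euclidean metric and repeatedly constrain the pair
# (1,0)-(0,0) to have squared distance 0.1, shrinking the first axis.
A = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(200):
    online_metric_step(A, (1.0, 0.0), (0.0, 0.0), target=0.1)
print(round(A[0][0], 3))  # 0.1
```

The LogDet update in the paper additionally yields a cheap exact rank-one form and keeps the metric positive definite, which is what makes the companion hashing scheme updatable.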
6 0.46114969 41 nips-2008-Breaking Audio CAPTCHAs
7 0.42059094 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging
8 0.40798044 211 nips-2008-Simple Local Models for Complex Dynamical Systems
9 0.4072198 73 nips-2008-Estimating Robust Query Models with Convex Optimization
10 0.40701321 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
11 0.39673558 184 nips-2008-Predictive Indexing for Fast Search
12 0.38271075 214 nips-2008-Sparse Online Learning via Truncated Gradient
13 0.3725526 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
14 0.36944336 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
15 0.36079365 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
16 0.34738427 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
17 0.34115949 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions
18 0.33481392 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing
19 0.32101566 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
20 0.31718668 236 nips-2008-The Mondrian Process
topicId topicWeight
[(6, 0.048), (7, 0.048), (12, 0.032), (15, 0.021), (22, 0.329), (28, 0.149), (57, 0.037), (59, 0.021), (63, 0.024), (71, 0.016), (77, 0.04), (78, 0.048), (83, 0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.77610958 169 nips-2008-Online Models for Content Optimization
Author: Deepak Agarwal, Bee-chung Chen, Pradheep Elango, Nitin Motgi, Seung-taek Park, Raghu Ramakrishnan, Scott Roy, Joe Zachariah
Abstract: We describe a new content publishing system that selects articles to serve to a user, choosing from an editorially programmed pool that is frequently refreshed. It is now deployed on a major Yahoo! portal, and selects articles to serve to hundreds of millions of user visits per day, significantly increasing the number of user clicks over the original manual approach, in which editors periodically selected articles to display. Some of the challenges we face include a dynamic content pool, short article lifetimes, non-stationary click-through rates, and extremely high traffic volumes. The fundamental problem we must solve is to quickly identify which items are popular (perhaps within different user segments), and to exploit them while they remain current. We must also explore the underlying pool constantly to identify promising alternatives, quickly discarding poor performers. Our approach is based on tracking per article performance in near real time through online models. We describe the characteristics and constraints of our application setting, discuss our design choices, and show the importance and effectiveness of coupling online models with a randomization procedure. We discuss the challenges encountered in a production online content-publishing environment and highlight issues that deserve careful attention. Our analysis of this application also suggests a number of future research avenues. 1
2 0.65434653 81 nips-2008-Extracting State Transition Dynamics from Multiple Spike Trains with Correlated Poisson HMM
Author: Kentaro Katahira, Jun Nishikawa, Kazuo Okanoya, Masato Okada
Abstract: Neural activity is non-stationary and varies across time. Hidden Markov Models (HMMs) have been used to track the state transition among quasi-stationary discrete neural states. Within this context, independent Poisson models have been used for the output distribution of HMMs; hence, the model is incapable of tracking the change in correlation without modulating the firing rate. To achieve this, we applied a multivariate Poisson distribution with correlation terms for the output distribution of HMMs. We formulated a Variational Bayes (VB) inference for the model. The VB could automatically determine the appropriate number of hidden states and correlation types while avoiding the overlearning problem. We developed an efficient algorithm for computing posteriors using the recursive relationship of a multivariate Poisson distribution. We demonstrated the performance of our method on synthetic data and a real spike train recorded from a songbird. 1
3 0.50104797 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
Author: Liu Yang, Rong Jin, Rahul Sukthankar
Abstract: The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low-density regions of the unlabeled data will help the classification. In such cases, the cluster assumption may not be valid, and consequently how to leverage this type of unlabeled data to enhance classification accuracy becomes a challenge. We introduce “Semi-supervised Learning with Weakly-Related Unlabeled Data” (SSLW), an inductive method that builds upon the maximum-margin approach, towards better usage of weakly-related unlabeled information. Although SSLW could improve a wide range of classification tasks, in this paper we focus on text categorization with a small training pool. The key assumption behind this work is that, even with different topics, word usage patterns across different corpora tend to be consistent. To this end, SSLW estimates the optimal word-correlation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of state-of-the-art methods for inductive semi-supervised learning and text categorization. We show that SSLW results in a significant improvement in categorization accuracy, equipped with a small training set and an unlabeled resource that is weakly related to the test domain.
4 0.49857908 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that depends only on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1-norm or the block ℓ1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
5 0.49738011 194 nips-2008-Regularized Learning with Networks of Features
Author: Ted Sandler, John Blitzer, Partha P. Talukdar, Lyle H. Ungar
Abstract: For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, and when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model. Here we present a framework for regularized learning when one has prior knowledge about which features are expected to have similar and dissimilar weights. The prior knowledge is encoded as a network whose vertices are features and whose edges represent similarities and dissimilarities between them. During learning, each feature’s weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using networks of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature networks constructed from declarative human knowledge significantly improve prediction accuracy. 1
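The penalty described — each feature's weight pulled toward the weights of its neighbors in a feature network — is a graph (Laplacian-style) regularizer added to the usual loss. A sketch under assumed toy data, using gradient descent on least squares plus the edge penalty (the data, edge list, and hyperparameters are hypothetical):

```python
def network_regularized_fit(X, y, edges, lam=1.0, lr=0.01, steps=2000):
    """Minimize sum of squared residuals plus lam * sum over edges of
    (w_i - w_j)^2, which pulls linked features toward equal weights."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):           # data-fit gradient
            r = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2 * r * xi[j]
        for i, j in edges:                 # network-penalty gradient
            diff = w[i] - w[j]
            grad[i] += 2 * lam * diff
            grad[j] -= 2 * lam * diff
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

# Feature 0 appears in the training data; feature 1 never does, but an
# edge declares them near-synonyms, so it inherits a similar weight.
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
y = [2.0, 4.0, 6.0]
w = network_regularized_fit(X, y, edges=[(0, 1)], lam=1.0)
print([round(v, 2) for v in w])  # [2.0, 2.0]
```

This is why the method helps with near-duplicate features: the unseen synonym gets a sensible weight from its neighbor rather than staying at zero.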
6 0.49582472 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation
7 0.49537709 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
8 0.49473745 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
9 0.49448964 196 nips-2008-Relative Margin Machines
10 0.49393499 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
11 0.49389264 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
12 0.49365586 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
13 0.49355367 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
14 0.49205652 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
15 0.49183002 231 nips-2008-Temporal Dynamics of Cognitive Control
16 0.49161035 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
17 0.4913882 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
18 0.49026194 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
19 0.4897823 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
20 0.48898113 32 nips-2008-Bayesian Kernel Shaping for Learning Control