nips nips2010 nips2010-198 knowledge-graph by maker-knowledge-mining

198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem


Source: pdf

Author: Gilbert Leung, Novi Quadrianto, Kostas Tsioutsiouliklis, Alex J. Smola

Abstract: We present a fast online solver for large scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a fast online solver for large scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. [sent-11, score-0.132]

2 Our algorithm solves an integer linear program in an online fashion. [sent-12, score-0.139]

3 We apply the algorithm to optimize tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally. [sent-15, score-0.778]

4 Our motivation for solving parametric flow is the problem of webpage tiering for search engine indices. [sent-20, score-0.404]

5 While our methods are entirely general and could be applied to a range of other machine learning and optimization problems, we focus on webpage tiering as the illustrative example in this paper. [sent-21, score-0.264]

6 The specific problem that will provide our running example is that of assigning webpages to several tiers of a search engine cache such that the time to serve a query is minimized. [sent-26, score-0.71]

7 For a given query, a search engine returns a number of documents (typically 10). [sent-27, score-0.248]

8 The time it takes to serve a query depends on where the documents are located. [sent-28, score-0.294]

9 The first tier (or cache) is the fastest (using premium hardware, etc. [sent-29, score-0.53]

10 thus also often the smallest) and retrieves its documents with little latency. [sent-30, score-0.128]

11 If even just a single document is located in a back tier, the delay is considerably increased since now we need to search the larger (and slower) tiers until the desired document is found. [sent-31, score-0.484]

12 Hence it is our goal to assign the most popular documents to the fastest tiers while taking the interactions between documents into account. [sent-32, score-0.57]

13 2 The Tiering Problem We would like to allocate documents d ∈ D into k tiers of storage at our disposal. [sent-33, score-0.441]

14 Moreover, let q ∈ Q be the queries arriving at a search engine, with finite values vq > 0 (e.g. [sent-34, score-0.258]

15 the probability of the query, possibly weighted by the relevance of the retrieved results), and a set of documents Dq retrieved for the query. [sent-36, score-0.246]

16 This input structure is stored in a bipartite graph G with vertices V = D ∪ Q and edges (d, q) ∈ E whenever document d should be retrieved for query q. [sent-37, score-0.494]

17 The k tiers, with tier 1 as the most desirable and k the least (most costly for retrieval), form an increasing sequence of cumulative capacities Ct, with Ct indicating how many pages can be stored by tiers t′ ≤ t together. [sent-38, score-0.944]

18 Without loss of generality, assume Ck−1 < |D| (that is, the last tier is required to hold all documents, or the problem can be reduced). [sent-39, score-0.53]

19 Finally, for each t ≥ 2 we assume that there is a penalty pt−1 > 0 incurred by a tier-miss at level t (known as “fallthrough” from tier t − 1 to tier t). [sent-40, score-1.06]

20 And since we have to access tier 1 regardless, we set p0 = 0 for convenience. [sent-41, score-0.53]

21 For instance, retrieving a page in tier 3 incurs a total penalty of p1 + p2 . [sent-42, score-0.594]

22 Much work has been invested in building efficient inverted indices which are optimized for query processing [17, 3]. [sent-45, score-0.219]

23 These papers all deal with the issue of optimizing the data representation for a given query and how an inverted index should be stored and managed for general queries. [sent-46, score-0.275]

24 A somewhat orthogonal strategy to this is to decompose the collection of webpages into a number of disjoint tiers [15] ordered in decreasing level of relevance. [sent-49, score-0.312]

25 That is, documents are partitioned according to their relevance for answering queries into different tiers of (typically) increasing size. [sent-50, score-0.519]

26 This leads to putting the most frequently retrieved or the most relevant (according to the value of query, the market or other operational parameters) pages into the top tier with the smallest latency and relegating the less frequently retrieved or the less relevant pages into bottom tiers. [sent-51, score-0.732]

27 A naive implementation of this approach would simply assign a value to each page in the index and arrange them such that the most frequently accessed pages reside in the highest levels of the cache. [sent-53, score-0.152]

28 Unfortunately this approach is suboptimal: to answer a given query well, a search engine typically returns not just a single page but a list of r (typically r = 10) pages. [sent-54, score-0.373]

29 This means that if even just one of these pages is found at a much lower tier, we either need to search the back tiers to retrieve this page or alternatively we need to sacrifice result relevance. [sent-55, score-0.138]

30 At first glance, the problem is daunting: we need to take all correlations among pages induced by user queries into account. [sent-56, score-0.133]

31 We denote the result set for query q by Dq := {d : (d, q) ∈ G}, and similarly, the set of queries seeking a document d by Qd := {q : (d, q) ∈ G}. [sent-65, score-0.344]

32 Define uq := max_{d∈Dq} zd (1) as the number of cache levels we need to traverse to answer query q. [sent-70, score-0.778]

33 In other words, it is the document found in the worst tier which determines the cost of access. [sent-71, score-0.606]

34 Integrating the optimization over uq we may formulate the tiering problem as an integer program: minimize over (z, u) the objective Σ_{q∈Q} vq Σ_{t=1}^{uq−1} pt subject to zd ≤ uq ≤ k for all (q, d) ∈ G and Σ_d 1{zd ≤ t} ≤ Ct for all t. [sent-72, score-1.075]

35 C) We have insufficient data for an accurate tier assignment for pages associated with tail queries. [sent-83, score-0.585]

36 This can be addressed by a smoothing estimator for the tier index of a page. [sent-84, score-0.562]

37 3 Integer Linear Program We now replace the selector variables zd and uq by binary variables via a “thermometer” code. [sent-86, score-0.432]

38 Let x ∈ {0; 1}^{D×(k−1)} subject to xdt ≥ xd,t+1 for all d, t (3a), and y ∈ {0; 1}^{Q×(k−1)} subject to yqt ≥ yq,t+1 for all q, t (3b), be index variables. [sent-87, score-0.59]

39 Thus we have the one-to-one mapping zd = 1 + Σ_t xdt and xdt = 1{zd > t} between z and x. [sent-88, score-0.994]

40 For instance, for k = 5, a middle tier z = 3 maps into x = (1, 1, 0, 0) (requiring two fallthroughs), and the best tier z = 1 corresponds to x = (0, 0, 0, 0). [sent-89, score-1.06]

41 The constraint uq ≥ zd can simply be rewritten coordinate-wise as yqt ≥ xdt. [sent-91, score-0.969]

42 Finally, the capacity constraints assume the form Σ_d xdt ≥ |D| − Ct. [sent-92, score-0.503]

43 That is, the number of pages allocated to higher tiers is at least |D| − Ct. [sent-93, score-0.32]

44 The corresponding graph has vertices D and edges (d, d′) ∈ E whenever d and d′ are displayed together to answer a query. [sent-106, score-0.166]

45 In this case the tiering problem reduces to one of finding a subset of vertices D′ ⊂ D such that the induced subgraph has the largest number (possibly weighted) of edges subject to the capacity constraint |D′| ≤ C. [sent-107, score-0.572]

46 Figure 1: k-densest subgraph reduction. [sent-111, score-0.165]

47 4 Convex Programming The key idea in solving (4) is to relax the capacity constraints for the tiers. [sent-115, score-0.182]

48 We replace the capacity constraint by a partial Lagrangian. [sent-117, score-0.176]

49 This does not ensure that we will be able to meet the capacity constraints exactly anymore. [sent-118, score-0.182]

50 Instead, we will only be able to state ex-post that the relaxed solution is optimal for the observed capacity distribution. [sent-119, score-0.189]

51 Moreover, we are still able to control capacity by a suitable choice of the associated Lagrange multipliers. [sent-120, score-0.153]

52 4.1 Linear Program Instead of solving (4) we study the linear program: minimize over (x, y) the objective v⊤Y p − 1⊤X λ subject to xdt ≥ xd,t+1 and yqt ≥ yq,t+1, yqt ≥ xdt for (q, d) ∈ G, and xdt, yqt ∈ [0, 1]. (5) [sent-122, score-1.564]

53 Here λ = (λ1, ..., λk−1) act as Lagrange multipliers λt ≥ 0 for enforcing capacity constraints and 1 denotes a column of |D| ones. [sent-125, score-0.182]

54 there exists some x∗, y∗ satisfying x∗dt, y∗qt ∈ {0; 1} which minimize (5). [sent-129, score-0.193]

55 Moreover, we need to adjust λ such that we satisfy the desired capacity constraints (approximately). [sent-132, score-0.182]

56 Example 1 Consider the case of 2 tiers (hence we drop the index t), a single query q and 3 documents d. [sent-136, score-0.615]

57 Set the capacity constraint of the first tier to 1. [sent-137, score-0.706]

58 In this case it is impossible to avoid a cache miss in the ILP. [sent-138, score-0.167]

59 This example shows why the optimal tiering problem is NP hard — it is possible to design cases where the tier assignment for a page is highly ambiguous. [sent-143, score-0.859]

60 Note that for the integer programming problem with capacity constraint C = 2 we could allocate an arbitrary pair of pages to the cache. [sent-144, score-0.25]

61 Figure 2: Left: maximum flow problem for a problem of 4 pages and 3 queries. [sent-146, score-0.297]

62 The minimum cut of the directed graph needs to sever all pages leading to a query or alternatively it needs to sever the corresponding query incurring a penalty of (1 − vq ). [sent-147, score-0.602]

63 This is precisely the tiering objective function for the case of two tiers. [sent-148, score-0.241]

64 Here the black nodes and dashed edges represent a copy of the original graph — additionally each page in the original graph also has an infinite-capacity link to the corresponding query in the additional graph. [sent-150, score-0.34]

65 4.2 Graph Cut Equivalence It is well known that the case of two tiers (k = 2) can be relaxed to a min-cut, max-flow problem [7, 4]. [sent-152, score-0.289]

66 The transformation works by designing a bipartite graph between queries q and documents d. [sent-153, score-0.291]

67 All documents are connected to the source s by edges with capacity λ and queries are connected to the sink t with capacity (1 − vq ). [sent-154, score-0.712]

68 Documents d retrieved for a query q are connected to q with capacity ∞. [sent-155, score-0.378]

69 The conversion to several tiers is slightly more involved. [sent-157, score-0.289]

70 Denote by vdi vertices associated with document d and tier i and moreover, denote by wqi vertices associated with a query q and tier i. [sent-158, score-1.478]

71 Then the graph is given by edges (s, vdi) with capacities λi; edges (vdi, wqi′) for all (document, query) pairs and for all i ≤ i′, endowed with infinite capacity; and edges (wqi, t) with capacity (1 − vq). [sent-159, score-0.587]

72 As with the simple caching problem, we need to impose a cut on any query edge for which not all incoming page edges have been cut. [sent-160, score-0.333]

73 The key difference is that in order to benefit from storing pages in a better tier we need to guarantee that the page is contained in the lower tier, too. [sent-161, score-0.625]

74 Moreover, under reasonable conditions on the capacity constraints, there is more structure in λ. [sent-172, score-0.153]

75 Then any choice of λ satisfying the capacity constraints is monotonically non-increasing. [sent-174, score-0.182]

76 The function fλ is clearly convex, which helps describe our tiering problem via the following convex program: minimize over z the objective Σ_q vq max_{d∈Dq} zd + Σ_d fλ(zd − 1) subject to zd ∈ [1, k]. (8) We now use only one variable per document. [sent-180, score-1.038]

77 The key observation is that the objective function of (8) can be written as a sum over the following loss functions: lq(z) := vq max_{d∈Dq} zd + (1/|Q|) Σ_d fλ(zd − 1), (9) where |Q| denotes the cardinality of the query set. [sent-187, score-0.652]

78 The transformation suggests a simple stochastic gradient descent optimization algorithm: traverse the input stream by queries, and update the values of xd of all those documents d that would need to move into the next tier in order to reduce service time for a query. [sent-188, score-0.797]

79 Algorithm 1 proceeds by processing the input query-result records (q, vq , Dq ) as a stream comprising the set of pages that need to be displayed to answer a given query. [sent-190, score-0.196]

80 More specifically, it updates the tier preferences of the pages that have the lowest tier scores for each level and it decrements the preferences for all other pages. [sent-191, score-1.12]

81 4.5 Deferred and Approximate Updates The naive implementation of Algorithm 1 is infeasible as it would require us to update all |D| coordinates of xd for each query q. [sent-196, score-0.209]

82 However, it is possible to defer the updates until we need to inspect zd directly. [sent-197, score-0.381]

83 The key idea is to exploit that for all zd with d ∈ Dq the updates only depend on the value of zd at update time (Section A. [sent-198, score-0.733]

84 4.6 Path Following The tiering problem has the appealing property [19] that the solutions for increasing λ form a nested subset. [sent-201, score-0.269]

85 In other words, relaxing capacity constraints never demotes but only promotes pages. [sent-202, score-0.182]

86 Then ζd provides an order for sorting documents into tiers for the entire range [λ, λ′]. [sent-211, score-0.417]

87 This yields Algorithm 3 (Path Following): initialize all xdt so that zd ∈ [1, k]; then for each λ ∈ Λ, refine the variables xdt(λ) by Algorithm 1 using a small number of iterations. [sent-213, score-0.673]

88 Finally, average the variables xdt = Σ_{λ∈Λ} xdt(λ)/|Λ|, sort the documents by the resulting total scores zd, and fill the ordered documents into tier 1, then tier 2, etc. [sent-214, score-2.31]

89 The search results for any fixed query vary for a variety of reasons, e.g. [sent-230, score-0.209]

90 We approximate the session graph by treating queries with different result sets as if they were different queries. [sent-233, score-0.186]

91 Session miss rate for the online procedure, the max and sum heuristics (A. [sent-237, score-0.152]

92 As seen, the max heuristic cannot be optimal for any but small cache sizes, but it performs comparably well to Online. [sent-240, score-0.133]

93 The ranking variant of our online Algorithm 3 (30 passes over the data) consistently outperforms the max and sum heuristics over a large span of cache sizes (Figure 3). [sent-243, score-0.275]

94 We then calculate the session miss rate of each procedure at any cache size, and report the relative improvement of our online algorithm as ratios of miss rates in Figure 3–Right. [sent-245, score-0.318]

95 5 million query-sessions per second (qps) for this version, and about 2 million qps for smaller problems (as they incur fewer memory page faults). [sent-248, score-0.14]

96 5 Discussion We showed that very large tiering and densest subset optimization problems can be solved efficiently by a relatively simple online optimization procedure (some extensions are in Appendix B). [sent-251, score-0.334]

97 It came somewhat as a surprise that the max heuristic often works nearly as well as the optimal tiering solution. [sent-252, score-0.262]

98 Some readers may question the need for a static tiering solution, given that data could, in theory, be reassigned between different caching tiers on the fly. [sent-254, score-0.566]

99 In addition to that, tiering is a problem not restricted to the provision of webpages. [sent-258, score-0.241]

100 Inverted index compression and query processing with optimized document ordering. [sent-383, score-0.274]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tier', 0.53), ('zd', 0.352), ('xdt', 0.321), ('tiers', 0.289), ('tiering', 0.241), ('yqt', 0.193), ('query', 0.166), ('capacity', 0.153), ('documents', 0.128), ('ct', 0.123), ('dq', 0.116), ('vq', 0.113), ('cache', 0.112), ('queries', 0.102), ('ow', 0.082), ('uq', 0.08), ('engine', 0.077), ('document', 0.076), ('capacities', 0.07), ('page', 0.064), ('retrieved', 0.059), ('miss', 0.055), ('subgraph', 0.053), ('inverted', 0.053), ('session', 0.049), ('program', 0.049), ('hewlett', 0.048), ('packard', 0.048), ('vdi', 0.048), ('wqi', 0.048), ('online', 0.047), ('parametric', 0.043), ('search', 0.043), ('xd', 0.043), ('integer', 0.043), ('portfolio', 0.042), ('pt', 0.041), ('vertices', 0.04), ('edges', 0.04), ('ward', 0.036), ('caching', 0.036), ('solution', 0.036), ('passes', 0.036), ('moreover', 0.036), ('graph', 0.035), ('ram', 0.033), ('laboratories', 0.033), ('lemma', 0.033), ('gus', 0.032), ('kostas', 0.032), ('qps', 0.032), ('sever', 0.032), ('unimodularity', 0.032), ('urls', 0.032), ('index', 0.032), ('lp', 0.031), ('pages', 0.031), ('ranking', 0.03), ('constraints', 0.029), ('heuristics', 0.029), ('stream', 0.029), ('yahoo', 0.029), ('updates', 0.029), ('que', 0.028), ('formidable', 0.028), ('ry', 0.028), ('whenever', 0.028), ('solutions', 0.028), ('cut', 0.027), ('bipartite', 0.026), ('lagrange', 0.026), ('billions', 0.026), ('assign', 0.025), ('reformulation', 0.025), ('np', 0.025), ('load', 0.024), ('traverse', 0.024), ('stored', 0.024), ('storage', 0.024), ('assignment', 0.024), ('webpages', 0.023), ('algorithmica', 0.023), ('sink', 0.023), ('optimization', 0.023), ('answer', 0.023), ('convex', 0.023), ('record', 0.023), ('lagrangian', 0.023), ('constraint', 0.023), ('path', 0.022), ('integral', 0.022), ('indexing', 0.022), ('monotonicity', 0.022), ('latency', 0.022), ('url', 0.022), ('subject', 0.022), ('million', 0.022), ('max', 0.021), ('ck', 0.02), ('service', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem

Author: Gilbert Leung, Novi Quadrianto, Kostas Tsioutsiouliklis, Alex J. Smola

Abstract: We present a fast online solver for large scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally. 1

2 0.1594799 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

Author: Wei Chen, Tie-yan Liu, Zhi-ming Ma

Abstract: This paper is concerned with the generalization analysis on learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query become no longer independent after represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers, and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtained are quite intuitive and are in accordance with previous empirical studies on the performances of ranking algorithms. 1

3 0.093730316 181 nips-2010-Network Flow Algorithms for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1

4 0.085847653 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

Author: Silvia Chiappa, Jan R. Peters

Abstract: Many time-series such as human movement data consist of a sequence of basic actions, e.g., forehands and backhands in tennis. Automatically extracting and characterizing such actions is an important problem for a variety of different applications. In this paper, we present a probabilistic segmentation approach in which an observed time-series is modeled as a concatenation of segments corresponding to different basic actions. Each segment is generated through a noisy transformation of one of a few hidden trajectories representing different types of movement, with possible time re-scaling. We analyze three different approximation methods for dealing with model intractability, and demonstrate how the proposed approach can successfully segment table tennis movements recorded using a robot arm as haptic input device. 1

5 0.066584751 194 nips-2010-Online Learning for Latent Dirichlet Allocation

Author: Matthew Hoffman, Francis R. Bach, David M. Blei

Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1

6 0.065901667 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

7 0.064575084 150 nips-2010-Learning concept graphs from text with stick-breaking priors

8 0.057343334 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents

9 0.054842733 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts

10 0.054586783 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

11 0.050179649 100 nips-2010-Gaussian Process Preference Elicitation

12 0.046316549 262 nips-2010-Switched Latent Force Models for Movement Segmentation

13 0.046271861 286 nips-2010-Word Features for Latent Dirichlet Allocation

14 0.045980752 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics

15 0.045786124 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

16 0.045358133 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms

17 0.044547152 202 nips-2010-Parallelized Stochastic Gradient Descent

18 0.044506244 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

19 0.043977365 60 nips-2010-Deterministic Single-Pass Algorithm for LDA

20 0.042280305 162 nips-2010-Link Discovery using Graph Feature Tracking


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.124), (1, 0.018), (2, 0.04), (3, 0.005), (4, -0.094), (5, -0.005), (6, 0.019), (7, -0.004), (8, 0.04), (9, -0.025), (10, 0.081), (11, -0.016), (12, -0.034), (13, 0.063), (14, -0.016), (15, -0.005), (16, -0.116), (17, -0.065), (18, -0.093), (19, 0.034), (20, 0.039), (21, 0.094), (22, -0.066), (23, 0.014), (24, -0.007), (25, -0.116), (26, -0.006), (27, 0.025), (28, 0.03), (29, 0.101), (30, 0.002), (31, -0.001), (32, 0.157), (33, -0.053), (34, 0.015), (35, 0.096), (36, 0.009), (37, 0.074), (38, 0.002), (39, 0.066), (40, -0.086), (41, 0.09), (42, -0.159), (43, -0.018), (44, -0.014), (45, -0.062), (46, -0.018), (47, -0.024), (48, 0.032), (49, 0.053)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91890144 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem

Author: Gilbert Leung, Novi Quadrianto, Kostas Tsioutsiouliklis, Alex J. Smola

Abstract: We present a fast online solver for large scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally. 1

2 0.61733633 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

Author: Wei Chen, Tie-yan Liu, Zhi-ming Ma

Abstract: This paper is concerned with the generalization analysis on learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query become no longer independent after represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers, and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtained are quite intuitive and are in accordance with previous empirical studies on the performances of ranking algorithms. 1

3 0.53898901 197 nips-2010-Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets

Author: Paolo Viappiani, Craig Boutilier

Abstract: Bayesian approaches to utility elicitation typically adopt (myopic) expected value of information (EVOI) as a natural criterion for selecting queries. However, EVOI-optimization is usually computationally prohibitive. In this paper, we examine EVOI optimization using choice queries, queries in which a user is ask to select her most preferred product from a set. We show that, under very general assumptions, the optimal choice query w.r.t. EVOI coincides with the optimal recommendation set, that is, a set maximizing the expected utility of the user selection. Since recommendation set optimization is a simpler, submodular problem, this can greatly reduce the complexity of both exact and approximate (greedy) computation of optimal choice queries. We also examine the case where user responses to choice queries are error-prone (using both constant and mixed multinomial logit noise models) and provide worst-case guarantees. Finally we present a local search technique for query optimization that works extremely well with large outcome spaces.

4 0.47949061 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

Author: Ronny Luss, Saharon Rosset, Moni Shahar

Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.

5 0.47044453 181 nips-2010-Network Flow Algorithms for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1

6 0.41996926 100 nips-2010-Gaussian Process Preference Elicitation

7 0.41462359 150 nips-2010-Learning concept graphs from text with stick-breaking priors

8 0.40192884 9 nips-2010-A New Probabilistic Model for Rank Aggregation

9 0.39057285 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents

10 0.35394788 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

11 0.34669691 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

12 0.34236953 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization

13 0.32436064 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

14 0.32225165 60 nips-2010-Deterministic Single-Pass Algorithm for LDA

15 0.32123119 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

16 0.31974602 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics

17 0.3112942 194 nips-2010-Online Learning for Latent Dirichlet Allocation

18 0.30990127 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts

19 0.30369496 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms

20 0.30250004 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.037), (17, 0.036), (27, 0.081), (30, 0.051), (35, 0.016), (45, 0.18), (50, 0.048), (52, 0.025), (60, 0.037), (77, 0.047), (78, 0.037), (90, 0.02), (97, 0.286)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.78094041 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

Author: Fabian L. Wauthier, Michael I. Jordan

Abstract: Heavy-tailed distributions are often used to enhance the robustness of regression and classification methods to outliers in output space. Often, however, we are confronted with “outliers” in input space, which are isolated observations in sparsely populated regions. We show that heavy-tailed stochastic processes (which we construct from Gaussian processes via a copula), can be used to improve robustness of regression and classification estimators to such outliers by selectively shrinking them more strongly in sparse regions than in dense regions. We carry out a theoretical analysis to show that selective shrinkage occurs when the marginals of the heavy-tailed process have sufficiently heavy tails. The analysis is complemented by experiments on biological data which indicate significant improvements of estimates in sparse regions while producing competitive results in dense regions. 1

same-paper 2 0.75481719 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem

Author: Gilbert Leung, Novi Quadrianto, Kostas Tsioutsiouliklis, Alex J. Smola

Abstract: We present a fast online solver for large scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally. 1

3 0.69013482 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models

Author: Ryan Kelly, Matthew Smith, Robert Kass, Tai S. Lee

Abstract: Activity of a neuron, even in the early sensory areas, is not simply a function of its local receptive field or tuning properties, but depends on global context of the stimulus, as well as the neural context. This suggests the activity of the surrounding neurons and global brain states can exert considerable influence on the activity of a neuron. In this paper we implemented an L1 regularized point process model to assess the contribution of multiple factors to the firing rate of many individual units recorded simultaneously from V1 with a 96-electrode “Utah” array. We found that the spikes of surrounding neurons indeed provide strong predictions of a neuron’s response, in addition to the neuron’s receptive field transfer function. We also found that the same spikes could be accounted for with the local field potentials, a surrogate measure of global network states. This work shows that accounting for network fluctuations can improve estimates of single trial firing rate and stimulus-response transfer functions. 1

4 0.64482272 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification

Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu

Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1

5 0.60302383 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

Author: Surya Ganguli, Haim Sompolinsky

Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1

6 0.60063183 155 nips-2010-Learning the context of a category

7 0.60031497 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

8 0.59835541 268 nips-2010-The Neural Costs of Optimal Control

9 0.59730983 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors

10 0.59729207 17 nips-2010-A biologically plausible network for the computation of orientation dominance

11 0.59720874 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

12 0.5970906 63 nips-2010-Distributed Dual Averaging In Networks

13 0.59550041 98 nips-2010-Functional form of motion priors in human motion perception

14 0.59533787 158 nips-2010-Learning via Gaussian Herding

15 0.59523177 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

16 0.5945487 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

17 0.59391278 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

18 0.59359485 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

19 0.59346616 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

20 0.59317887 194 nips-2010-Online Learning for Latent Dirichlet Allocation