nips nips2010 nips2010-289 knowledge-graph by maker-knowledge-mining

289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities


Source: pdf

Author: Ping Li, Arnd Konig, Wenhao Gui

Abstract: Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. [sent-4, score-0.835]

2 While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). [sent-5, score-2.355]

3 We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. [sent-7, score-0.236]

4 Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance. [sent-8, score-1.197]

5 1 Introduction The efficient computation of the similarity (or overlap) between sets is a central operation in a variety of applications, such as word associations (e. [sent-9, score-0.15]

6 Word associations (collocations, co-occurrences) If one inputs a query NIPS machine learning, all major search engines will report the number of pagehits (e. [sent-20, score-0.204]

7 Word associations have many other applications in Computational Linguistics [13, 38], and were recently used for Web search query reformulation and query suggestions [42, 12]. [sent-26, score-0.188]

8 , [29, 15]) use the number of documents the words in a search query occur in for different text corpora representing various verticals as features. [sent-34, score-0.103]

9 , [10]) that reliable duplicate-detection should be based on local properties of groups of duplicates, most current approaches base their decisions on pairwise similarities between items only. [sent-61, score-0.135]

10 Data mining A lot of work in data mining has focused on efficient candidate pruning in the context of pairwise associations (e. [sent-68, score-0.21]

11 , [14]), a number of such pruning techniques leverage minwise hashing to prune pairs of items, but in many contexts (e. [sent-70, score-0.973]

12 , association rules with more than 2 items) multi-way associations are relevant; here, pruning based on pairwise interactions may perform much less well than multi-way pruning. [sent-72, score-0.138]

13 1 Ultra-high dimensional data are often binary For duplicate detection in the context of Web crawling/search, each document can be represented as a set of w-shingles (w contiguous words); w = 5 or 7 in several studies [3, 4, 17]. [sent-74, score-0.148]

14 Limitations of random projections The method of random projections (including simhash) is limited to estimating pairwise similarities. [sent-83, score-0.152]

15 Random projections convert any data distributions to (zero-mean) multivariate normals, whose density functions are determined by the covariance matrix which contains only the pairwise information of the original data. [sent-84, score-0.103]

16 3 Prior work on b-Bit Minwise Hashing Instead of storing each hashed value using 64 bits as in prior studies, e. [sent-87, score-0.393]

17 [35] demonstrated that using b = 1 reduces the storage space at least by a factor of 21.3 (for a given accuracy) compared to b = 64, if one is interested in resemblance ≥ 0.5. [sent-90, score-0.118] [sent-91, score-0.236]

19 Moreover, by choosing the value b of bits to be retained, it becomes possible to systematically adjust the degree to which the estimator is “tuned” towards higher similarities as well as the amount of hashing (random permutations) required. [sent-93, score-0.899]

20 Compared to the pairwise case, our new estimator is significantly different. [sent-96, score-0.146]

21 In fact, as we will show later, estimating 3-way resemblance requires b ≥ 2. [sent-97, score-0.282]

22 We define three 2-way resemblances (R12, R13, R23) and one 3-way resemblance (R) as: R12 = |S1 ∩ S2| / |S1 ∪ S2|, R13 = |S1 ∩ S3| / |S1 ∪ S3|, R23 = |S2 ∩ S3| / |S2 ∪ S3|, and R = R123 = |S1 ∩ S2 ∩ S3| / |S1 ∪ S2 ∪ S3| (1). [sent-107, score-0.378]

23 Using our notation, these can be expressed in various forms: Rij = aij / (fi + fj − aij) = sij / (ri + rj − sij), i ≠ j, and R = a / (f1 + f2 + f3 − a12 − a13 − a23 + a) = s / (r1 + r2 + r3 − s12 − s13 − s23 + s) = s/u. [sent-108, score-0.536]
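
To make the definitions in (1) concrete, here is a minimal Python sketch (ours, not the paper's code; the sets and names are illustrative) that computes the 2-way and 3-way resemblances exactly:

```python
def resemblances(S1, S2, S3):
    """Exact 2-way and 3-way resemblances (Jaccard similarities), as in Eq. (1)."""
    S1, S2, S3 = set(S1), set(S2), set(S3)
    R12 = len(S1 & S2) / len(S1 | S2)
    R13 = len(S1 & S3) / len(S1 | S3)
    R23 = len(S2 & S3) / len(S2 | S3)
    R = len(S1 & S2 & S3) / len(S1 | S2 | S3)
    return R12, R13, R23, R

# Toy example with three overlapping sets:
print(resemblances({1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}))
# -> (0.6, 0.333..., 0.6, 0.333...)
```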

24 When the set sizes fi = |Si| can be assumed to be known, we can compute resemblances from intersections and vice versa: aij = Rij (fi + fj) / (1 + Rij), and a = R (f1 + f2 + f3 − a12 − a13 − a23) / (1 − R). [sent-110, score-0.346]

25 Thus, estimating resemblances and estimating intersection sizes are two closely related problems. [sent-111, score-0.269]
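
A short sketch (ours, with illustrative numbers) of this conversion between resemblances and intersection sizes, assuming the set sizes fi are known:

```python
def pair_intersection(Rij, fi, fj):
    """a_ij = R_ij (f_i + f_j) / (1 + R_ij), the inverse of R_ij = a_ij / (f_i + f_j - a_ij)."""
    return Rij * (fi + fj) / (1.0 + Rij)

def triple_intersection(R, f1, f2, f3, a12, a13, a23):
    """a = R (f1 + f2 + f3 - a12 - a13 - a23) / (1 - R), the inverse of
    R = a / (f1 + f2 + f3 - a12 - a13 - a23 + a)."""
    return R * (f1 + f2 + f3 - a12 - a13 - a23) / (1.0 - R)

# Consistency check against the toy sets above (all fi = 4):
a12 = pair_intersection(0.6, 4, 4)        # 3.0 = |S1 ∩ S2|
a13 = pair_intersection(1 / 3, 4, 4)      # 2.0 = |S1 ∩ S3|
a23 = pair_intersection(0.6, 4, 4)        # 3.0 = |S2 ∩ S3|
print(triple_intersection(1 / 3, 4, 4, 4, a12, a13, a23))   # 2.0 = |S1 ∩ S2 ∩ S3|
```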

26 5 Our Main Contributions • We derive the basic probability formula for estimating 3-way resemblance using b-bit hashing. [sent-113, score-0.335]

27 This basic probability formula naturally leads to a (complicated) estimator of resemblance. [sent-115, score-0.153]

28 , ri = fi/D ≈ 0, but fi/fj = ri/rj may still be significant) to develop a much simplified estimator, which is desired in practical applications. [sent-118, score-0.574]

29 This assumption of fi /D → 0 significantly simplifies the estimator and frees us from having to know the cardinalities fi . [sent-119, score-0.35]

30 • We analyze the theoretical variance of the simplified estimator and compare it with the original minwise hashing method (using 64 bits). [sent-120, score-1.12]

31 Our theoretical analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in storage space (for a given estimator accuracy of the 3-way resemblance) when the set similarities are not extremely low (e. [sent-121, score-1.298]

32 These results are particularly important for applications in which only detecting high resemblance/overlap is relevant, such as many data cleaning scenarios or duplicate detection. [sent-125, score-0.185]

33 The recommended procedure for estimating 3-way resemblances (in sparse data) is shown as Alg. 1. [sent-126, score-0.188]

34 Algorithm 1 The b-bit minwise hashing algorithm, applied to estimating 3-way resemblances in a collection of N sets. [sent-128, score-1.137]

35 2) For each set Sn and permutation πj , store the lowest b bits of min (πj (Sn )), denoted by en,t,πj , t = 1 to b. [sent-137, score-0.41]

36 4) If needed, the 2-way resemblances R̂ij,b can be estimated as R̂ij,b = (2^b P̂ij,b − 1) / (2^b − 1). [sent-143, score-0.142]
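
As a rough, self-contained illustration of steps 1)-2) of Alg. 1 (ours, not the authors' implementation), the sketch below simulates the k permutations with affine maps over a power-of-two universe (a convenient stand-in, not the hashing scheme used in the paper) and keeps only the lowest b bits of each minimum:

```python
import random

def bbit_minwise_signatures(sets, k, b, D=2 ** 20, seed=0):
    """For each set S_n and each of k simulated permutations pi_j of {0, ..., D-1},
    keep only the lowest b bits of min(pi_j(S_n))."""
    assert D & (D - 1) == 0, "the affine-map trick below needs D to be a power of two"
    rng = random.Random(seed)
    mask = (1 << b) - 1
    sigs = [[0] * k for _ in sets]
    for j in range(k):
        a = rng.randrange(1, D, 2)   # odd multiplier: x -> (a*x + c) % D is a bijection
        c = rng.randrange(D)
        for n, S in enumerate(sets):
            z = min((a * x + c) % D for x in S)   # z = min(pi_j(S_n))
            sigs[n][j] = z & mask                 # store only the lowest b bits
    return sigs
```

A production implementation would normally use (approximately) min-wise independent hash functions rather than these toy permutations; the b-bit truncation of each minimum is the only step specific to the method.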

37 2 The Precise Theoretical Probability Analysis Minwise hashing applies k random permutations πj : Ω −→ Ω, Ω = {0, 1, . . . , D − 1}. [sent-144, score-0.43]

38 This method naturally extends to estimating 3-way resemblances for three sets S1, S2, S3 ⊆ Ω: Pr(min(πj(S1)) = min(πj(S2)) = min(πj(S3))) = |S1 ∩ S2 ∩ S3| / |S1 ∪ S2 ∪ S3| = R (5). [sent-148, score-0.188]

39 To describe b-bit hashing, we define the minimum values under π and their lowest b bits to be: zi = min(π(Si)), ei,t = t-th lowest bit of zi. [sent-149, score-0.412]

40 (Figure legend/axis text only: “2 bits”, “3 bits”, “4 bits”, “Theoretical”; x-axis “Sample size k”.) [sent-171, score-1.038]

41 The empirical estimates and the theoretical predictions essentially overlap regardless of the sparsity measure ri = fi /D. [sent-173, score-0.397]

42 To obtain a simpler formula, we leverage the observation that in practice we often have ri = fi/D ≈ 0, even though both fi and D can be very large. [sent-193, score-0.412]

43 Here, D = 2^64, which means that even for a web page with fi = 2^54 shingles (corresponding to the text of a small novel), we still have fi/D ≈ 0. [sent-195, score-0.37]

44 Recall the resemblances (2) and (3) are only determined by these ratios. [sent-200, score-0.142]

45 We analyzed the distribution of fi/D using two real-life datasets: the UCI dataset containing 3 × 10^5 NYTimes articles; and a Microsoft proprietary dataset with 10^6 news articles [19]. [sent-201, score-0.244]

46 Table 1 reports the summary statistics of the fi values. [sent-204, score-0.125]

47 Table 1: Summary statistics of the fi/D values in two datasets: 3 × 10^5 UCI-NYTimes articles, and 10^6 Microsoft articles with 1-, 2-, and 3-shingles (columns include the median; the numeric entries are truncated here). [sent-205, score-0.337]

48 , no information about the 3-way resemblance R is contained. [sent-226, score-0.236]

49 Assuming r1, r2, r3 → 0, the proposed estimator of R, denoted by R̂b, is R̂b = [4^b P̂b − 2^b (P̂12,b + P̂13,b + P̂23,b) + 2] / [(2^b − 1)(2^b − 2)] (8). [sent-230, score-0.1]
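
A minimal sketch (ours) of how the simplified estimator (8), together with the 2-way estimator R̂ij,b = (2^b P̂ij,b − 1) / (2^b − 1) from Alg. 1, could be evaluated from stored b-bit values such as those produced by the earlier sketch; it assumes sparse data (ri → 0) and requires b ≥ 2:

```python
def estimate_3way(sig1, sig2, sig3, b):
    """sig1..sig3 are lists of b-bit values, one entry per permutation.
    Returns the simplified (sparse-data) estimates of R12, R13, R23 and R."""
    assert b >= 2, "estimating 3-way resemblance requires b >= 2"
    k = len(sig1)
    # Empirical b-bit collision rates.
    P12 = sum(x == y for x, y in zip(sig1, sig2)) / k
    P13 = sum(x == z for x, z in zip(sig1, sig3)) / k
    P23 = sum(y == z for y, z in zip(sig2, sig3)) / k
    Pb = sum(x == y == z for x, y, z in zip(sig1, sig2, sig3)) / k
    t = 2 ** b
    R12 = (t * P12 - 1) / (t - 1)     # 2-way estimator (Alg. 1, step 4)
    R13 = (t * P13 - 1) / (t - 1)
    R23 = (t * P23 - 1) / (t - 1)
    R = (t * t * Pb - t * (P12 + P13 + P23) + 2) / ((t - 1) * (t - 2))   # Eq. (8)
    return R12, R13, R23, R
```

On the toy sets used earlier, estimate_3way(*bbit_minwise_signatures([S1, S2, S3], k=500, b=2), b=2) should return values near the exact resemblances, since those sets are tiny relative to D.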

50 Var(R̂b) = [...] / [k (2^b − 1)(2^b − 2)] (9). It is interesting to examine several special cases: • b = 1: Var(R̂1) = ∞, i. [sent-233, score-0.137]

51 R̂M is the original minwise hashing estimator for 3-way resemblance. [sent-238, score-0.976]

52 In principle, the estimator R̂M requires an infinite precision (i. [sent-239, score-0.134]

53 4 presents the empirical mean square errors (MSE = bias² + variance) together with the theoretical variances Var(R̂b) in Theorem 3. [sent-249, score-0.146]

54 We conducted experiments using b = 2, 3, and 4 as well as the original minwise hashing (denoted by “M”). [sent-259, score-0.976]

55 The plots verify that as ri decreases (to zero), the biases vanish. [sent-260, score-0.19]

56 Note that the set sizes fi remain the same, but the relative values ri = fi/D decrease as D increases. [sent-261, score-0.412]

57 The solid curves are the empirical MSEs (=var+bias2 ) and the dashed lines are the theoretical variances (9), under the assumption of ri → 0. [sent-263, score-0.336]

58 When D = 2^20 and D = 2^22, even though the ri values are not too small, the solid and dashed lines almost overlap. [sent-265, score-0.216]

59 Note that, at the same sample size k, we always have Var(R̂2) > Var(R̂3) > Var(R̂4) > Var(R̂M), where R̂M is the original minwise hashing estimator; Var(R̂3) and Var(R̂4) are very close to Var(R̂M). [sent-266, score-1.015]

60 4 as follows: • When the ri = fi values are large (e. [sent-270, score-0.287]

61 The estimation biases diminish as the ri values decrease. [sent-274, score-0.19]

62 In fact, even when the ri values are not small (e. [sent-275, score-0.162]

63 • The variance formula (9) becomes accurate when the ri values are not too large. [sent-280, score-0.215]

64 1), the empirical MSEs largely overlap the theoretical variances which assumed ri → 0, unless the sample size k is large. [sent-282, score-0.304]

65 , 2^64) and the ri values (fi/D) will be very small, our proposed simple estimator (8) will be very useful in practice, because it becomes unbiased and the variance can be reliably predicted by (9). [sent-286, score-0.262]

66 4 Improving Estimates for Dense Data Using Theorem 1 While we believe the simple estimator in (8) and Alg. 1 should suffice in most applications, we demonstrate here that the sparsity assumption of ri → 0 is not essential if one is willing to use the more sophisticated estimation procedure provided by Theorem 1. [sent-287, score-0.1] [sent-288, score-0.162]

68 (6), s = Pb u − Z, where Z contains s, sij , ri etc. [sent-290, score-0.235]

69 We first estimate sij (from the estimated Rij ) using the precise formula for the two-way case; see Appendix A. [sent-291, score-0.162]

70 We then iteratively solve for s using the initial guess provided by the estimator R̂b in (8). [sent-292, score-0.1]

71 5, the solid curves are obtained using the precise estimation procedure by Theorem 1. [sent-299, score-0.105]

72 The dashed curves are the estimates using the simplified estimator R̂b which assumes ri → 0. [sent-300, score-0.373]

73 Even when the data are not sparse, the precise estimation procedure provides unbiased estimates as verified by the leftmost panel of Fig. [sent-301, score-0.109]

74 Using the precise procedure results in noticeably more accurate estimates in non-sparse data, as verified by the second panel of Fig. [sent-303, score-0.109]

75 However, as long as the data are reasonably sparse (the right two panels), the simple estimator R̂b in (8) is accurate. [sent-305, score-0.1]

76 The dashed curves correspond to the estimates using the simplified estimator R̂b in (8) which assumes ri → 0. [sent-311, score-0.373]

77 5 Quantifying the Improvements Using b-Bit Hashing This section is devoted to analyzing the improvements of b-bit minwise hashing, compared to using 64 bits for each hashed value. [sent-312, score-0.977]

78 The original minwise hashing stores each “sample” using 64 bits (as in [17]). [sent-314, score-1.322]

79 For b-bit minwise hashing, we store each “sample” using b bits only. [sent-315, score-0.93]

80 Note that Var(R̂64) and Var(R̂M) (the variance of the original minwise hashing) are numerically indistinguishable. [sent-316, score-0.58]

81 This variance-space trade-off can be quantified by B(b) = b × Var(R̂b) × k, which is called the storage factor. [sent-318, score-0.118]

82 The ratio B(64)/B(b) precisely characterizes the improvements of b-bit hashing compared to using 64 bits. [sent-320, score-0.427]
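
As a purely hypothetical illustration of this bookkeeping (the variance numbers below are placeholders of the kind one might measure by simulation at a fixed k; they are not values from the paper):

```python
def storage_factor(b, var_rb, k):
    """B(b) = b * Var(R_b_hat) * k: bits per sample, times k samples, scaled by the estimator variance."""
    return b * var_rb * k

k = 500
var_hat = {2: 4.0e-4, 3: 2.3e-4, 4: 2.1e-4, 64: 2.0e-4}   # placeholder variances, not measurements
for b in (2, 3, 4):
    ratio = storage_factor(64, var_hat[64], k) / storage_factor(b, var_hat[b], k)
    print(f"B(64)/B({b}) = {ratio:.1f}")
```

Smaller B(b) means fewer stored bits for the same accuracy; the printed ratios are the storage improvements the paper quantifies.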

83 6 confirms the substantial improvements of b-bit hashing over the original minwise hashing using 64 bits. [sent-348, score-1.403]

84 The improvements in terms of the storage space are usually 10 (or 15) to 25-fold when the sets are reasonably similar (i. [sent-349, score-0.149]

85 Figure 6: The relative storage improvement of using b = 2, 3, 4, 6 bits, compared to using 64 bits. [sent-362, score-0.118]

86 6 Evaluation of Accuracy We conducted a duplicate detection experiment on a public (UCI) collection of 300,000 NYTimes news articles. [sent-367, score-0.18]

87 The task is to identify 3-groups with 3-way resemblance R exceeding a threshold R0 . [sent-368, score-0.236]

88 We experimented with b = 2, 4 and the original minwise hashing. [sent-370, score-0.58]

89 These curves confirm the significant improvement of using b-bit minwise hashing when the threshold R0 is quite high (e. [sent-374, score-0.992]

90 3, using b = 4 resulted in similar precisions as using the original minwise hashing (i. [sent-379, score-0.976]

91 1, using b = 4 can still achieve similar precisions as using the original minwise hashing by only slightly increasing the sample size k. [sent-383, score-1.015]

92 Figure 7: Precision curves on the UCI collection of news data (three panels; precision vs. sample size k). [sent-399, score-0.109]

93 The task is to retrieve news article 3-groups with resemblance R ≥ R0 . [sent-400, score-0.302]

94 8, 2-bit hashing and 4-bit hashing require about k = 500 samples and k = 260 samples respectively, while the original minwise hashing (denoted by M ) requires about 170 samples. [sent-404, score-1.768]

95 This study is devoted to simultaneously estimating 2-way and 3-way similarities using b-bit minwise hashing. [sent-407, score-0.656]

96 Compared to the prior work on estimating 2-way resemblance [35], the extension to 3-way is important for many application scenarios (as described in Sec. [sent-408, score-0.314]

97 For estimating 3-way resemblance, our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy, when the set similarities are not extremely low (e. [sent-410, score-1.3]

98 For many practical applications, the reductions in storage directly translate to improvements in processing speed as well, especially when memory latency is the main bottleneck, which, with the advent of many-core processors, is more and more common. [sent-415, score-0.149]

99 CRS is also provably more accurate than minwise hashing for binary data. [sent-417, score-0.949]

100 A scalable pattern mining approach to web graph compression with communities. [sent-456, score-0.124]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('minwise', 0.553), ('hashing', 0.396), ('bits', 0.346), ('resemblance', 0.236), ('ri', 0.162), ('pb', 0.16), ('rb', 0.15), ('resemblances', 0.142), ('ar', 0.137), ('fi', 0.125), ('storage', 0.118), ('mse', 0.103), ('estimator', 0.1), ('duplicate', 0.089), ('web', 0.088), ('wsdm', 0.079), ('sij', 0.073), ('associations', 0.068), ('news', 0.066), ('cleaning', 0.064), ('www', 0.06), ('similarities', 0.057), ('intersections', 0.054), ('articles', 0.053), ('rj', 0.053), ('formula', 0.053), ('word', 0.05), ('query', 0.048), ('crs', 0.047), ('hashed', 0.047), ('mses', 0.047), ('simhash', 0.047), ('pairwise', 0.046), ('estimating', 0.046), ('rij', 0.045), ('theoretical', 0.044), ('curves', 0.043), ('rm', 0.042), ('join', 0.042), ('pr', 0.041), ('estimates', 0.04), ('bias', 0.04), ('sample', 0.039), ('church', 0.038), ('microsoft', 0.037), ('mining', 0.036), ('precise', 0.036), ('spain', 0.036), ('simpli', 0.035), ('intersection', 0.035), ('document', 0.034), ('precision', 0.034), ('permutations', 0.034), ('lowest', 0.033), ('panel', 0.033), ('variances', 0.033), ('engines', 0.032), ('similarity', 0.032), ('scenarios', 0.032), ('items', 0.032), ('barcelona', 0.032), ('eshghi', 0.032), ('gamon', 0.032), ('ganti', 0.032), ('gollapudi', 0.032), ('ids', 0.032), ('lexicography', 0.032), ('manasse', 0.032), ('najork', 0.032), ('nig', 0.032), ('nytimes', 0.032), ('pagehits', 0.032), ('shingles', 0.032), ('store', 0.031), ('corpora', 0.031), ('improvements', 0.031), ('projections', 0.03), ('normally', 0.03), ('square', 0.029), ('dashed', 0.028), ('biases', 0.028), ('gionis', 0.028), ('joins', 0.028), ('icde', 0.028), ('phrase', 0.028), ('ullman', 0.028), ('theorem', 0.027), ('original', 0.027), ('verifying', 0.027), ('canada', 0.026), ('overlap', 0.026), ('solid', 0.026), ('duplicates', 0.025), ('bc', 0.025), ('cornell', 0.025), ('detection', 0.025), ('aij', 0.025), ('pruning', 0.024), ('linguistics', 0.024), ('search', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities

Author: Ping Li, Arnd Konig, Wenhao Gui

Abstract: Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

2 0.11811676 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

Author: Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman

Abstract: We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashingbased solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points. 1

3 0.080765277 117 nips-2010-Identifying graph-structured activation patterns in networks

Author: James Sharpnack, Aarti Singh

Abstract: We consider the problem of identifying an activation pattern in a complex, largescale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task specially if the network is large (pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, typically there are statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of highdimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size. 1

4 0.060019027 63 nips-2010-Distributed Dual Averaging In Networks

Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi

Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1

5 0.058178153 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations

Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan

Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.

6 0.056441393 205 nips-2010-Permutation Complexity Bound on Out-Sample Error

7 0.056093119 177 nips-2010-Multitask Learning without Label Correspondences

8 0.055382695 265 nips-2010-The LASSO risk: asymptotic results and real world examples

9 0.053424042 148 nips-2010-Learning Networks of Stochastic Differential Equations

10 0.05050835 152 nips-2010-Learning from Logged Implicit Exploration Data

11 0.049753409 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

12 0.044967625 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features

13 0.044777371 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

14 0.042966917 202 nips-2010-Parallelized Stochastic Gradient Descent

15 0.041394584 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression

16 0.040768031 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning

17 0.040513419 22 nips-2010-Active Estimation of F-Measures

18 0.039718118 101 nips-2010-Gaussian sampling by local perturbations

19 0.039481983 279 nips-2010-Universal Kernels on Non-Standard Input Spaces

20 0.038361188 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.121), (1, 0.021), (2, 0.037), (3, 0.021), (4, -0.048), (5, 0.004), (6, 0.013), (7, -0.019), (8, 0.01), (9, -0.038), (10, 0.014), (11, -0.014), (12, -0.073), (13, -0.011), (14, 0.025), (15, -0.035), (16, 0.013), (17, -0.015), (18, -0.05), (19, 0.044), (20, 0.004), (21, 0.02), (22, -0.031), (23, 0.018), (24, 0.075), (25, 0.021), (26, 0.072), (27, -0.055), (28, 0.026), (29, -0.048), (30, -0.013), (31, 0.009), (32, -0.057), (33, -0.085), (34, -0.071), (35, 0.004), (36, 0.037), (37, 0.017), (38, 0.006), (39, -0.006), (40, -0.061), (41, 0.007), (42, -0.031), (43, 0.033), (44, 0.123), (45, -0.027), (46, -0.043), (47, 0.045), (48, -0.033), (49, 0.062)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90634108 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities

Author: Ping Li, Arnd Konig, Wenhao Gui

Abstract: Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

2 0.55530542 107 nips-2010-Global seismic monitoring as probabilistic inference

Author: Nimar Arora, Stuart Russell, Paul Kidwell, Erik B. Sudderth

Abstract: The International Monitoring System (IMS) is a global network of sensors whose purpose is to identify potential violations of the Comprehensive Nuclear-Test-Ban Treaty (CTBT), primarily through detection and localization of seismic events. We report on the first stage of a project to improve on the current automated software system with a Bayesian inference system that computes the most likely global event history given the record of local sensor data. The new system, VISA (Vertically Integrated Seismological Analysis), is based on empirically calibrated, generative models of event occurrence, signal propagation, and signal detection. VISA exhibits significantly improved precision and recall compared to the current operational system and is able to detect events that are missed even by the human analysts who post-process the IMS output. 1

3 0.52281874 120 nips-2010-Improvements to the Sequence Memoizer

Author: Jan Gasthaus, Yee W. Teh

Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements. 1

4 0.50176716 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance

Author: Ulrike V. Luxburg, Agnes Radl, Matthias Hein

Abstract: The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects. 1

5 0.50062329 63 nips-2010-Distributed Dual Averaging In Networks

Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi

Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1

6 0.49616763 117 nips-2010-Identifying graph-structured activation patterns in networks

7 0.46406606 148 nips-2010-Learning Networks of Stochastic Differential Equations

8 0.46160817 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms

9 0.46090069 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

10 0.46063653 22 nips-2010-Active Estimation of F-Measures

11 0.44412452 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

12 0.42414957 202 nips-2010-Parallelized Stochastic Gradient Descent

13 0.4128662 190 nips-2010-On the Convexity of Latent Social Network Inference

14 0.40960759 265 nips-2010-The LASSO risk: asymptotic results and real world examples

15 0.40770647 214 nips-2010-Probabilistic Belief Revision with Structural Constraints

16 0.40560806 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

17 0.40557322 251 nips-2010-Sphere Embedding: An Application to Part-of-Speech Induction

18 0.39948034 9 nips-2010-A New Probabilistic Model for Rank Aggregation

19 0.39254645 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks

20 0.38753703 142 nips-2010-Learning Bounds for Importance Weighting


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.037), (17, 0.011), (27, 0.063), (30, 0.051), (35, 0.024), (45, 0.181), (50, 0.048), (52, 0.039), (54, 0.301), (60, 0.019), (77, 0.067), (78, 0.023), (90, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79282069 111 nips-2010-Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model

Author: Peggy Series, David P. Reichert, Amos J. Storkey

Abstract: The Charles Bonnet Syndrome (CBS) is characterized by complex vivid visual hallucinations in people with, primarily, eye diseases and no other neurological pathology. We present a Deep Boltzmann Machine model of CBS, exploring two core hypotheses: First, that the visual cortex learns a generative or predictive model of sensory input, thus explaining its capability to generate internal imagery. And second, that homeostatic mechanisms stabilize neuronal activity levels, leading to hallucinations being formed when input is lacking. We reproduce a variety of qualitative findings in CBS. We also introduce a modification to the DBM that allows us to model a possible role of acetylcholine in CBS as mediating the balance of feed-forward and feed-back processing. Our model might provide new insights into CBS and also demonstrates that generative frameworks are promising as hypothetical models of cortical learning and perception. 1

2 0.74743533 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection

Author: Seunghak Lee, Jun Zhu, Eric P. Xing

Abstract: To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs which takes into account related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. Then we find the maximum a posteriori (MAP) estimation of regression coefficients and estimate weights of covariates jointly. This optimization procedure is efficient since it can be achieved by using a projected gradient descent and a coordinate descent procedure iteratively. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.

same-paper 3 0.73431784 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities

Author: Ping Li, Arnd Konig, Wenhao Gui

Abstract: Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that b-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

4 0.59499961 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

Author: Surya Ganguli, Haim Sompolinsky

Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1

5 0.59108829 117 nips-2010-Identifying graph-structured activation patterns in networks

Author: James Sharpnack, Aarti Singh

Abstract: We consider the problem of identifying an activation pattern in a complex, largescale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task specially if the network is large (pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, typically there are statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of highdimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size. 1

6 0.58604246 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

7 0.58599758 282 nips-2010-Variable margin losses for classifier design

8 0.58398527 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data

9 0.58377981 63 nips-2010-Distributed Dual Averaging In Networks

10 0.58346796 158 nips-2010-Learning via Gaussian Herding

11 0.58325738 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

12 0.58318824 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

13 0.58315396 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery

14 0.58305329 148 nips-2010-Learning Networks of Stochastic Differential Equations

15 0.58178157 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

16 0.58132046 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

17 0.58099794 155 nips-2010-Learning the context of a category

18 0.58042747 265 nips-2010-The LASSO risk: asymptotic results and real world examples

19 0.57986492 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

20 0.57892478 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models