nips nips2011 nips2011-110 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we study the group anomaly detection problem. [sent-8, score-0.587]
2 Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. [sent-9, score-0.679]
3 FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. [sent-11, score-0.613]
4 We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. [sent-12, score-0.415]
5 In this paper, however, we are interested in finding group anomalies, where a set of points together exhibit unusual behavior. [sent-17, score-0.319]
6 A point-based group anomaly is a group of individually anomalous points. [sent-21, score-0.901]
7 A distribution-based anomaly is a group where the points are relatively normal, but as a whole they are unusual. [sent-22, score-0.579]
8 Most existing work on group anomaly detection focuses on point-based anomalies. [sent-23, score-0.61]
9 A common way to detect point-based anomalies is to first identify anomalous points and then find their aggregations using scanning or segmentation methods [2, 3, 4]. [sent-24, score-0.569]
10 Our contribution is to propose a new method (FGM) for detecting both types of group anomalies in an integral way. [sent-28, score-0.572]
11 The most famous class of mixture models for modeling group data is the family of topic models [9, 10]. [sent-43, score-0.494]
12 In topic models, distributions of points in different groups are mixtures of components (“topics”), which are shared among all the groups. [sent-44, score-0.48]
13 Our proposed method is closely related to the class of topic models, but it is designed specifically for the purpose of detecting group anomalies. [sent-45, score-0.515]
14 At the group level, a flexible structure based on “genres” is used to characterize the topic distributions so that complex normal behaviors are allowed and can be recognized. [sent-47, score-0.616]
15 At the point level, each group has its own topics to accommodate and capture the variations of points’ distributions (while global topic information is still shared among groups). [sent-48, score-0.735]
16 Given a group of points, we can examine whether or not it conforms to the normal behavior defined by the learned genres and topics. [sent-50, score-0.381]
17 We will also propose scoring functions that can detect both point-based and distribution-based group anomalies. [sent-51, score-0.308]
18 In Section 2 we review related work and discuss the limitations of existing algorithms and why a new method is needed for group anomaly detection. [sent-58, score-0.525]
19 Section 5 describes how to use our method for group anomaly detection. [sent-61, score-0.525]
20 2 Background and Related Work In this section, we provide background about topic models and explain the limitation of existing methods in detecting group anomalies. [sent-64, score-0.515]
21 For intuition, we introduce the problem in the context of detecting anomalous images, rare galaxy clusters, and unusual motion in a dynamic fluid simulation. [sent-65, score-0.34]
22 It represents the distribution of points (words) in a group (document) as a mixture of K global topics β_1, …, β_K. [sent-82, score-0.528]
23 LDA generates the mth group by first drawing its topic distribution θ_m from the prior distribution Dir(α). [sent-89, score-0.515]
24 Then for each point x_mn in the mth group it draws one of the K topics from M(θ_m) (i. [sent-90, score-0.526]
25 e., z_mn ∼ M(θ_m)) and then generates the point according to this topic (x_mn ∼ M(β_{z_mn})). [sent-92, score-0.433]
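A minimal numpy sketch of the LDA generative process just described; the sizes and hyperparameter values here are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of the LDA generative process described above.
import numpy as np

rng = np.random.default_rng(0)
K, V, M, N = 4, 50, 10, 100               # topics, vocabulary size, groups, points per group
alpha = np.full(K, 0.5)                   # Dirichlet prior over topic distributions
beta = rng.dirichlet(np.ones(V), size=K)  # K global topics, shared by all groups

groups = []
for m in range(M):
    theta_m = rng.dirichlet(alpha)                            # theta_m ~ Dir(alpha)
    z_m = rng.choice(K, size=N, p=theta_m)                    # z_mn ~ M(theta_m)
    x_m = np.array([rng.choice(V, p=beta[z]) for z in z_m])   # x_mn ~ M(beta_{z_mn})
    groups.append(x_m)
```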
26 At a higher level, a group is characterized by the distribution of topics θ_m, i. [sent-101, score-0.44]
27 The concepts of topic and topic distribution help us define group anomalies: a point-based anomaly contains points that do not belong to any of the normal topics and a distribution-based anomaly has a topic distribution θ_m that is uncommon. [sent-104, score-1.926]
28 Although topic models are very useful in estimating the topics and topic distributions in groups, the existing methods are incapable of detecting group anomalies comprehensively. [sent-105, score-1.317]
29 For example, it should be able to model complex and multi-modal distributions of the topic distribution θ. [sent-107, score-0.309]
30 LDA, however, only uses a single Dirichlet distribution to generate topic distributions, and cannot effectively define what normal and abnormal distributions should be. [sent-108, score-0.388]
31 It also uses the same K topics for every group, which makes groups indistinguishable when looking at their topics. [sent-109, score-0.322]
32 In addition, these shared topics are not adapted to each group either. [sent-110, score-0.443]
33 The Mixture of Gaussian Mixture Model (MGMM) [13] was the first to use topic modeling for group anomaly detection. [sent-111, score-0.77]
34 It allows groups to select their topic distributions from a dictionary of multinomials, which is learned from data to define what is normal. [sent-112, score-0.406]
35 [14] employed the same idea but did not apply their model to anomaly detection. [sent-113, score-0.31]
36 The Theme Model (ThM) [15] lets a mixture of Dirichlets generate the topic distributions and then uses the memberships in this mixture to do clustering on groups. [sent-115, score-0.398]
37 In contrast, [16] proposed to use different topics for different groups in order to account for the burstiness of the words (points). [sent-118, score-0.343]
38 These adaptive topics are useful in recognizing point-level anomalies, but cannot be used to detect anomalous behavior at the group level. [sent-119, score-0.636]
39 For the group anomaly detection problem we propose a new method, the Flexible Genre Model, and demonstrate that it is able to cope with the issues mentioned above and performs better than the existing state-of-the-art algorithms. [sent-120, score-0.587]
40 3 Model Specification The flexible genre model (FGM) extends LDA such that the generating processes of topics and topic distributions can model more complex distributions. [sent-121, score-0.642]
41 (i) To model the behavior of topic distributions, we use several “genres”, each of which is a typical distribution of topic distributions. [sent-123, score-0.507]
42 • Draw a topic distribution according to the genre y_m: S_K ∋ θ_m ∼ Dir(α_{y_m}). [sent-132, score-0.515]
43 Each genre is a Dirichlet distribution for generating the topic distributions, and α = {α_t}_{t=1,…,T}. [sent-145, score-0.404]
44 Having the topic distribution θ_m and the topics {β_{m,k}}, points are generated as in LDA. [sent-157, score-0.524]
45 Thus, the topics can be adapted to local group data, but the information is still shared globally. [sent-161, score-0.443]
46 Moreover, the topic generators P(·|η) determine what the topics {β_{m,k}} should look like. [sent-162, score-0.503]
47 In turn, if a group uses unusual topics to generate its points, it can be identified. [sent-163, score-0.492]
48 For computational convenience, the topic generators are Gaussian-Inverse-Wishart (GIW) distributions, which are conjugate to the Gaussian topics. [sent-167, score-0.295]
49 We can write the complete likelihood of data and latent variables in group Gm under FGM as follows: P(G_m, y_m, θ_m, z_m, β_m | Θ) = M(y_m | π) Dir(θ_m | α_{y_m}) ∏_k GIW(β_{m,k} | η_k) ∏_n M(z_mn | θ_m) N(x_mn | β_{m,z_mn}). [sent-170, score-0.376]
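A hedged sketch of FGM's generative process for one group with Gaussian topics, following the specification above; the hyperparameter values (π, the per-genre α_t, and the GIW parameters μ0, κ, Ψ, ν) are illustrative assumptions.

```python
# Sketch of FGM's generative process for a single group (Gaussian topics).
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)
T, K, D, N = 2, 3, 2, 100                  # genres, topics, dimensions, points per group
pi = np.full(T, 1.0 / T)                   # genre proportions
alphas = rng.gamma(2.0, 1.0, size=(T, K))  # one Dirichlet parameter vector per genre
mu0, kappa, Psi, nu = np.zeros(D), 1.0, np.eye(D), D + 2  # GIW hyperparameters (eta)

def sample_giw():
    """Draw one Gaussian topic (mean, covariance) from the GIW topic generator."""
    Sigma = invwishart.rvs(df=nu, scale=Psi)
    mu = rng.multivariate_normal(mu0, Sigma / kappa)
    return mu, Sigma

y_m = rng.choice(T, p=pi)                  # y_m ~ M(pi)
theta_m = rng.dirichlet(alphas[y_m])       # theta_m ~ Dir(alpha_{y_m})
topics = [sample_giw() for _ in range(K)]  # beta_{m,k} ~ GIW(eta_k), adapted per group
z_m = rng.choice(K, size=N, p=theta_m)     # z_mn ~ M(theta_m)
x_m = np.array([rng.multivariate_normal(*topics[z]) for z in z_m])  # x_mn ~ N(beta_{m,z_mn})
```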
50 The inferred latent states—including the topic distributions θ_m, the topics β_m, and the topic and genre memberships z_m, y_m—can be used for detecting anomalies and exploring the data. [sent-174, score-1.435]
51 For the topic distribution θ_m: P(θ_m | rest) ∝ P(z_m | θ_m) P(θ_m | α, y_m) = M(z_m | θ_m) Dir(θ_m | α_{y_m}) = Dir(α_{y_m} + n_m), where n_m denotes the histogram of the K values in vector z_m. [sent-183, score-0.471]
52 For β_{m,k}, the kth topic in group m, one can find that: P(β_{m,k} | rest) ∝ P(x_m^(k) | β_{m,k}) P(β_{m,k} | η_k) = N(x_m^(k) | β_{m,k}) GIW(β_{m,k} | η_k) = GIW(β_{m,k} | η′_k), where x_m^(k) are the points in group Gm from topic k, i.
53 For z_mn, the topic membership of point n in group m, the conditional is as follows: P(z_mn = k | rest) ∝ P(x_mn | z_mn = k, β_{m,k}) P(z_mn = k | θ_m) = N(x_mn | β_{m,k}) θ_{m,k}.
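The two Gibbs conditionals above translate directly into sampling code. A sketch, assuming the quantities produced by the generative snippet earlier; the Gaussian topics are held fixed here, whereas a full Gibbs sweep would also resample them from their GIW posterior (sentence 52).

```python
# Sketch of the theta_m and z_mn Gibbs updates described above.
import numpy as np
from scipy.stats import multivariate_normal

def resample_theta(z_m, alpha_ym, K, rng):
    n_m = np.bincount(z_m, minlength=K)   # histogram of the K values in z_m
    return rng.dirichlet(alpha_ym + n_m)  # theta_m | rest ~ Dir(alpha_{y_m} + n_m)

def resample_z(x_m, theta_m, topics, rng):
    z_new = np.empty(len(x_m), dtype=int)
    for n, x in enumerate(x_m):
        lik = np.array([multivariate_normal.pdf(x, mean=mu, cov=Sig)
                        for mu, Sig in topics])
        p = theta_m * lik                 # P(z_mn = k | rest) ∝ theta_mk * N(x_mn | beta_mk)
        z_new[n] = rng.choice(len(topics), p=p / p.sum())
    return z_new
```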
54 Each genre α_t, t = 1, …, T, captures one typical distribution of topic distributions as θ ∼ Dir(α_t). [sent-196, score-0.309]
55 The MLE of each α_t is computed from the topic distributions having genre t; this can be solved using the Newton–Raphson method [18]. [sent-209, score-0.434]
56 5 Scoring Criteria The learned FGM model can easily be used for anomaly detection on test data. [sent-218, score-0.372]
57 Given a test group, we first infer its latent variables including the topics and the topic distribution. [sent-219, score-0.484]
58 Point-based group anomalies can be detected by examining the topics of the groups. [sent-221, score-0.725]
59 If a group contains anomalous points with rare feature values x_mn, then the topics {β_{m,k}}_{k=1}^K that generate these points will deviate from the normal behavior defined by the topic generators η. [sent-222, score-1.146]
60 The point-based anomaly score (PB score) of group Gm is E_{β_m}[−ln P(β_m | Θ)] = −∫ P(β_m | Θ, G_m) ln P(β_m | Θ) dβ_m. [sent-224, score-0.587]
61 Distribution-based group anomalies can be detected by examining the topic distributions. [sent-226, score-0.762]
62 The distribution-based anomaly score (DB score) of group Gm is defined as E_{θ_m}[−ln P(θ_m | Θ)] = −∫ P(θ_m | Θ, G_m) ln P(θ_m | Θ) dθ_m. [sent-233, score-0.587]
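Both scores are expectations of a negative log-density under the posterior, so they can be approximated by Monte Carlo over posterior samples. A sketch of the DB score, assuming θ samples from a Gibbs run and the learned genre mixture Σ_t π_t Dir(·|α_t) as P(θ|Θ); both assumptions go beyond what the extracted sentences state.

```python
# Monte Carlo sketch of the DB score E_theta[-ln P(theta|Theta)], where
# P(theta|Theta) = sum_t pi_t Dir(theta|alpha_t) is the learned genre mixture.
import numpy as np
from scipy.stats import dirichlet

def db_score(theta_samples, pi, alphas):
    scores = []
    for theta in theta_samples:           # posterior samples of theta_m for the test group
        density = sum(p_t * dirichlet.pdf(theta, a) for p_t, a in zip(pi, alphas))
        scores.append(-np.log(density + 1e-300))  # guard against log(0)
    return np.mean(scores)                # higher = more anomalous
```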
63 We show that FGM outperforms several state-of-the-art competitors in the group anomaly detection task. [sent-237, score-0.587]
64 To handle continuous data and detect anomalies, we modified it by using Gaussian topics and applied the distribution-based anomaly scoring function (1). [sent-241, score-0.611]
65 Thus, a group is normal if its points are sampled from these three Gaussians, and their mixing weights are close to either w1 or w2 . [sent-257, score-0.343]
66 Point-based anomalies were groups of points sampled from N((0, 0), I_2). [sent-259, score-0.47]
67 Distribution-based anomalies were generated by GMMs consisting of normal Gaussian components but with mixing weights [0. [sent-260, score-0.376]
68 One point-based anomalous group and two distribution-based anomalous groups were injected into the data set. [sent-268, score-0.668]
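A sketch of this synthetic setup. The component means, the covariance scale, the weight vectors w1/w2, and the anomalous weights are illustrative assumptions; the extracted sentences elide the paper's exact values.

```python
# Sketch of the synthetic group data described above.
import numpy as np

rng = np.random.default_rng(2)
means = np.array([[-2.0, -1.0], [2.0, -1.0], [0.0, 2.0]])  # three "normal" components
w1, w2 = np.array([0.6, 0.3, 0.1]), np.array([0.1, 0.3, 0.6])

def make_group(weights, n=100):
    z = rng.choice(3, size=n, p=weights)
    return means[z] + 0.2 * rng.standard_normal((n, 2))

normal_groups = [make_group(w1 if rng.random() < 0.5 else w2) for _ in range(47)]
point_anomaly = rng.standard_normal((100, 2))  # point-based: points from N((0, 0), I_2)
# distribution-based: normal components, unusual mixing weights
dist_anomalies = [make_group(np.ones(3) / 3) for _ in range(2)]
```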
69 Normal groups are surrounded by black solid boxes, point-based anomalies have green dashed boxes, and distribution-based anomalies have red/magenta dashed boxes. [sent-272, score-0.718]
70 Points are colored by the anomaly scores of the groups (darker color means more anomalous). [sent-273, score-0.453]
71 models can find the distribution-based anomalies since they are able to learn the topic distributions. [sent-276, score-0.547]
72 The explanation is simple; the anomalous points are distributed in the middle of the topics, thus the inferred topic distribution is around [0. [sent-278, score-0.477]
73 This example shows one possible problem of scoring groups based on topic distributions only. [sent-283, score-0.447]
74 In contrast, using the sum of point-based and distribution-based scores, FGM found all of the group anomalies thanks to its ability to characterize groups at both the point level and the group level. [sent-284, score-0.648]
75 Figure 3(d) shows the learned genres visualized as the distribution ∑_{t=1}^T π_t Dir(·|α_t) on the topic simplex. [sent-290, score-0.368]
76 This distribution summarizes the normal topic distributions in this data set. [sent-291, score-0.369]
77 Figure 3: (a), (b), (c) show the density of the point-based anomaly estimated by MGMM, ThM, and FGM, respectively.
78 We created anomalies by stitching random normal test images from different categories. [sent-301, score-0.407]
79 For example, an anomaly may be a picture that is half mountain and half city street. [sent-302, score-0.31]
80 These anomalies are challenging since they have the same local patches as the normal images. [sent-303, score-0.381]
81 We mixed the anomalies with normal test images and asked the detectors to find them. [sent-304, score-0.405]
82 The first one, called LDA-KNN, uses LDA to estimate the topic distributions of the groups and treats these topic distributions (vector parameters of multinomials) as the groups’ features. [sent-314, score-0.698]
83 For all algorithms we used K = 8 topics and T = 6 genres as it was suggested by BIC searches. [sent-321, score-0.314]
84 The performance is measured by the area under the ROC curve (AUC) of retrieving the anomalies from the test set. [sent-323, score-0.302]
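For concreteness, the retrieval AUC can be computed from per-group anomaly scores and ground-truth labels; a toy sketch using scikit-learn, with illustrative arrays.

```python
# Toy sketch of the retrieval AUC; labels and scores are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([0, 0, 0, 1, 0, 1])              # 1 = injected anomaly
scores = np.array([0.1, 0.4, 0.2, 0.9, 0.3, 0.7])  # group scores (higher = more anomalous)
print(roc_auc_score(labels, scores))               # area under the ROC curve
```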
85 GMM cannot detect the group anomalies that do not have anomalous points. [sent-326, score-0.73]
86 3 Turbulence Data We present an exploratory study of detecting group anomalies on turbulence data from the JHU Turbulence Database Cluster (TDC) [22]. [sent-334, score-0.666]
87 Figure 4: Detection of stitched images. (a) Sample images and stitched anomalies. (b) Detection performance of LDA-KNN, MGMM, ThM, FGM-DB, and DD. [sent-348, score-0.382]
88 We applied MGMM, ThM, and FGM to find anomalies in this group data. [sent-360, score-0.517]
89 T = 4 genres and K = 6 topics were used for all methods. [sent-361, score-0.314]
90 We do not have a ground truth for anomalies in this data set. [sent-362, score-0.302]
91 This vorticity can be considered as a hand-crafted anomaly score based on expert knowledge of this fluid data. [sent-365, score-0.426]
92 We do not want an anomaly detector to match this score perfectly because there are other “non-vortex” anomalous events it should find as well. [sent-366, score-0.538]
93 However, we do think higher correlation with this score indicates better anomaly detection performance. [sent-367, score-0.406]
94 Figure 5 visualizes the anomaly scores of FGM and the vorticity. [sent-368, score-0.339]
95 (a) & (b) FGM-DB anomaly score and vorticity visualized on one slice of the cube. [sent-380, score-0.426]
96 (c) Correlations of the anomaly scores with the vorticity. [sent-381, score-0.339]
97 7 Conclusion We presented the generative Flexible Genre Model (FGM) for the group anomaly detection problem. [sent-382, score-0.587]
98 Compared to traditional topic models, FGM is able to characterize groups’ behaviors at multiple levels. [sent-383, score-0.294]
99 Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection. [sent-394, score-0.325]
100 Hierarchical probabilistic models for group anomaly detection. [sent-439, score-0.525]
wordName wordTfidf (topN-words)
[('fgm', 0.585), ('anomaly', 0.31), ('anomalies', 0.302), ('topic', 0.245), ('group', 0.215), ('topics', 0.208), ('thm', 0.206), ('mgmm', 0.175), ('anomalous', 0.161), ('genre', 0.142), ('groups', 0.114), ('ym', 0.111), ('genres', 0.106), ('gm', 0.101), ('turbulence', 0.094), ('zmn', 0.094), ('giw', 0.092), ('dir', 0.086), ('xmn', 0.082), ('uid', 0.082), ('vorticity', 0.082), ('detection', 0.062), ('normal', 0.06), ('flexible', 0.057), ('detecting', 0.055), ('points', 0.054), ('detect', 0.052), ('lda', 0.052), ('gmms', 0.05), ('generators', 0.05), ('unusual', 0.05), ('gmm', 0.048), ('distributions', 0.047), ('dirichlet', 0.043), ('scoring', 0.041), ('gibbs', 0.04), ('velocity', 0.039), ('multinomials', 0.038), ('jeff', 0.035), ('galaxies', 0.035), ('galaxy', 0.035), ('tdc', 0.035), ('score', 0.034), ('nm', 0.034), ('mixture', 0.034), ('detector', 0.033), ('boxes', 0.032), ('behaviors', 0.032), ('mle', 0.031), ('latent', 0.031), ('dd', 0.03), ('zm', 0.03), ('scores', 0.029), ('schneider', 0.027), ('divergences', 0.027), ('stitched', 0.027), ('images', 0.026), ('synthetic', 0.025), ('knn', 0.025), ('carlo', 0.024), ('barnab', 0.023), ('czos', 0.023), ('kaustav', 0.023), ('pointbased', 0.023), ('monte', 0.023), ('estimations', 0.022), ('motion', 0.022), ('xm', 0.022), ('mth', 0.021), ('burstiness', 0.021), ('xiong', 0.021), ('shared', 0.02), ('db', 0.02), ('likelihood', 0.019), ('generate', 0.019), ('bic', 0.019), ('memberships', 0.019), ('sf', 0.019), ('stitching', 0.019), ('theme', 0.019), ('patches', 0.019), ('membership', 0.018), ('distribution', 0.017), ('grid', 0.017), ('rare', 0.017), ('injected', 0.017), ('detectors', 0.017), ('characterize', 0.017), ('donald', 0.016), ('scene', 0.015), ('clusters', 0.015), ('kth', 0.015), ('mellon', 0.015), ('mining', 0.015), ('cube', 0.015), ('carnegie', 0.015), ('das', 0.014), ('particles', 0.014), ('mixing', 0.014), ('ln', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
2 0.25352332 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
Author: Kumar Sricharan, Alfred O. Hero
Abstract: Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable characteristics, suffers from high computation complexity, and in [4] a simpler heuristic approximation, the leave-one-out kNNG (L1O-kNNG) was proposed. In this paper, we propose a novel bipartite k-nearest neighbor graph (BPkNNG) anomaly detection scheme for estimating minimum volume sets. Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than the K-kNNG and the surrogate L1OkNNG detectors. We show that BP-kNNG is asymptotically consistent in recovering the p-value of each test point. Experimental results are given that illustrate the superior performance of BP-kNNG as compared to the L1O-kNNG and other state of the art anomaly detection schemes.
3 0.24891025 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the e↵ective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
4 0.21079685 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
5 0.1738898 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
6 0.12924941 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
7 0.075671531 156 nips-2011-Learning to Learn with Compound HD Models
8 0.074762627 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
9 0.062306568 78 nips-2011-Efficient Methods for Overlapping Group Lasso
10 0.050820414 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
11 0.048512924 258 nips-2011-Sparse Bayesian Multi-Task Learning
12 0.048483033 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
13 0.038417425 241 nips-2011-Scalable Training of Mixture Models via Coresets
14 0.036071498 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval
15 0.03502322 14 nips-2011-A concave regularization technique for sparse mixture models
16 0.033068225 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
17 0.032715768 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data
18 0.032303251 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
19 0.032151345 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
20 0.031521201 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
topicId topicWeight
[(0, 0.111), (1, 0.059), (2, -0.027), (3, 0.025), (4, -0.035), (5, -0.326), (6, 0.106), (7, 0.106), (8, -0.154), (9, 0.078), (10, 0.116), (11, 0.105), (12, -0.028), (13, 0.009), (14, 0.039), (15, -0.006), (16, 0.079), (17, 0.021), (18, -0.023), (19, 0.078), (20, -0.079), (21, 0.039), (22, 0.045), (23, 0.083), (24, 0.067), (25, -0.041), (26, -0.021), (27, 0.028), (28, 0.011), (29, -0.016), (30, 0.031), (31, 0.024), (32, -0.069), (33, 0.058), (34, -0.147), (35, 0.001), (36, 0.092), (37, -0.056), (38, 0.075), (39, 0.02), (40, 0.03), (41, -0.06), (42, 0.065), (43, 0.026), (44, 0.045), (45, -0.087), (46, -0.064), (47, 0.101), (48, 0.009), (49, 0.106)]
simIndex simValue paperId paperTitle
same-paper 1 0.96217525 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
2 0.7733621 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
3 0.74653 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
4 0.73695403 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
5 0.55934221 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
6 0.54248536 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
7 0.52791804 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
8 0.42930296 14 nips-2011-A concave regularization technique for sparse mixture models
9 0.33119133 78 nips-2011-Efficient Methods for Overlapping Group Lasso
10 0.32063085 156 nips-2011-Learning to Learn with Compound HD Models
11 0.28616434 241 nips-2011-Scalable Training of Mixture Models via Coresets
12 0.27770573 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
13 0.24885412 147 nips-2011-Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors
14 0.24576548 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies
15 0.24551609 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks
16 0.23661424 95 nips-2011-Fast and Accurate k-means For Large Datasets
17 0.22924152 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
18 0.22691858 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
19 0.22035162 193 nips-2011-Object Detection with Grammar Models
20 0.21805601 123 nips-2011-How biased are maximum entropy models?
topicId topicWeight
[(4, 0.059), (20, 0.026), (21, 0.01), (26, 0.014), (28, 0.382), (31, 0.073), (33, 0.022), (43, 0.073), (45, 0.079), (57, 0.029), (74, 0.045), (83, 0.033), (84, 0.025), (99, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.71948528 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
2 0.61636198 95 nips-2011-Fast and Accurate k-means For Large Datasets
Author: Michael Shindler, Alex Wong, Adam W. Meyerson
Abstract: Clustering is a popular problem with many applications. We consider the k-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant improvements to make it practical. Our approach greatly simplifies a recently developed algorithm, both in design and in analysis, and eliminates large constant factors in the approximation guarantee, the memory requirements, and the running time. We then incorporate approximate nearest neighbor search to compute k-means in o(nk) (where n is the number of data points; note that computing the cost, given a solution, takes Θ(nk) time). We show that our algorithm compares favorably to existing algorithms - both theoretically and experimentally, thus providing state-of-the-art performance in both theory and practice.
3 0.56292021 156 nips-2011-Learning to Learn with Compound HD Models
Author: Antonio Torralba, Joshua B. Tenenbaum, Ruslan Salakhutdinov
Abstract: We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Specifically we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples, by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. 1
4 0.53710932 226 nips-2011-Projection onto A Nonnegative Max-Heap
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative max-heap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal root-tree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p²) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
5 0.44269177 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing
Author: Shai Shalev-shwartz, Yonatan Wexler, Amnon Shashua
Abstract: Multiclass prediction is the problem of classifying an object into a relevant target class. We consider the problem of learning a multiclass predictor that uses only few features, and in particular, the number of used features should increase sublinearly with the number of possible classes. This implies that features should be shared by several classes. We describe and analyze the ShareBoost algorithm for learning a multiclass predictor that uses few shared features. We prove that ShareBoost efficiently finds a predictor that uses few shared features (if such a predictor exists) and that it has a small generalization error. We also describe how to use ShareBoost for learning a non-linear predictor that has a fast evaluation time. In a series of experiments with natural data sets we demonstrate the benefits of ShareBoost and evaluate its success relatively to other state-of-the-art approaches. 1
6 0.37515855 258 nips-2011-Sparse Bayesian Multi-Task Learning
7 0.37446833 75 nips-2011-Dynamical segmentation of single trials from population neural data
8 0.37048572 180 nips-2011-Multiple Instance Filtering
9 0.36980927 139 nips-2011-Kernel Bayes' Rule
10 0.36954311 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
11 0.36941579 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
12 0.36825645 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
13 0.36761245 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
14 0.36710611 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
15 0.3670634 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
16 0.36665741 276 nips-2011-Structured sparse coding via lateral inhibition
17 0.36658227 219 nips-2011-Predicting response time and error rates in visual search
18 0.36621088 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
19 0.36560169 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
20 0.3651669 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems