nips nips2013 nips2013-10 knowledge-graph by maker-knowledge-mining

10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification


Source: pdf

Author: George H. Chen, Stanislav Nikolov, Devavrat Shah

Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. 1

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. [sent-7, score-0.98]

2 To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. [sent-10, score-0.658]

3 We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. [sent-11, score-1.309]

4 Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. [sent-12, score-1.071]

5 We then use weighted majority voting to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. [sent-13, score-0.92]

6 1 Introduction. Recent years have seen an explosion in the availability of time series data related to virtually every human endeavor — data that demands to be analyzed and turned into valuable insights. [sent-14, score-0.324]

7 As a running example used throughout this paper, consider a time series that tracks how much activity there is for a particular news topic on Twitter. [sent-16, score-0.599]

8 Given this time series up to present time, we ask “will this news topic go viral?” [sent-17, score-0.566]

9 Borrowing Twitter’s terminology, we label the time series a “trend” and call its corresponding news topic a trending topic if the news topic goes viral; otherwise, the time series has label “not trend”. [sent-18, score-1.705]

10 We seek to forecast whether a news topic will become a trend before it is declared a trend (or not) by Twitter, amounting to a binary classification problem. [sent-19, score-0.688]

11 Importantly, we skirt the discussion of what makes a topic considered trending as this is irrelevant to our mathematical development. [sent-20, score-0.333]

12 1 Furthermore, we remark that handling the case where a single time series can have different labels at different times is beyond the scope of this paper. [sent-21, score-0.347]

13 1 While it is not public knowledge how Twitter defines a topic to be a trending topic, Twitter does provide information for which topics are trending topics. [sent-22, score-0.664]

14 More recently, researchers have examined which distance to use with nearest-neighbor classification [2, 7, 18] or how to boost classification performance by applying different transformations to the time series before using nearest-neighbor classification [1]. [sent-25, score-0.324]

15 These existing results are mostly experimental, lacking theoretical justification for both when nearest-neighbor-like time series classifiers should be expected to perform well and how well. [sent-26, score-0.346]

16 However, rather than examining the asymptotic case where the amount of data goes to infinity, we instead pursue nonasymptotic performance guarantees in terms of how large of a training dataset we have and how much we observe of the time series to be classified. [sent-28, score-0.478]

17 We present a model for which nearest-neighbor-like classification performs well by operationalizing the following hypothesis: In many time series applications, there are only a small number of prototypical time series relative to the number of time series we can collect. [sent-31, score-0.995]

18 In this context, we present a novel latent source model: time series are generated from a small collection of m unknown latent sources, each having one of two labels, say “trend” or “not trend”. [sent-34, score-0.842]

19 Our model’s maximum a posteriori (MAP) time series classifier can be approximated by weighted majority voting, which compares the time series to be classified with each of the time series in the labeled training data. [sent-35, score-1.501]

20 Each training time series casts a weighted vote in favor of its ground truth label, with the weight depending on how similar the time series being classified is to the training example. [sent-36, score-1.069]

21 The voting is nonparametric in that it does not learn parameters for a model and is driven entirely by the training data. [sent-38, score-0.592]

22 The unknown latent sources are never estimated; the training data serve as a proxy for these latent sources. [sent-39, score-0.747]

23 Weighted majority voting itself can be approximated by a nearest-neighbor classifier, which we also analyze. [sent-40, score-0.764]

24 Under our model, we show sufficient conditions so that if we have n = Θ(m log(m/δ)) time series in our training data, then weighted majority voting and nearest-neighbor classification correctly classify a new time series with probability at least 1 − δ after observing its first Ω(log(m/δ)) time steps. [sent-41, score-1.897]

25 Also, while our analysis yields matching error upper bounds for the two classifiers, experimental results on synthetic data suggest that weighted majority voting outperforms nearest-neighbor classification early on when we observe very little of the time series to be classified. [sent-43, score-1.302]

26 Meanwhile, a specific instantiation of our model leads to a spherical Gaussian mixture model, where the latent sources are Gaussian mixture components. [sent-44, score-0.609]

27 We show that existing performance guarantees on learning spherical Gaussian mixture models [6, 10, 17] require more stringent conditions than what our results need, suggesting that learning the latent sources is overkill if the goal is classification. [sent-45, score-0.563]

28 Lastly, we apply weighted majority voting to forecasting trending topics on Twitter. [sent-46, score-1.316]

29 We emphasize that our goal is precognition of trends: predicting whether a topic is going to be a trend before it is actually declared to be a trend by Twitter or, in theory, any other third party that we can collect ground truth labels from. [sent-47, score-0.534]

30 Existing work that identifies trends on Twitter [3, 4, 13] instead, as part of its trend detection, defines models for what trends are, which we do not do, nor do we assume we have access to such definitions. [sent-48, score-0.344]

31 In our experiments, weighted majority voting is able to predict whether a topic will be a trend in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. [sent-50, score-1.401]

32 We empirically find that the Twitter activity of a news topic that becomes a trend tends to follow one of a finite number of patterns, which could be thought of as latent sources. [sent-51, score-0.682]

33 Weighted majority voting and nearest-neighbor classification for time series are presented in Section 2. [sent-53, score-1.088]

34 We provide our latent source model and theoretical performance guarantees of weighted majority voting and nearest-neighbor classification under this model in Section 3. [sent-54, score-1.272]

35 Experimental results for synthetic data and forecasting trending topics on Twitter are in Section 4. [sent-55, score-0.418]

36 To do so, we have access to labeled training data R+ and R−, which denote the sets of all training time series with labels +1 and −1 respectively. [sent-58, score-0.587]

37 Each positively-labeled example r ∈ R+ casts a weighted vote e^{−γ d^{(T)}(r,s)} for whether time series s has label +1, where d^{(T)}(r, s) is some measure of similarity between the two time series r and s, and the superscript (T) indicates that we are only allowed to look at the first T time steps (i.e., [sent-60, score-1.129]

38 time steps 1, . . . , T) of s (but we’re allowed to look outside of these time steps for the training time series r), and the constant γ ≥ 0 is a scaling parameter that determines the “sphere of influence” of each example. [sent-65, score-0.531]

39 Similarly, each negatively-labeled example in R− also casts a weighted vote for whether time series s has label −1. [sent-66, score-0.687]

40 However, this similarity measure only looks at the first T time steps of training time series r. [sent-68, score-0.509]

41 Since time series in our training data are known, we need not restrict our attention to their first T time steps. [sent-69, score-0.482]

42 d^{(T)}(r, s) = min_{Δ ∈ {−Δ_max, . . . , Δ_max}} ‖(r ∗ Δ) − s‖²_T, (1) where we minimize over integer time shifts Δ with a pre-specified maximum allowed shift Δ_max ≥ 0. [sent-82, score-0.329]
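
As a concrete illustration, the shift-minimized distance in (1) can be computed as in the following Python sketch; the function name, the zero-based array indexing, and the choice to simply skip shifts that would reach outside the recorded training series are our own assumptions rather than the authors' implementation.

    import numpy as np

    def shifted_distance(r, s, T, max_shift):
        """Squared Euclidean distance between the first T steps of s and the best
        time-shifted alignment of training series r, cf. equation (1)."""
        s_head = np.asarray(s, dtype=float)[:T]
        r = np.asarray(r, dtype=float)
        best = np.inf
        for shift in range(-max_shift, max_shift + 1):
            # Advancing r by `shift` steps means comparing s(t) with r(t + shift).
            if shift < 0 or shift + T > len(r):
                continue  # this shift would need samples of r we do not have recorded
            best = min(best, float(np.sum((r[shift:shift + T] - s_head) ** 2)))
        return best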

43 Here, we have used q ∗ Δ to denote time series q advanced by Δ time steps, i.e., (q ∗ Δ)(t) = q(t + Δ). [sent-83, score-0.393]

44 Finally, we sum up all of the weighted +1 votes and then all of the weighted −1 votes. [sent-86, score-0.351]

45 The label with the majority of overall weighted votes is declared as the label for s: L̂^{(T)}(s; γ) = +1 if Σ_{r ∈ R+} e^{−γ d^{(T)}(r,s)} ≥ Σ_{r ∈ R−} e^{−γ d^{(T)}(r,s)}, and −1 otherwise. (2) [sent-87, score-0.739]
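
A minimal sketch of rule (2) is given below, reusing the shifted_distance helper sketched earlier; the list-of-arrays data layout is assumed, and the optional threshold theta anticipates the generalized rule (8) discussed later (theta = 1 gives ordinary weighted majority voting).

    import numpy as np

    def weighted_majority_vote(s, R_plus, R_minus, T, gamma, max_shift, theta=1.0):
        """Classify time series s as +1 or -1 by weighted majority voting (2)."""
        vote_plus = sum(np.exp(-gamma * shifted_distance(r, s, T, max_shift))
                        for r in R_plus)
        vote_minus = sum(np.exp(-gamma * shifted_distance(r, s, T, max_shift))
                         for r in R_minus)
        # Declare +1 when the weighted +1 votes beat theta times the weighted -1 votes.
        return +1 if vote_plus >= theta * vote_minus else -1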

46 3 A Latent Source Model and Theoretical Guarantees. We assume there to be m unknown latent sources (time series) that generate observed time series. [sent-94, score-0.489]

47 Let V denote the set of all such latent sources; each latent source is a function v : Z → R. [sent-95, score-0.518]

48 Let V+ ⊂ V be the set of latent sources with label +1, and V− ⊂ V be the set of those with label −1. [sent-97, score-0.66]

49 The observed time series are generated from latent sources as follows. [sent-98, score-0.744]

50 We index time using Z for notational convenience but will assume time series to start at time step 1. [sent-101, score-0.462]

51 The only change is that the amount of training data needed will be larger by a factor of 1/(m π_min), where π_min is the smallest probability of a particular latent source occurring. [sent-103, score-0.397]

52 Figure 1: Example of latent sources superimposed, where each latent source is shifted vertically in amplitude such that every other latent source has label +1 and the rest have label −1. [sent-104, score-1.405]

53 The observed time series S : Z → R is taken to be latent source V advanced by Δ time steps, followed by adding a noise signal E : Z → R. [sent-112, score-0.377]

54 The label associated with the generated time series S is the same as that of V. [sent-116, score-0.444]
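
The generative process can be sketched as follows; the uniform choice of latent source and shift, the Gaussian noise, and the edge padding used to realize the shift are illustrative assumptions rather than the paper's exact specification.

    import numpy as np

    def sample_from_latent_source_model(sources, labels, length, max_shift,
                                        noise_std, rng):
        """Draw one labeled time series: pick a latent source, advance it by a
        random shift, and add i.i.d. Gaussian noise."""
        k = int(rng.integers(len(sources)))
        shift = int(rng.integers(-max_shift, max_shift + 1))
        padded = np.pad(np.asarray(sources[k], dtype=float), max_shift, mode="edge")
        start = max_shift + shift  # index of time step 1 after advancing by `shift`
        series = padded[start:start + length] + rng.normal(0.0, noise_std, size=length)
        return series, labels[k]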

55 For instance, the latent sources could be tiled as shown in Figure 1, where they are evenly separated vertically and alternate between the two different classes +1 and −1. [sent-126, score-0.49]

56 With a parametric model like a k-component Gaussian mixture model, estimating these latent sources could be problematic. [sent-127, score-0.51]

57 For example, if we take any two adjacent latent sources with label +1 and cluster them, then this cluster could be confused with the latent source having label −1 that is sandwiched in between. [sent-128, score-0.99]

58 In this example, the k-component Gaussian mixture model needed for label +1 would require k to be the exact number of latent sources with label +1, which is unknown. [sent-130, score-0.728]

59 As we discuss next, for classification, we sidestep learning the latent sources altogether, instead using training data as a proxy for latent sources. [sent-132, score-0.747]

60 If we knew the latent sources and if noise entries E(t) were i.i.d. [sent-135, score-0.42]

61 zero-mean Gaussian across t, then the maximum a posteriori (MAP) estimate for label L given an observed time series S = s is L̂^{(T)}_MAP(s; Δ_max) = +1 if Λ^{(T)}_MAP(s; Δ_max) ≥ 1, and −1 otherwise, (5) where Λ^{(T)}_MAP(s; Δ_max) is the corresponding likelihood ratio of label +1 to label −1, computed over the allowed time shifts. [sent-138, score-0.444]
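
Under this Gaussian-noise reading, and assuming a uniform prior over latent sources and over shifts, the oracle MAP rule (5) reduces to a vote cast by the true latent sources; the sketch below reuses shifted_distance, with gamma standing in for the inverse noise variance (an assumed parametrization, not the paper's exact constants).

    import numpy as np

    def oracle_map_classify(s, sources_plus, sources_minus, T, gamma, max_shift):
        """Oracle version of rule (5): the true latent sources, rather than
        training examples, cast the weighted votes."""
        num = sum(np.exp(-gamma * shifted_distance(v, s, T, max_shift))
                  for v in sources_plus)
        den = sum(np.exp(-gamma * shifted_distance(v, s, T, max_shift))
                  for v in sources_minus)
        return +1 if num >= den else -1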

62 We make a further assumption that the training data were sampled from the latent source model and that we have n different training time series. [sent-147, score-0.555]

63 Then we approximate the MAP classifier by using training data as a proxy for the latent sources. [sent-155, score-0.327]

64 Λ^{(T)}(s; γ) = [ Σ_{r ∈ R+} exp(−γ min_{Δ ∈ D+} ‖(r ∗ Δ) − s‖²_T) ] / [ Σ_{r ∈ R−} exp(−γ min_{Δ ∈ D−} ‖(r ∗ Δ) − s‖²_T) ]. (7) Plugging Λ^{(T)} in place of Λ^{(T)}_MAP in classification rule (5) yields the weighted majority voting rule (2). [sent-157, score-1.019]

65 Note that weighted majority voting could be interpreted as a smoothed nearest-neighbor approximation whereby we only consider the time-shifted version of each example time series that is closest to the observed time series s. [sent-158, score-1.59]
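
In the limit of a very large scaling parameter, only the single closest training series matters; a sketch of that nearest-neighbor classifier, again built on the shifted_distance helper, is shown below.

    def nearest_neighbor_classify(s, R_plus, R_minus, T, max_shift):
        """1-NN rule: return the label of the training series whose best
        time-shifted alignment to the first T steps of s is closest."""
        best_label, best_dist = +1, float("inf")
        for label, examples in ((+1, R_plus), (-1, R_minus)):
            for r in examples:
                d = shifted_distance(r, s, T, max_shift)
                if d < best_dist:
                    best_dist, best_label = d, label
        return best_label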

66 The resulting decision rule, which we refer to as generalized weighted majority voting, is thus: L̂^{(T)}_θ(s; γ) = +1 if Λ^{(T)}(s; γ) ≥ θ, and −1 otherwise, (8) where setting θ = 1 recovers the usual weighted majority voting (2). [sent-162, score-1.382]

67 G^{(T)}(R+, R−, Δ_max) = min over r+ ∈ R+, r− ∈ R−, and allowed shifts Δ of ‖(r+ ∗ Δ) − r−‖²_T. (9) This quantity measures how far apart the two different classes are if we only look at length-T chunks of each time series and allow all shifts of at most Δ_max time steps in either direction. [sent-168, score-0.52]
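
The gap can be estimated directly from the training data, for instance with the sketch below; treating the allowed shift range as the same max_shift parameter used elsewhere is a simplifying assumption.

    def training_gap(R_plus, R_minus, T, max_shift):
        """Smallest shift-minimized squared distance, over the first T steps,
        between any positively and any negatively labeled training series."""
        return min(shifted_distance(r_pos, r_neg, T, max_shift)
                   for r_pos in R_plus for r_neg in R_minus)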

68 (Performance guarantee for generalized weighted majority voting) Let m+ = |V+| be the number of latent sources with label +1, and m− = |V−| = m − m+ be the number of latent sources with label −1. [sent-172, score-1.519]

69 Then with an appropriate choice of γ (a constant multiple of 1/σ², where σ² is the noise variance), generalized weighted majority voting (8) correctly classifies a new time series S with probability at least 1 − δ if G^{(T)}(R+, R−, Δ_max) = Ω( σ² ( log(θ m− + m+/θ) + log(2Δ_max + 1) + log(1/δ) ) ). [sent-180, score-1.488]

70 Thus, the gap between sets R+ and R− needs to grow logarithmically in the number of latent sources m in order for weighted majority voting to classify correctly with high probability. [sent-181, score-1.542]

71 We use a minimum rather than a summation over time shifts to make the method more similar to existing time series classification work. [sent-182, score-0.439]

72 Our generalized weighted majority voting bound (10) with θ = 1 (corresponding to regular weighted majority voting) and the same choice of γ matches our nearest-neighbor classification bound, suggesting that the two methods have similar behavior when the gap grows with T. [sent-189, score-1.431]

73 In practice, we find weighted majority voting to outperform nearest-neighbor classification when T is small, and then as T grows large, the two methods exhibit similar performance in agreement with our theoretical analysis. [sent-190, score-0.972]

74 Weighted majority voting, on the other hand, can recover from this situation as there may be enough correctly labeled training time series close by that contribute to a higher overall vote for the correct class. [sent-192, score-0.776]

75 This robustness of weighted majority voting makes it favorable in the online setting where we want to make a prediction as early as possible. [sent-193, score-0.956]

76 If we can estimate the latent sources accurately, then we could plug these estimates in place of the true latent sources in the MAP classifier and achieve classification performance close to optimal. [sent-195, score-0.862]

77 If we restrict the noise to be Gaussian and assume Δ_max = 0, then the latent source model corresponds to a spherical Gaussian mixture model. [sent-196, score-0.483]

78 In contrast, our result is in terms of the gap G^{(T)}(R+, R−, Δ_max), which depends not on the true separation between two latent sources but instead on the minimum observed separation in the training data between two time series of opposite labels. [sent-202, score-0.975]

79 In such scenarios, one could instead non-uniformly subsample O(T m³/ε²) time series from the training data using the procedure given in [8] and then feed the resulting smaller dataset, referred to as an (m, ε)-coreset, to the EM algorithm for learning the latent sources. [sent-211, score-0.645]

80 This procedure still requires more training time series than needed for classification and lacks a guarantee that the estimated latent sources will be close to the true latent sources. [sent-212, score-1.043]

[Figure 2 legend: weighted majority voting, nearest-neighbor classifier, oracle MAP classifier.]

83 Figure 3: How news topics become trends on Twitter. [sent-231, score-0.398]

84 The top left shows some time series of activity leading up to a news topic becoming trending. [sent-232, score-0.599]

85 These time series superimposed look like clutter, but we can separate them into different clusters, as shown in the next five plots. [sent-233, score-0.353]

86 We generate m = 200 latent sources, where each latent source is constructed by first sampling i.i.d. values. [sent-236, score-0.518]

87 Half of the latent sources are labeled +1 and the other half −1. [sent-240, score-0.443]

88 Then n = βm log m training time series are sampled as per the latent source model, where the noise added is i.i.d. Gaussian. [sent-241, score-0.759]

89 For β = 8, we compare the classification error rates on test data for weighted majority voting, nearest-neighbor classification, and the MAP classifier with oracle access to the true latent sources, as shown in Figure 2(a). [sent-247, score-0.902]
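
A compact driver for this kind of synthetic comparison is sketched below, reusing the helpers defined earlier; the source length, noise level, gamma, and number of test series are illustrative choices and not the paper's exact protocol (smaller sizes make for a quicker run).

    import numpy as np

    rng = np.random.default_rng(0)
    m, length, max_shift, noise_std, gamma, beta = 200, 200, 5, 1.0, 1.0, 8

    # m latent sources with i.i.d. Gaussian entries; half labeled +1, half -1.
    sources = [rng.normal(size=length) for _ in range(m)]
    labels = [+1] * (m // 2) + [-1] * (m // 2)

    # n = beta * m * log(m) labeled training series drawn from the model.
    n = int(beta * m * np.log(m))
    train = [sample_from_latent_source_model(sources, labels, length, max_shift,
                                             noise_std, rng) for _ in range(n)]
    R_plus = [x for x, y in train if y == +1]
    R_minus = [x for x, y in train if y == -1]

    # Misclassification rate of weighted majority voting for a few values of T.
    for T in (10, 25, 50):
        test = [sample_from_latent_source_model(sources, labels, length, max_shift,
                                                noise_std, rng) for _ in range(50)]
        err = np.mean([weighted_majority_vote(s, R_plus, R_minus, T, gamma, max_shift) != y
                       for s, y in test])
        print(T, err)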

90 We see that weighted majority voting outperforms nearest-neighbor classification but as T grows large, the two methods’ performances converge to that of the MAP classifier. [sent-248, score-0.95]

91 We see that as β increases, both weighted majority voting and nearest-neighbor classification steadily improve in performance. [sent-250, score-0.92]

92 (a) Weighted majority voting achieves a low error rate (FPR of 4%, TPR of 95%) and detects trending topics in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes. [sent-255, score-1.16]

93 In practice, one could still expressly assemble the training data to have pre-specified class sizes and then tune θ for generalized weighted majority voting (8). [sent-261, score-1.053]

94 In our experiments, we use the usual weighted majority voting (2) (i.e., θ = 1). [sent-262, score-0.92]

95 Per topic, we created its time series based on a pre-processed version of the raw rate of how often the topic was shared. [sent-265, score-0.475]
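
The exact pre-processing used for the Twitter experiments is not reproduced in these excerpts; as a hedged stand-in, one simple possibility is moving-average smoothing of the raw sharing-rate counts followed by a log transform, as sketched below (the window and offset are assumed parameters).

    import numpy as np

    def preprocess_rate(raw_counts, window=10, eps=1.0):
        """Illustrative pre-processing of a topic's raw sharing-rate counts:
        smooth with a moving average, then compress dynamic range with a log."""
        kernel = np.ones(window) / window
        smoothed = np.convolve(np.asarray(raw_counts, dtype=float), kernel, mode="same")
        return np.log(smoothed + eps)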

96 We empirically found that how news topics become trends tends to follow a finite number of patterns; a few examples of these patterns are shown in Figure 3. [sent-268, score-0.329]

97 We applied weighted majority voting, sweeping over γ, T, and data pre-processing parameters. [sent-270, score-0.417]

98 As shown in Figure 4(a), one choice of parameters allows us to detect trending topics in advance of Twitter 79% of the time, and when we do, we detect them an average of 1 hour and 26 minutes earlier. [sent-271, score-0.407]

99 Emerging topic detection on twitter based on temporal and social terms evaluation. [sent-297, score-0.533]

100 Querying and mining of time series data: experimental comparison of representations and distance measures. [sent-308, score-0.351]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('voting', 0.503), ('twitter', 0.372), ('majority', 0.261), ('series', 0.255), ('trending', 0.215), ('latent', 0.21), ('sources', 0.21), ('trend', 0.175), ('classi', 0.167), ('weighted', 0.156), ('news', 0.124), ('label', 0.12), ('topic', 0.118), ('topics', 0.116), ('source', 0.098), ('training', 0.089), ('fpr', 0.083), ('tpr', 0.083), ('time', 0.069), ('classify', 0.068), ('mixture', 0.068), ('forecasting', 0.065), ('trends', 0.065), ('kr', 0.059), ('er', 0.058), ('classifier', 0.057), ('cation', 0.056), ('eamonn', 0.056), ('max', 0.054), ('spherical', 0.053), ('nearest', 0.05), ('shifts', 0.046), ('casts', 0.046), ('detection', 0.043), ('declared', 0.043), ('nonasymptotic', 0.043), ('viral', 0.043), ('gap', 0.042), ('map', 0.042), ('vote', 0.041), ('kv', 0.041), ('access', 0.039), ('votes', 0.039), ('neighbor', 0.039), ('correctly', 0.038), ('log', 0.038), ('phrases', 0.038), ('manners', 0.037), ('xiaoyue', 0.037), ('early', 0.036), ('rate', 0.033), ('activity', 0.033), ('shiva', 0.033), ('advance', 0.032), ('arindam', 0.03), ('devavrat', 0.03), ('misclassifying', 0.03), ('prem', 0.03), ('schulman', 0.03), ('grows', 0.03), ('false', 0.029), ('forecast', 0.029), ('superimposed', 0.029), ('proxy', 0.028), ('observing', 0.027), ('kasiviswanathan', 0.027), ('prasad', 0.027), ('declare', 0.027), ('vertically', 0.027), ('mining', 0.027), ('steps', 0.027), ('dasgupta', 0.027), ('gaussian', 0.027), ('oracle', 0.026), ('posts', 0.026), ('hour', 0.025), ('shift', 0.025), ('management', 0.024), ('become', 0.024), ('vempala', 0.024), ('elaborate', 0.023), ('labeled', 0.023), ('decision', 0.023), ('prototypical', 0.023), ('classification', 0.023), ('separation', 0.023), ('labels', 0.023), ('generalized', 0.022), ('detect', 0.022), ('synthetic', 0.022), ('aggressive', 0.022), ('theoretical', 0.022), ('guarantees', 0.022), ('replace', 0.022), ('could', 0.022), ('allowed', 0.022), ('separated', 0.021), ('mixtures', 0.021), ('longer', 0.021), ('rule', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

Author: George H. Chen, Stanislav Nikolov, Devavrat Shah

Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. 1

2 0.1323366 75 nips-2013-Convex Two-Layer Modeling

Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans

Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1

3 0.09444163 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition

Author: Fang Zhao, Yongzhen Huang, Liang Wang, Tieniu Tan

Abstract: Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data. To tackle this problem, we propose a “relevance topic model” for jointly learning meaningful mid-level representations upon bagof-words (BoW) video representations and a classifier with sparse weights. In our approach, sparse Bayesian learning is incorporated into an undirected topic model (i.e., Replicated Softmax) to discover topics which are relevant to video classes and suitable for prediction. Rectified linear units are utilized to increase the expressive power of topics so as to explain better video data containing complex contents and make variational inference tractable for the proposed model. An efficient variational EM algorithm is presented for model parameter estimation and inference. Experimental results on the Unstructured Social Activity Attribute dataset show that our model achieves state of the art performance and outperforms other supervised topic model in terms of classification accuracy, particularly in the case of a very small number of labeled training videos. 1

4 0.093071572 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

Author: Anima Anandkumar, Daniel Hsu, Majid Janzamin, Sham M. Kakade

Abstract: Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of “higher order” expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allow for overcomplete models, and require the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus, we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition. Keywords: Overcomplete representation, admixture models, generic identifiability, tensor decomposition.

5 0.093033671 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies

Author: Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih-Reza Amini

Abstract: We study in this paper flat and hierarchical classification strategies in the context of large-scale taxonomies. To this end, we first propose a multiclass, hierarchical data dependent bound on the generalization error of classifiers deployed in large-scale taxonomies. This bound provides an explanation to several empirical results reported in the literature, related to the performance of flat and hierarchical classifiers. We then introduce another type of bound targeting the approximation error of a family of classifiers, and derive from it features used in a meta-classifier to decide which nodes to prune (or flatten) in a large-scale taxonomy. We finally illustrate the theoretical developments through several experiments conducted on two widely used taxonomies. 1

6 0.092122033 301 nips-2013-Sparse Additive Text Models with Low Rank Background

7 0.090308003 171 nips-2013-Learning with Noisy Labels

8 0.089303702 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

9 0.075820625 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

10 0.074436173 98 nips-2013-Documents as multiple overlapping windows into grids of counts

11 0.074158154 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

12 0.074134216 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting

13 0.070239328 211 nips-2013-Non-Linear Domain Adaptation with Boosting

14 0.069955625 148 nips-2013-Latent Maximum Margin Clustering

15 0.069573633 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

16 0.069321543 70 nips-2013-Contrastive Learning Using Spectral Methods

17 0.066530727 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

18 0.06426432 344 nips-2013-Using multiple samples to learn mixture models

19 0.062777683 174 nips-2013-Lexical and Hierarchical Topic Regression

20 0.062014941 149 nips-2013-Latent Structured Active Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.167), (1, 0.071), (2, -0.048), (3, -0.028), (4, 0.09), (5, -0.018), (6, 0.055), (7, 0.011), (8, 0.041), (9, 0.025), (10, -0.021), (11, 0.004), (12, -0.008), (13, -0.006), (14, 0.044), (15, -0.088), (16, 0.093), (17, 0.081), (18, 0.093), (19, -0.044), (20, -0.038), (21, 0.026), (22, 0.119), (23, 0.021), (24, 0.034), (25, -0.085), (26, 0.008), (27, -0.048), (28, -0.125), (29, -0.076), (30, 0.013), (31, -0.014), (32, 0.007), (33, -0.072), (34, 0.031), (35, -0.022), (36, 0.035), (37, -0.068), (38, -0.018), (39, 0.054), (40, -0.046), (41, 0.062), (42, 0.033), (43, -0.026), (44, -0.033), (45, 0.056), (46, 0.082), (47, 0.056), (48, -0.071), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9597373 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

Author: George H. Chen, Stanislav Nikolov, Devavrat Shah

Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. 1

2 0.79667157 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition

Author: Fang Zhao, Yongzhen Huang, Liang Wang, Tieniu Tan

Abstract: Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data. To tackle this problem, we propose a “relevance topic model” for jointly learning meaningful mid-level representations upon bagof-words (BoW) video representations and a classifier with sparse weights. In our approach, sparse Bayesian learning is incorporated into an undirected topic model (i.e., Replicated Softmax) to discover topics which are relevant to video classes and suitable for prediction. Rectified linear units are utilized to increase the expressive power of topics so as to explain better video data containing complex contents and make variational inference tractable for the proposed model. An efficient variational EM algorithm is presented for model parameter estimation and inference. Experimental results on the Unstructured Social Activity Attribute dataset show that our model achieves state of the art performance and outperforms other supervised topic model in terms of classification accuracy, particularly in the case of a very small number of labeled training videos. 1

3 0.64673781 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies

Author: Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih-Reza Amini

Abstract: We study in this paper flat and hierarchical classification strategies in the context of large-scale taxonomies. To this end, we first propose a multiclass, hierarchical data dependent bound on the generalization error of classifiers deployed in large-scale taxonomies. This bound provides an explanation to several empirical results reported in the literature, related to the performance of flat and hierarchical classifiers. We then introduce another type of bound targeting the approximation error of a family of classifiers, and derive from it features used in a meta-classifier to decide which nodes to prune (or flatten) in a large-scale taxonomy. We finally illustrate the theoretical developments through several experiments conducted on two widely used taxonomies. 1

4 0.64210308 148 nips-2013-Latent Maximum Margin Clustering

Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori

Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1

5 0.63366085 75 nips-2013-Convex Two-Layer Modeling

Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans

Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1

6 0.60760063 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

7 0.60394502 12 nips-2013-A Novel Two-Step Method for Cross Language Representation Learning

8 0.59562075 98 nips-2013-Documents as multiple overlapping windows into grids of counts

9 0.5861001 174 nips-2013-Lexical and Hierarchical Topic Regression

10 0.58464789 301 nips-2013-Sparse Additive Text Models with Low Rank Background

11 0.5834893 223 nips-2013-On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation

12 0.5806514 70 nips-2013-Contrastive Learning Using Spectral Methods

13 0.57044357 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting

14 0.5589062 294 nips-2013-Similarity Component Analysis

15 0.55845535 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

16 0.55251843 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

17 0.54586643 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models

18 0.54143995 85 nips-2013-Deep content-based music recommendation

19 0.52996808 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation

20 0.52077836 65 nips-2013-Compressive Feature Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.011), (16, 0.023), (33, 0.126), (34, 0.083), (41, 0.023), (49, 0.036), (56, 0.094), (70, 0.025), (79, 0.013), (85, 0.027), (89, 0.402), (93, 0.044), (95, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88240546 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

Author: George H. Chen, Stanislav Nikolov, Devavrat Shah

Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. 1

2 0.8816641 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models

Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow

Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximumlikelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains. 1

3 0.86869293 109 nips-2013-Estimating LASSO Risk and Noise Level

Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari

Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the 1 -penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the 2 estimation risk θ − θ0 2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1

4 0.84991992 91 nips-2013-Dirty Statistical Models

Author: Eunho Yang, Pradeep Ravikumar

Abstract: We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M -estimators that minimize the sum of any loss function, and an instance of what we call a “hybrid” regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. 1

5 0.78188747 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification

Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the stateof-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy. 1

6 0.75894374 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking

7 0.67461377 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes

8 0.63154858 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

9 0.62410736 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

10 0.61024207 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

11 0.60419828 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

12 0.59053683 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures

13 0.58535463 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models

14 0.58339435 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

15 0.57996982 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection

16 0.5731802 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

17 0.57269818 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC

18 0.57201731 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking

19 0.56785929 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

20 0.56323469 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization