jmlr jmlr2010 jmlr2010-70 knowledge-graph by maker-knowledge-mining

70 jmlr-2010-MOA: Massive Online Analysis


Source: pdf

Author: Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer

Abstract: Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license. Keywords: data streams, classification, ensemble methods, java, machine learning software

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Department of Computer Science, University of Waikato, Hamilton, New Zealand. Editor: Mikio Braun. Abstract: Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. [sent-13, score-0.22]

2 MOA includes a collection of offline and online methods as well as tools for evaluation. [sent-14, score-0.027]

3 Keywords: data streams, classification, ensemble methods, java, machine learning software 1. [sent-17, score-0.025]

4 A main approach to green computing is based on algorithmic efficiency. [sent-19, score-0.026]

5 In the data stream model, data arrive at high speed, and an algorithm must process them under very strict constraints of space and time. [sent-20, score-0.383]

6 MOA is an open-source framework for dealing with massive evolving data streams. [sent-21, score-0.14]

7 MOA is related to WEKA, the Waikato Environment for Knowledge Analysis, which is an award-winning open-source workbench containing implementations of a wide range of batch machine learning methods. [sent-22, score-0.104]

8 A data stream environment has different requirements from the traditional batch learning setting. [sent-23, score-0.595]

9 The algorithm is passed the next available example from the stream (Requirement 1). [sent-25, score-0.383]

10 Figure 1: The data stream classification cycle: (1) Input (Requirement 1), (2) Learning (Requirements 2 & 3), (3) Model (Requirement 4), then Prediction. [sent-27, score-0.418]

11 It does so without exceeding the memory bounds set on it (Requirement 2), and as quickly as possible (Requirement 3). [sent-29, score-0.044]

12 The algorithm is ready to accept the next example. [sent-31, score-0.031]
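
The four steps of this cycle map directly onto MOA's Java API. Below is a minimal test-then-train sketch, assuming the package layout of early MOA releases (class locations such as moa.classifiers.HoeffdingTree and the use of weka.core.Instance vary between versions, so paths should be checked against your release):

    import moa.classifiers.Classifier;
    import moa.classifiers.HoeffdingTree;
    import moa.streams.generators.RandomRBFGenerator;
    import weka.core.Instance;

    public class TestThenTrain {
        public static void main(String[] args) {
            int maxInstances = 1000000;
            // (1) Input: a synthetic stream that supplies one example at a time.
            RandomRBFGenerator stream = new RandomRBFGenerator();
            stream.prepareForUse();
            // (2) Learning: an incremental classifier bound to the stream's schema.
            Classifier learner = new HoeffdingTree();
            learner.setModelContext(stream.getHeader());
            learner.prepareForUse();
            int seen = 0, correct = 0;
            while (stream.hasMoreInstances() && seen < maxInstances) {
                Instance inst = stream.nextInstance();   // Requirement 1
                // (3) Model: test before training, so every prediction
                // is made on an example the model has not yet seen.
                if (learner.correctlyClassifies(inst)) {
                    correct++;
                }
                learner.trainOnInstance(inst);           // Requirements 2 & 3
                seen++;
            }
            System.out.println("Accuracy: " + (100.0 * correct / seen) + "%");
        }
    }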

13 In traditional batch learning the problem of limited data is overcome by analyzing and averaging multiple models produced with different random arrangements of training and test data. [sent-33, score-0.128]

14 In the stream setting the problem of (effectively) unlimited data poses different challenges. [sent-34, score-0.404]

15 One solution involves taking snapshots at different times during the induction of a model to see how much the model improves. [sent-35, score-0.024]

16 When considering what procedure to use in the data stream setting, one of the unique concerns is how to build a picture of accuracy over time. [sent-37, score-0.383]

17 Two main approaches arise: • Holdout: When traditional batch learning reaches a scale where cross-validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. [sent-38, score-0.203]

18 • Interleaved Test-Then-Train: each example is used for testing before it is used for training. When intentionally performed in this order, the model is always being tested on examples it has not seen. [sent-41, score-0.021]

19 This scheme has the advantage that no holdout set is needed for testing, making maximum use of the available data. [sent-42, score-0.101]

20 Figure 2: MOA Graphical User Interface. As data stream classification is a relatively new field, such evaluation practices are not nearly as well researched and established as they are in the traditional batch setting. [sent-45, score-0.514]

21 The majority of experimental evaluations use less than one million training examples. [sent-46, score-0.048]

22 In the context of data streams this is disappointing, because to be truly useful at data stream classification the algorithms need to be capable of handling very large (potentially infinite) streams of examples. [sent-47, score-0.741]

23 Demonstrating systems only on small amounts of data does not build a convincing case for capacity to solve more demanding stream applications (Kirkby, 2007). [sent-48, score-0.412]

24 MOA permits evaluation of data stream classification algorithms on large streams, in the order of tens of millions of examples where possible, and under explicit memory limits. [sent-49, score-0.427]
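
An experiment at this scale is normally launched from the command line. A hedged sketch of such an invocation, assuming the task names and flags of the era (-l selects the learner, -s the stream, -i the number of instances; the sizeofag.jar Java agent is what allows memory usage to be measured and bounded; exact jar names and options should be checked against your release):

    java -cp moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
        "EvaluateInterleavedTestThenTrain -l HoeffdingTree \
         -s generators.RandomRBFGenerator -i 10000000" > result.csv

Here -i 10000000 requests ten million examples, the scale at which the authors argue evaluation becomes realistically challenging.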

25 Any less than this does not actually test algorithms in a realistically challenging setting. [sent-50, score-0.029]

26 The main benefits of Java are portability (applications can run on any platform with an appropriate Java virtual machine) and its strong, well-developed support libraries. [sent-53, score-0.042]

27 Use of the language is widespread, and features such as automatic garbage collection help to reduce programmer burden and error. [sent-54, score-0.026]

28 Considering data streams as data generated from pure distributions, MOA models a concept drift event as a weighted combination of two pure distributions that characterize the target concepts before and after the drift. [sent-58, score-0.296]

29 Within the framework, it is possible to define the probability that instances of the stream belong to the new concept after the drift. [sent-59, score-0.383]
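
This probability is typically modelled as a sigmoid in time. A minimal sketch, assuming the parameterisation f(t) = 1 / (1 + e^(-4(t - t0)/W)) with drift centre t0 and width W (the helper names below are hypothetical, and the exact functional form should be checked against the paper):

    import java.util.Random;

    final class DriftWeight {
        // Assumed sigmoid model: probability that the example arriving at
        // time t is drawn from the post-drift concept.
        static double newConceptProbability(long t, long t0, long w) {
            return 1.0 / (1.0 + Math.exp(-4.0 * (t - t0) / (double) w));
        }

        // Joining two streams: with this probability, emit the next example
        // from the new-concept stream instead of the old one.
        static boolean drawFromNewConcept(long t, long t0, long w, Random rng) {
            return rng.nextDouble() < newConceptProbability(t, t0, w);
        }
    }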

30 MOA contains the data generators most commonly found in the literature. [sent-62, score-0.096]

31 MOA streams can be built using generators, reading ARFF files, joining several streams, or filtering streams. [sent-63, score-0.167]
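
Reading a stream from an ARFF file follows the same pattern as using a generator. A minimal sketch, assuming MOA's ArffFileStream class with a (file name, class index) constructor; the file name here is hypothetical, and -1 selects the last attribute as the class:

    import moa.streams.ArffFileStream;
    import weka.core.Instance;

    public class ReadArffStream {
        public static void main(String[] args) {
            ArffFileStream stream = new ArffFileStream("covertype.arff", -1);
            stream.prepareForUse();
            long count = 0;
            while (stream.hasMoreInstances()) {
                Instance inst = stream.nextInstance();  // feed to a learner or filter
                count++;
            }
            System.out.println("Read " + count + " examples");
        }
    }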

32 The following generators are currently available: Random Tree Generator, SEA Concepts Generator, STAGGER Concepts Generator, Rotating Hyperplane, Random RBF Generator, LED Generator, Waveform Generator, and Function Generator. [sent-65, score-0.096]

33 The website includes a tutorial, an API reference, a user manual, and a manual about mining data streams. [sent-75, score-0.153]

34 Several examples of how the software can be used are available. [sent-76, score-0.025]

35 Although the current focus in MOA is on classification, we plan to extend the framework to include data stream clustering, regression, and frequent pattern learning (Bifet, 2010). [sent-88, score-0.383]

36 Improving adaptive bagging methods for evolving data streams. [sent-93, score-0.186]

37 Issues in evaluation of stream learning algorithms. [sent-99, score-0.383]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('moa', 0.641), ('stream', 0.383), ('bifet', 0.236), ('waikato', 0.191), ('geoff', 0.173), ('streams', 0.167), ('generator', 0.156), ('pfahringer', 0.144), ('holmes', 0.12), ('kirkby', 0.115), ('hoeffding', 0.109), ('albert', 0.105), ('holdout', 0.101), ('generators', 0.096), ('evolving', 0.095), ('bagging', 0.091), ('bernhard', 0.09), ('nz', 0.08), ('java', 0.08), ('requirement', 0.076), ('batch', 0.075), ('environment', 0.073), ('evaluateinterleavedtestthentrain', 0.067), ('gavald', 0.067), ('hoeffdingtree', 0.067), ('ifet', 0.067), ('irkby', 0.067), ('ricard', 0.067), ('richard', 0.066), ('gama', 0.058), ('fahringer', 0.058), ('olmes', 0.058), ('million', 0.048), ('massive', 0.045), ('mining', 0.044), ('manual', 0.042), ('weka', 0.042), ('website', 0.04), ('concepts', 0.04), ('requirements', 0.037), ('cycle', 0.035), ('cs', 0.033), ('ac', 0.032), ('ready', 0.031), ('pure', 0.03), ('interface', 0.029), ('drift', 0.029), ('raquel', 0.029), ('interleaved', 0.029), ('ena', 0.029), ('acml', 0.029), ('arff', 0.029), ('convincing', 0.029), ('disappointing', 0.029), ('portability', 0.029), ('practices', 0.029), ('realistically', 0.029), ('tutorials', 0.029), ('workbench', 0.029), ('classi', 0.028), ('online', 0.027), ('user', 0.027), ('traditional', 0.027), ('ine', 0.026), ('arrangements', 0.026), ('ios', 0.026), ('jo', 0.026), ('zealand', 0.026), ('gpl', 0.026), ('asian', 0.026), ('pakdd', 0.026), ('sea', 0.026), ('encapsulated', 0.026), ('api', 0.026), ('garbage', 0.026), ('green', 0.026), ('software', 0.025), ('handling', 0.024), ('tree', 0.024), ('snapshots', 0.024), ('hamilton', 0.024), ('pedro', 0.024), ('mikio', 0.024), ('sigkdd', 0.024), ('memory', 0.023), ('boosting', 0.023), ('nline', 0.022), ('command', 0.022), ('rotating', 0.022), ('inspect', 0.022), ('braun', 0.022), ('consuming', 0.022), ('widespread', 0.022), ('gnu', 0.021), ('unlimited', 0.021), ('platform', 0.021), ('exceeding', 0.021), ('millions', 0.021), ('intentionally', 0.021), ('virtual', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 70 jmlr-2010-MOA: Massive Online Analysis

Author: Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer

Abstract: Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license. Keywords: data streams, classification, ensemble methods, java, machine learning software

2 0.14016555 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project

Author: Remco R. Bouckaert, Eibe Frank, Mark A. Hall, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten

Abstract: WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the software’s functionality, we review aspects of project management and historical development decisions that likely had an impact on the uptake of the project. Keywords: machine learning software, open source software

3 0.032197908 90 jmlr-2010-Permutation Tests for Studying Classifier Performance

Author: Markus Ojala, Gemma C. Garriga

Abstract: We explore the framework of permutation-based p-values for assessing the performance of classifiers. In this paper we study two simple permutation tests. The first test assesses whether the classifier has found a real class structure in the data; the corresponding null distribution is estimated by permuting the labels in the data. This test has been used extensively in classification problems in computational biology. The second test studies whether the classifier is exploiting the dependency between the features in classification; the corresponding null distribution is estimated by permuting the features within classes, inspired by restricted randomization techniques traditionally used in statistics. This new test can serve to identify descriptive features which can be valuable information in improving the classifier performance. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classifier performance via permutation tests is effective. In particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data. Keywords: classification, labeled data, permutation tests, restricted randomization, significance testing

4 0.026241466 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm

Author: Yael Ben-Haim, Elad Tom-Tov

Abstract: We propose a new algorithm for building decision tree classifiers. The algorithm is executed in a distributed environment and is especially designed for classifying large data sets and streaming data. It is empirically shown to be as accurate as a standard decision tree classifier, while being scalable for processing of streaming data on multiple processors. These findings are supported by a rigorous analysis of the algorithm’s accuracy. The essence of the algorithm is to quickly construct histograms at the processors, which compress the data to a fixed amount of memory. A master processor uses this information to find near-optimal split points to terminal tree nodes. Our analysis shows that guarantees on the local accuracy of split points imply guarantees on the overall tree accuracy. Keywords: decision tree classifiers, distributed computing, streaming data, scalability

5 0.025493154 15 jmlr-2010-Approximate Tree Kernels

Author: Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller

Abstract: Convolution kernels for trees provide simple means for learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, such that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets for supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space. Keywords: tree kernels, approximation, kernel methods, convolution kernels

6 0.024593875 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide

7 0.021251304 112 jmlr-2010-Training and Testing Low-degree Polynomial Data Mappings via Linear SVM

8 0.019688122 45 jmlr-2010-High-dimensional Variable Selection with Sparse Random Projections: Measurement Sparsity and Statistical Efficiency

9 0.018736191 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding

10 0.018688574 9 jmlr-2010-An Efficient Explanation of Individual Classifications using Game Theory

11 0.016087346 22 jmlr-2010-Classification Using Geometric Level Sets

12 0.014996864 40 jmlr-2010-Fast and Scalable Local Kernel Machines

13 0.014392085 83 jmlr-2010-On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation

14 0.013731057 110 jmlr-2010-The SHOGUN Machine Learning Toolbox

15 0.013722256 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

16 0.013252039 8 jmlr-2010-A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design

17 0.012959427 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

18 0.012622725 113 jmlr-2010-Tree Decomposition for Large-Scale SVM Problems

19 0.01243739 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks

20 0.012317675 67 jmlr-2010-Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithms and Empirical Evaluation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.065), (1, 0.022), (2, -0.031), (3, 0.031), (4, 0.012), (5, 0.06), (6, -0.007), (7, -0.021), (8, -0.057), (9, 0.072), (10, 0.095), (11, -0.022), (12, -0.061), (13, 0.074), (14, 0.013), (15, -0.019), (16, 0.029), (17, -0.085), (18, -0.147), (19, -0.099), (20, -0.144), (21, 0.133), (22, 0.247), (23, -0.347), (24, -0.282), (25, -0.003), (26, 0.113), (27, 0.167), (28, -0.06), (29, -0.057), (30, 0.143), (31, 0.02), (32, 0.019), (33, 0.176), (34, 0.064), (35, -0.103), (36, 0.092), (37, -0.119), (38, -0.005), (39, -0.057), (40, 0.095), (41, 0.096), (42, -0.007), (43, -0.017), (44, -0.001), (45, -0.097), (46, 0.077), (47, -0.11), (48, -0.014), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95201117 70 jmlr-2010-MOA: Massive Online Analysis

Author: Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer

Abstract: Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license. Keywords: data streams, classification, ensemble methods, java, machine learning software

2 0.89828116 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project

Author: Remco R. Bouckaert, Eibe Frank, Mark A. Hall, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten

Abstract: WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the software’s functionality, we review aspects of project management and historical development decisions that likely had an impact on the uptake of the project. Keywords: machine learning software, open source software

3 0.15974234 90 jmlr-2010-Permutation Tests for Studying Classifier Performance

Author: Markus Ojala, Gemma C. Garriga

Abstract: We explore the framework of permutation-based p-values for assessing the performance of classifiers. In this paper we study two simple permutation tests. The first test assesses whether the classifier has found a real class structure in the data; the corresponding null distribution is estimated by permuting the labels in the data. This test has been used extensively in classification problems in computational biology. The second test studies whether the classifier is exploiting the dependency between the features in classification; the corresponding null distribution is estimated by permuting the features within classes, inspired by restricted randomization techniques traditionally used in statistics. This new test can serve to identify descriptive features which can be valuable information in improving the classifier performance. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classifier performance via permutation tests is effective. In particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data. Keywords: classification, labeled data, permutation tests, restricted randomization, significance testing

4 0.1307229 113 jmlr-2010-Tree Decomposition for Large-Scale SVM Problems

Author: Fu Chang, Chien-Yang Guo, Xiao-Rong Lin, Chi-Jen Lu

Abstract: To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a given data space and train SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training for the remaining data points. Second, it is efficient in determining the parameter values that maximize the validation accuracy, which helps maintain good test accuracy. Third, the tree decomposition method can derive a generalization error bound for the classifier. For data sets whose size can be handled by current non-linear, or kernel-based, SVM training techniques, the proposed method can speed up the training by a factor of thousands, and still achieve comparable test accuracy. Keywords: binary tree, generalization error bound, margin-based theory, pattern classification, tree decomposition, support vector machine, VC theory

5 0.12782556 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference

Author: Remco Bouckaert, Raymond Hemmecke, Silvia Lindner, Milan Studený

Abstract: The topic of the paper is computer testing of (probabilistic) conditional independence (CI) implications by an algebraic method of structural imsets. The basic idea is to transform (sets of) CI statements into certain integral vectors and to verify by a computer the corresponding algebraic relation between the vectors, called the independence implication. We interpret the previous methods for computer testing of this implication from the point of view of polyhedral geometry. However, the main contribution of the paper is a new method, based on linear programming (LP). The new method overcomes the limitation of former methods to the number of involved variables. We recall/describe the theoretical basis for all four methods involved in our computational experiments, whose aim was to compare the efficiency of the algorithms. The experiments show that the LP method is clearly the fastest one. As an example of possible application of such algorithms we show that testing inclusion of Bayesian network structures or whether a CI statement is encoded in an acyclic directed graph can be done by the algebraic method. Keywords: conditional independence inference, linear programming approach

6 0.11016472 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide

7 0.1057196 63 jmlr-2010-Learning Instance-Specific Predictive Models

8 0.10464171 112 jmlr-2010-Training and Testing Low-degree Polynomial Data Mappings via Linear SVM

9 0.10364068 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm

10 0.095197991 94 jmlr-2010-Quadratic Programming Feature Selection

11 0.092548594 2 jmlr-2010-A Convergent Online Single Time Scale Actor Critic Algorithm

12 0.089061022 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

13 0.087309889 19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods

14 0.086715177 15 jmlr-2010-Approximate Tree Kernels

15 0.086576492 9 jmlr-2010-An Efficient Explanation of Individual Classifications using Game Theory

16 0.085577749 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding

17 0.083647192 77 jmlr-2010-Model-based Boosting 2.0

18 0.081068695 110 jmlr-2010-The SHOGUN Machine Learning Toolbox

19 0.080774844 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

20 0.078878 66 jmlr-2010-Linear Algorithms for Online Multitask Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(32, 0.039), (36, 0.017), (37, 0.033), (56, 0.585), (75, 0.097), (85, 0.051), (96, 0.015), (97, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75061047 70 jmlr-2010-MOA: Massive Online Analysis

Author: Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer

Abstract: Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license. Keywords: data streams, classification, ensemble methods, java, machine learning software

2 0.55807745 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

Author: Joris M. Mooij

Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org. Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks

3 0.32479849 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project

Author: Remco R. Bouckaert, Eibe Frank, Mark A. Hall, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten

Abstract: WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the software’s functionality, we review aspects of project management and historical development decisions that likely had an impact on the uptake of the project. Keywords: machine learning software, open source software

4 0.19915341 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization

Author: Pannagadatta K. Shivaswamy, Tony Jebara

Abstract: Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along large spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather only relative to the spread of data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, text classification and on several other benchmark data sets. In addition, risk bounds are derived for the new formulation based on Rademacher averages. Keywords: support vector machines, kernel methods, large margin, Rademacher complexity

5 0.19914718 110 jmlr-2010-The SHOGUN Machine Learning Toolbox

Author: Sören Sonnenburg, Gunnar Rätsch, Sebastian Henschel, Christian Widmer, Jonas Behr, Alexander Zien, Fabio de Bona, Alexander Binder, Christian Gehl, Vojtěch Franc

Abstract: We have developed a machine learning toolbox, called SHOGUN, which is designed for unified large-scale learning for a broad range of feature types and learning settings. It offers a considerable number of machine learning models such as support vector machines, hidden Markov models, multiple kernel learning, linear discriminant analysis, and more. Most of the specific algorithms are able to deal with several different data classes. We have used this toolbox in several applications from computational biology, some of them coming with no less than 50 million training examples and others with 7 billion test examples. With more than a thousand installations worldwide, SHOGUN is already widely adopted in the machine learning community and beyond. SHOGUN is implemented in C++ and interfaces to MATLAB™, R, Octave, and Python, and has a stand-alone command-line interface. The source code is freely available under the GNU General Public License, Version 3 at http://www.shogun-toolbox.org. Keywords: support vector machines, kernels, large-scale learning, Python, Octave, R

6 0.19734636 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions

7 0.19674973 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide

8 0.19669582 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

9 0.19649406 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values

10 0.1954971 9 jmlr-2010-An Efficient Explanation of Individual Classifications using Game Theory

11 0.19529632 63 jmlr-2010-Learning Instance-Specific Predictive Models

12 0.19508496 18 jmlr-2010-Bundle Methods for Regularized Risk Minimization

13 0.19466823 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking

14 0.19341217 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding

15 0.19340949 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

16 0.19306803 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing

17 0.19298117 5 jmlr-2010-A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning

18 0.19289726 66 jmlr-2010-Linear Algorithms for Online Multitask Classification

19 0.1928546 22 jmlr-2010-Classification Using Geometric Level Sets

20 0.19274466 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes