jmlr jmlr2006 jmlr2006-94 knowledge-graph by maker-knowledge-mining

94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation


Source: pdf

Author: Greg Hamerly, Erez Perelman, Jeremy Lau, Brad Calder, Timothy Sherwood

Abstract: An essential step in designing a new computer architecture is the careful examination of different design options. It is critical that computer architects have efficient means by which they may estimate the impact of various design options on the overall machine. This task is complicated by the fact that different programs, and even different parts of the same program, may have distinct behaviors that interact with the hardware in different ways. Researchers use very detailed simulators to estimate processor performance, which model every cycle of an executing program. Unfortunately, simulating every cycle of a real program can take weeks or months. To address this problem we have created a tool called SimPoint that uses data clustering algorithms from machine learning to automatically find repetitive patterns in a program’s execution. By simulating one representative of each repetitive behavior pattern, simulation time can be reduced to minutes instead of weeks for standard benchmark programs, with very little cost in terms of accuracy. We describe this important problem, the data representation and preprocessing methods used by SimPoint, the clustering algorithm at the core of SimPoint, and we evaluate different options for tuning SimPoint. Keywords: k-means, random projection, Bayesian information criterion, simulation, SimPoint

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Introduction Understanding the cycle level behavior of a processor during the execution of an application is crucial to modern computer architecture research. [sent-20, score-0.476]

2 The full execution of a single benchmark can take weeks or months to complete, and nearly all industry standard benchmarks require the execution of a suite of programs. [sent-24, score-0.502]

3 The same program binary, with the exact same input, may be run hundreds or thousands of times to examine how, for example, the effectiveness of a given architecture changes with its cache size. [sent-27, score-0.348]

4 SimPoint intelligently chooses a very small set of samples from an executed program, called simulation points, that, when simulated and weighted appropriately, provide an accurate picture of the complete execution of the program. [sent-38, score-0.565]

5 Simulating in detail only these carefully chosen simulation points can save hours of simulation time over a random sampling of the program, while still providing the accuracy needed to make reliable decisions based on the outcome of the cycle level simulation. [sent-39, score-0.415]

6 Before we developed SimPoint, architecture researchers would simulate SPEC programs for 300 million instructions from the start of execution, or fast-forward 1 billion instructions to try to get past the initialization part of the program. [sent-40, score-1.017]

7 This paper shows how repetitive phase behavior can be found in programs through machine learning and describes how SimPoint automatically finds these phases and picks simulation points. [sent-45, score-0.495]

8 First, Section 2 summarizes the simulation methodology used in processor architecture research. [sent-47, score-0.373]

9 The correlation between the executing code and performance of a program is described in Section 4, as well as how this code is represented in vector format to capture program behavior. [sent-49, score-0.412]

10 Section 6 describes how simulation points are picked from the phases, and the accuracy resulting from representing the entire program execution using the simulation points. [sent-51, score-0.679]

11 1 Background Processor architecture research quantifies the effectiveness of a design by executing a program on a software model of the architecture design called an architecture simulator. [sent-58, score-0.667]

12 1 SPEC CPU Benchmarks The SPEC CPU2000 benchmark suite has 26 programs, of which 12 are integer programs (primary execution is of integer instructions) and 14 are floating-point programs (primary execution is of floating-point instructions). [sent-65, score-0.614]

13 This means that, at current and future processor speeds, future releases of the SPEC benchmark suite will need to execute on the order of trillions of instructions for the reference inputs. [sent-77, score-0.369]

14 2 SimpleScalar SimpleScalar is a program that models the cycle level execution of a processor. [sent-80, score-0.354]

15 It takes as input a program-input pair and simulates the execution from beginning to end, while computing relevant statistics for architecture research, such as cycles per instruction (CPI), cache miss rates, branch mispredictions, and power consumption. [sent-81, score-0.534]

16 At the coarsest level of detail, sim-fast models only the functional execution of a program at the instruction level. [sent-83, score-0.384]

17 The slowest yet most accurate model, sim-outorder, executes on the order of hundreds of millions of instructions per hour, which is several orders of magnitude slower than the native hardware. [sent-117, score-0.365]

18 This enormous turnaround time for a study makes simulating the full benchmark infeasible, and the majority of researchers only simulate a few hundred million instructions from each benchmark. [sent-120, score-0.495]

19 The CPI prediction error is the percent difference between the CPI predicted using only the simulation points chosen by SimPoint and the baseline (true) CPI of the complete execution of the program. [sent-127, score-0.381]
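As a concrete reading of this definition, a minimal sketch (the function name and example values are illustrative, not taken from the SimPoint tool):

```python
def cpi_prediction_error(simpoint_cpi, baseline_cpi):
    """Percent difference between the CPI estimated from SimPoint's
    simulation points and the true CPI of the complete execution."""
    return 100.0 * abs(simpoint_cpi - baseline_cpi) / baseline_cpi

# e.g., an estimate of 1.05 against a true CPI of 1.00 is a 5% error
print(cpi_prediction_error(1.05, 1.00))
```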

20 Defining Phase Behavior Since phases are a way of describing the recurring behavior of a program executing over time, we begin by describing phase analysis with a demonstration of the time-varying behavior (Sherwood and Calder, 1999) of two programs from the SPEC 2000 benchmark suite, gcc and gzip. [sent-130, score-0.605]

21 For each of the three plots, the horizontal axis represents the execution of the program over time, and each point plotted represents one 10-million instruction interval. [sent-141, score-0.384]

22 Programs may have stable behavior for billions of instructions and then change suddenly. [sent-155, score-0.358]

23 In addition to CPI, we have found for the SPEC 95 and 2000 programs that the behavior of all of the architecture metrics (branch prediction, cache misses, etc.) exhibits this same phase behavior. [sent-156, score-0.366]

24 Figure 2 (horizontal axis: billions of instructions executed): These plots show the relationship between measured performance (CPI) and code usage for the program gcc-166, and SimPoint’s ability to capture phase information by only looking at what code is being executed. [sent-167, score-0.777]

25 For each of the three plots, the horizontal axis represents the execution of the program over time, and each point plotted represents one 10-million instruction interval. [sent-168, score-0.384]

26 For the results in this work all intervals are chosen to be the same length, as measured in the number of instructions committed within an interval. [sent-182, score-0.405]

27 A phase may be made up of intervals which are disjoint in time; we would call this a phase with a repeating behavior. [sent-187, score-0.416]

28 A “well-formed” phase should have intervals with similar behavior across various architecture metrics (e.g., CPI and cache miss rates). [sent-188, score-0.54]

29 The Strong Correlation Between Code and Performance In this section we describe how we identify phase behavior in an architecture independent fashion. [sent-194, score-0.36]

30 The metric we will use for comparing two time intervals in a program is based on the differences in the execution frequencies for each basic block executed during those two intervals. [sent-219, score-0.606]

31 The intuition behind this is that the behavior of the program at a given time is directly related to the code it is executing during that interval, and basic block vectors provide us with this information. [sent-220, score-0.366]

32 The basic idea is that knowing the basic block distribution for two different intervals gives two separate signatures which we can then compare to find out how similar the intervals are to one another. [sent-223, score-0.411]

33 This number is weighted by the number of instructions in the basic block, since we want every individual instruction to have the same influence. [sent-228, score-0.375]

34 Therefore, each element in the array is the count of how many times its corresponding basic block has been entered during an interval of execution, multiplied by the number of instructions in that basic block. [sent-229, score-0.462]
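A hedged sketch of this construction (the names are illustrative; SimPoint's profiler collects these counts from the executing binary, and the final normalization follows SimPoint's practice of comparing distributions of executed code rather than raw counts):

```python
import numpy as np

def basic_block_vector(entry_counts, block_sizes):
    """Build one frequency vector for an interval of execution.

    entry_counts[b] -- times basic block b was entered during the interval
    block_sizes[b]  -- number of instructions in basic block b
    """
    # weight entry counts by block size so every executed instruction
    # has the same influence
    bbv = np.asarray(entry_counts, float) * np.asarray(block_sizes, float)
    # normalize to a distribution over the executed code
    return bbv / bbv.sum()
```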

35 Frequency vectors can represent basic blocks, branch edges, or any other type of program related structure which provides a representative summary of a program’s behavior for each interval of execution. [sent-233, score-0.371]

36 Past definitions were built around the idea of a phase being a contiguous interval of execution during which a measured program metric is relatively stable. [sent-274, score-0.578]

37 A key observation from this paper is that the phase behavior seen in any program metric is a function of the code being executed. [sent-277, score-0.418]

38 To find how all intervals of execution relate to one another we create a basic block similarity matrix for a program/input pair. [sent-279, score-0.415]

39 An entry at (x, y) in the matrix represents the Manhattan distance between the basic block vector at interval x and the basic block vector at interval y. [sent-281, score-0.357]
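A small sketch of the distance and the matrix it fills (illustrative; a real run over tens of thousands of intervals would use a vectorized or blocked computation):

```python
import numpy as np

def manhattan(u, v):
    """Manhattan (L1) distance between two basic block vectors."""
    return float(np.abs(u - v).sum())

def similarity_matrix(bbvs):
    """Entry (x, y) is the Manhattan distance between the basic block
    vectors of intervals x and y; small values mean similar code usage."""
    n = len(bbvs)
    M = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            M[x, y] = manhattan(bbvs[x], bbvs[y])
    return M
```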

40 An interval taken from 70 billion instructions into execution in Figure 3 (left) is directly in the middle of a large phase shown by the triangle of dark points that surround this point. [sent-291, score-0.79]

41 We can see that the intervals at 50 billion and 90 billion instructions are also very similar to the program behavior at 70 billion instructions. [sent-293, score-0.807]

42 While it may be hard to see in a printed version, the intervals around 70 billion instructions are similar to the intervals around 10 billion and 30 billion instructions, and even more similar to those around 50 and 90 billion instructions. [sent-294, score-0.843]

43 Even for such a complex program, we see that there is common code shared between sections of execution, such as the intervals around 13 billion instructions and 36 billion instructions. [sent-306, score-0.624]

44 Data clustering algorithms from unsupervised machine learning have been shown to be very effective at breaking the complete execution of a program into phases that have similar frequency vectors (Sherwood et al.). [sent-320, score-0.563]

45 The goal of clustering is to divide a set of points into clusters such that points within each cluster are similar to one another (by some metric), and points in different clusters are different from one another. [sent-323, score-0.513]

46 The k-means algorithm (MacQueen, 1967) is an efficient and well-known clustering algorithm, which we use to split program intervals into phases. [sent-325, score-0.399]
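A minimal sketch of this clustering step; scikit-learn's k-means stands in here for SimPoint's own implementation:

```python
from sklearn.cluster import KMeans

def cluster_phases(vectors, k, seed=0):
    """Split interval vectors (one row per interval, typically after
    random projection) into k phases using k-means."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(vectors)
    return labels, km.cluster_centers_
```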

47 Taken to the extreme, if every interval of execution is given its very own cluster, then every cluster will have homogeneous behavior. [sent-329, score-0.365]

48 Our goal is to choose a clustering with a minimum number of clusters which still models the program behavior well. [sent-330, score-0.412]

49 Profile the program by dividing the program’s execution into contiguous intervals of fixed length (e.g., 10 million instructions). [sent-333, score-0.467]

50 Many clustering algorithms suffer from the so-called “curse of dimensionality,” which refers to the fact that finding an optimal clustering is intractable as the number of dimensions increases. [sent-363, score-0.352]
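SimPoint counters this with random projection (one of the paper's keywords), mapping the very high-dimensional frequency vectors down to a handful of dimensions before clustering. A hedged sketch, with the target dimension left as a parameter (the paper evaluates up to 100 projected dimensions):

```python
import numpy as np

def random_projection(bbvs, target_dims, seed=0):
    """Project n-by-d frequency vectors down to target_dims dimensions
    with a random matrix; relative distances between intervals are
    approximately preserved."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((bbvs.shape[1], target_dims))
    return bbvs @ proj
```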

51 A scatterplot of the program gzip projected to 2 dimensions and clustered into 3 clusters using k-means is shown in Figure 5. [sent-405, score-0.437]

52 These results use an interval length of 10 million instructions and the maximum number of phases (K) is set to 10. [sent-437, score-0.541]

53 The horizontal axis corresponds to the execution of the program (in billions of instructions), and each interval is classified to belong to one of the clusters (labeled on the vertical axis). [sent-438, score-0.541]

54 The intervals in cluster 1 are grouped into the same phase because they execute a similar combination of code, which happens to be part of the code behavior in either cluster 2 or 4 and part of code executed in cluster 3. [sent-444, score-0.786]

55 Choosing Simulation Points from the Phase Classification After the phase classification algorithm has done its job, intervals with similar code usage will be grouped together into the same phases (clusters). [sent-450, score-0.427]

56 The selected interval is called a simulation point for that phase (Perelman et al.). [sent-456, score-0.404]
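A sketch of this selection step, assuming the representative of each phase is the member interval closest to the phase's cluster center and that intervals have a fixed length (so a phase's weight reduces to its share of the program's intervals):

```python
import numpy as np

def pick_simulation_points(vectors, labels, centers):
    """Return (interval index, weight) for each phase: the interval
    whose vector is closest to the cluster center, weighted by the
    phase's share of the program's intervals."""
    points = []
    for j in range(len(centers)):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue  # empty cluster
        dists = np.linalg.norm(vectors[members] - centers[j], axis=1)
        points.append((int(members[np.argmin(dists)]),
                       members.size / len(labels)))
    return points
```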

57 Each weight is a fraction: it is the total number of instructions represented by the intervals in the cluster from which the simulation point was taken, divided by the number of instructions in the program. [sent-461, score-0.938]

58 With the weights and the detailed simulation results of each simulation point, we can compute a weighted average for the architecture metric of interest (CPI, cache miss rate, etc.). [sent-462, score-0.634]
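The weighted average itself is a dot product; a sketch reusing the weights from the previous step:

```python
import numpy as np

def estimate_metric(point_values, weights):
    """Whole-program estimate of a metric (CPI, cache miss rate, ...)
    from the per-simulation-point measurements; weights sum to 1."""
    return float(np.dot(point_values, weights))

# e.g., three phases with weights 0.5, 0.3, 0.2 and measured CPIs
print(estimate_metric([1.2, 0.8, 2.0], [0.5, 0.3, 0.2]))  # 1.24
```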

59 These simulation points are chosen once for a program/input combination because they are chosen based only on how the code is executed, and not based on architecture metrics. [sent-464, score-0.438]

60 The number of simulation points that SimPoint chooses has a direct effect on the simulation time that will be required for those points. [sent-466, score-0.37]

61 Researchers in architecture tend to want to keep simulation time to below a fixed number of instructions (e.g., a few hundred million). [sent-470, score-0.614]

62 If this is a goal, we find that an interval length of 10 million instructions with K = 30 provides very good accuracy (as we show in this paper) with reasonable simulation time (220 million instructions on average). [sent-473, score-1.017]

63 They simulate intervals of length 300-500 million instructions so they do not have to worry about implementing warmup in their simulation infrastructure. [sent-485, score-0.753]

64 With such long intervals the architecture structures are warmed up sufficiently during the beginning of the interval’s execution to provide accurate simulation results. [sent-486, score-0.659]

65 One way to do this is with an architecture checkpoint, which stores the potential contents of the major architecture components at the start of the simulation point (Biesbrouck et al.). [sent-488, score-0.515]

66 Figure 6 shows the simulation accuracy results using SimPoint (and other methods) for the SPEC 2000 programs when compared to the complete execution of the programs. [sent-493, score-0.429]

67 For these results we use an interval length of 100 million instructions and limit the number of simulation points to no more than 10. [sent-494, score-0.68]

68 Results are shown for simulating from the start of the program’s execution, for fast-forwarding 1 billion instructions before simulating, and for using SimPoint to choose at most ten 100-million-instruction intervals to simulate. [sent-497, score-0.518]

69 A total of 400 million instructions were simulated for gzip. [sent-501, score-0.365]

70 The baseline was gathered by spending months of simulation time simulating the entire execution of each SPEC program. [sent-505, score-0.386]

71 The first technique was to just simulate the first N million instructions of a benchmark’s execution. [sent-507, score-0.398]

72 The second technique was to blindly skip the first billion instructions of execution to get past the initialization of the program’s execution, and then simulate for N million instructions. [sent-508, score-0.656]

73 If instead we fast-forward 1 billion instructions and then simulate for the same number of instructions as chosen by SimPoint, we see a median 23% IPC error. [sent-510, score-0.651]

74 Figure 7 (horizontal axis: configuration 0-19, program gcc-166): This plot shows the true and estimated IPC and cache miss rates for 19 different architecture configurations for the program gcc. [sent-519, score-0.421]

75 To examine the independence of the simulation points from the underlying architecture, we used the simulation points for the SimPoint algorithm with an interval length of 1 million instructions and the maximum K set to 300. [sent-524, score-0.879]

76 This figure shows that the simulation points, which are chosen by only looking at code usage, can be used across different architecture configurations to make accurate architecture design trade-off decisions and comparisons. [sent-533, score-0.606]

77 We believe one reason for the bias is that SimPoint chooses the most representative interval from each phase, and intervals that represent phase change boundaries are less likely to be fully represented across the chosen simulation points. [sent-538, score-0.586]

78 To show this effect, Table 4 shows the SimPoint running time for gcc-166 and crafty-ref, which represent the lower and upper limits for the number of intervals and basic block vectors seen in SPEC 2000 with an interval length of 10 million instructions. [sent-606, score-0.445]

79 The effect of the number of intervals on the running time of SimPoint becomes critical when using very small interval lengths like 1 million instructions or fewer, which can create millions of intervals to cluster. [sent-616, score-0.725]

80 The probability of each interval being chosen is proportional to the weight of its interval (the number of dynamically executed instructions it represents). [sent-619, score-0.512]

81 When using VLIs we had millions of intervals, and had to sub-sample 10,000 to 100,000 intervals for the clustering to achieve a reasonable running time for SimPoint, while still providing very accurate simulation points. [sent-630, score-0.443]
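A sketch of this weighted sub-sampling (illustrative; SimPoint performs the sampling internally):

```python
import numpy as np

def subsample_intervals(weights, n_samples, seed=0):
    """Sample interval indices without replacement, with probability
    proportional to each interval's weight (the number of dynamically
    executed instructions it represents)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    return rng.choice(p.size, size=n_samples, replace=False, p=p)
```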

82 Results are shown for creating the initial clustering using sub-sampling with only 1/8, 1/4, 1/2, and all of the execution intervals in each program, as described above. [sent-632, score-0.454]

83 Figure 10 (horizontal axes: projected dimensions, 0-100): These two plots show the effects of changing the number of projected dimensions when using SimPoint. [sent-718, score-0.382]

84 The goal is to try to pick a small k so that the number of simulation points is also small, thereby reducing the simulation time required. [sent-745, score-0.37]
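SimPoint scores each candidate clustering with the Bayesian information criterion (another of the paper's keywords), trading goodness of fit against the number of clusters. A hedged sketch of a BIC for a k-means clustering under a hard-assignment spherical-Gaussian model; SimPoint's exact formulation may differ in its constants:

```python
import numpy as np

def kmeans_bic(X, labels, centers):
    """BIC of a k-means clustering (higher is better): log-likelihood
    under spherical Gaussians minus a penalty on model size."""
    R, d = X.shape
    K = len(centers)
    # pooled per-dimension variance of points around their centers
    sq_err = float(((X - centers[labels]) ** 2).sum())
    sigma2 = sq_err / (d * (R - K))
    counts = np.bincount(labels, minlength=K).astype(float)
    counts = counts[counts > 0]
    loglik = ((counts * np.log(counts / R)).sum()
              - 0.5 * R * d * np.log(2 * np.pi * sigma2)
              - 0.5 * d * (R - K))
    n_params = K * (d + 1)  # cluster centers plus mixing/variance terms
    return loglik - 0.5 * n_params * np.log(R)
```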

85 SimPoint 3.0 makes it much faster to find the best clustering and simulation points for a program trace than earlier versions did. [sent-816, score-0.464]

86 This is shown in the bottom graph of Figure 14, where the exhaustive search picked 19 simulation points on average, and binary search chose 22 simulation points on average. [sent-832, score-0.398]

87 These results require an average simulation time of about 220 million instructions per program (for the binary search method). [sent-848, score-0.663]

88 In addition, the use of CPI data from one architecture configuration to form clusters would bind that clustering to that particular configuration. [sent-880, score-0.41]

89 A different architecture configuration, which may produce different CPI values, would not necessarily fit the former clustering; thus the method is not architecture-independent. [sent-881, score-0.482]

90 For example, SimPoint may tell the user to start detailed simulation when 5,000,000 instructions have executed, and stop just before 6,000,000 instructions have executed, using an interval size of 1,000,000 instructions. [sent-887, score-0.805]
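The mapping from a chosen interval to the instruction range to simulate is direct; a sketch reproducing the example above:

```python
def simulation_window(interval_index, interval_size):
    """Instruction range [start, end) simulated in detail for one
    simulation point."""
    start = interval_index * interval_size
    return start, start + interval_size

# the example above: interval 5 with an interval size of 1,000,000
print(simulation_window(5, 1_000_000))  # (5000000, 6000000)
```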

91 If we can identify these behaviors and map them back to the source code level, then we could use the same phase analysis for a program compiled for different compiler optimizations or even architectures with different instruction sets. [sent-890, score-0.439]

92 To address this, we propose breaking the program’s execution up at procedure call and loop boundaries instead of at fixed-length intervals. [sent-892, score-0.388]

93 In earlier work (Lau et al., 2005a), we also showed that there is a clear hierarchy of phase behaviors, from fine-grained to coarse-grained depending upon the interval sizes used, and there is still future research to be done to determine how to pick the correct granularity for the target use of the phase analysis. [sent-908, score-0.374]

94 We found that multinomial clustering does not improve upon k-means clustering in terms of performance prediction, despite the fact that basic block vectors seem to be a natural fit to multinomials. [sent-914, score-0.422]

95 Following their work, we used up to 100 projected dimensions to find clustering results that work well for phase analysis with this approach. [sent-918, score-0.399]

96 One benefit is that multinomial clustering tends to choose fewer clusters on average (according to a BIC score formulated for multinomial mixtures), resulting in lower simulation times. [sent-926, score-0.522]

97 This purity score allows us to see if multinomial clustering is a good solution for a particular data set, and gives us a metric for deciding whether to apply multinomial clustering (if the purity score is high enough) or k-means otherwise. [sent-930, score-0.448]

98 Summary Understanding the cycle level behavior of a processor running an application is crucial to modern computer architecture research, and gaining this understanding can be done efficiently by judiciously applying detailed cycle level simulation to only a few simulation points. [sent-933, score-0.681]

99 SimPoint automates the process of picking simulation points using an offline phase classification algorithm based on k-means clustering, which significantly reduces the amount of simulation time required. [sent-942, score-0.511]

100 A multinomial clustering model for fast simulation of computer architecture designs. [sent-1152, score-0.526]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('simpoint', 0.544), ('cpi', 0.35), ('instructions', 0.271), ('spec', 0.182), ('execution', 0.182), ('architecture', 0.172), ('simulation', 0.171), ('phase', 0.141), ('bic', 0.14), ('clustering', 0.138), ('intervals', 0.134), ('program', 0.127), ('sherwood', 0.105), ('clusters', 0.1), ('alder', 0.094), ('amerly', 0.094), ('erelman', 0.094), ('herwood', 0.094), ('million', 0.094), ('interval', 0.092), ('cluster', 0.091), ('imulation', 0.089), ('lau', 0.089), ('rchitecture', 0.089), ('uide', 0.089), ('perelman', 0.078), ('dimensions', 0.076), ('programs', 0.076), ('billion', 0.076), ('instruction', 0.075), ('ipc', 0.071), ('bbv', 0.068), ('sing', 0.068), ('achine', 0.068), ('code', 0.067), ('hamerly', 0.063), ('suite', 0.062), ('au', 0.061), ('phases', 0.06), ('gzip', 0.058), ('manhattan', 0.058), ('executed', 0.057), ('cache', 0.049), ('gcc', 0.047), ('behavior', 0.047), ('multinomial', 0.045), ('cycle', 0.045), ('fortran', 0.044), ('projected', 0.044), ('signatures', 0.044), ('simplescalar', 0.042), ('block', 0.041), ('benchmarks', 0.04), ('billions', 0.04), ('plot', 0.038), ('simulating', 0.037), ('metric', 0.036), ('benchmark', 0.036), ('miss', 0.035), ('distance', 0.033), ('simulate', 0.033), ('clustered', 0.032), ('architectural', 0.031), ('vectors', 0.031), ('processor', 0.03), ('basic', 0.029), ('similarity', 0.029), ('behaviors', 0.029), ('points', 0.028), ('seeds', 0.028), ('hardware', 0.028), ('int', 0.027), ('sanghai', 0.027), ('earning', 0.026), ('calder', 0.026), ('vlis', 0.026), ('warmup', 0.026), ('initializations', 0.025), ('usage', 0.025), ('frequency', 0.025), ('representative', 0.024), ('researchers', 0.024), ('length', 0.024), ('across', 0.024), ('executing', 0.024), ('score', 0.023), ('multinomials', 0.022), ('latency', 0.022), ('bzip', 0.022), ('iterations', 0.022), ('plots', 0.022), ('metrics', 0.022), ('gurations', 0.021), ('annavaram', 0.021), ('biesbrouck', 0.021), ('byte', 0.021), ('crafty', 0.021), ('parser', 0.021), ('register', 0.021), ('branch', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation


2 0.068343952 78 jmlr-2006-Rearrangement Clustering: Pitfalls, Remedies, and Applications

Author: Sharlee Climer, Weixiong Zhang

Abstract: Given a matrix of values in which the rows correspond to objects and the columns correspond to features of the objects, rearrangement clustering is the problem of rearranging the rows of the matrix such that the sum of the similarities between adjacent rows is maximized. Referred to by various names and reinvented several times, this clustering technique has been extensively used in many fields over the last three decades. In this paper, we point out two critical pitfalls that have been previously overlooked. The first pitfall is deleterious when rearrangement clustering is applied to objects that form natural clusters. The second concerns a similarity metric that is commonly used. We present an algorithm that overcomes these pitfalls. This algorithm is based on a variation of the Traveling Salesman Problem. It offers an extra benefit as it automatically determines cluster boundaries. Using this algorithm, we optimally solve four benchmark problems and a 2,467-gene expression data clustering problem. As expected, our new algorithm identifies better clusters than those found by previous approaches in all five cases. Overall, our results demonstrate the benefits of rectifying the pitfalls and exemplify the usefulness of this clustering technique. Our code is available at our websites. Keywords: clustering, visualization of patterns in data, bond energy algorithm, traveling salesman problem, asymmetric clustering

3 0.065591022 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

Author: Francis R. Bach, Michael I. Jordan

Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled data sets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram. Keywords: spectral clustering, blind source separation, computational auditory scene analysis

4 0.055533677 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research     (Special Topic on Machine Learning and Optimization)

Author: Kristin P. Bennett, Emilio Parrado-Hernández

Abstract: The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done off-line, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques. In turn, machine learning presents new challenges to mathematical programming. The special issue includes papers from two primary themes: novel machine learning models and novel optimization approaches for existing models. Many papers blend both themes, making small changes in the underlying core mathematical program that enable the development of effective new algorithms. Keywords: machine learning, mathematical programming, convex optimization

5 0.050177526 50 jmlr-2006-Learning Recursive Control Programs from Problem Solving     (Special Topic on Inductive Programming)

Author: Pat Langley, Dongkyu Choi

Abstract: In this paper, we propose a new representation for physical control – teleoreactive logic programs – along with an interpreter that uses them to achieve goals. In addition, we present a new learning method that acquires recursive forms of these structures from traces of successful problem solving. We report experiments in three different domains that demonstrate the generality of this approach. In closing, we review related work on learning complex skills and discuss directions for future research on this topic. Keywords: teleoreactive control, logic programs, problem solving, skill learning

6 0.035496816 88 jmlr-2006-Streamwise Feature Selection

7 0.033052634 12 jmlr-2006-Active Learning with Feedback on Features and Instances

8 0.032850366 43 jmlr-2006-Large Scale Multiple Kernel Learning     (Special Topic on Machine Learning and Optimization)

9 0.029458817 38 jmlr-2006-Incremental Support Vector Learning: Analysis, Implementation and Applications     (Special Topic on Machine Learning and Optimization)

10 0.028993417 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

11 0.028922712 7 jmlr-2006-A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events

12 0.028671917 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting     (Special Topic on Inductive Programming)

13 0.028049266 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

14 0.026926523 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection

15 0.026728263 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints     (Special Topic on Machine Learning and Optimization)

16 0.026086411 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests

17 0.026044963 36 jmlr-2006-In Search of Non-Gaussian Components of a High-Dimensional Distribution

18 0.025701689 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis

19 0.025040161 33 jmlr-2006-Fast SDP Relaxations of Graph Cut Clustering, Transduction, and Other Combinatorial Problems     (Special Topic on Machine Learning and Optimization)

20 0.023572145 49 jmlr-2006-Learning Parts-Based Representations of Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.134), (1, -0.061), (2, -0.062), (3, 0.084), (4, -0.013), (5, -0.042), (6, -0.048), (7, -0.062), (8, -0.025), (9, 0.118), (10, 0.02), (11, -0.101), (12, -0.075), (13, 0.136), (14, 0.025), (15, 0.068), (16, -0.086), (17, 0.126), (18, 0.166), (19, 0.068), (20, 0.002), (21, 0.063), (22, 0.122), (23, 0.164), (24, 0.198), (25, -0.153), (26, 0.104), (27, -0.018), (28, -0.093), (29, -0.208), (30, -0.004), (31, -0.086), (32, -0.001), (33, 0.149), (34, 0.13), (35, 0.002), (36, 0.165), (37, -0.03), (38, -0.073), (39, -0.023), (40, -0.314), (41, 0.229), (42, 0.086), (43, -0.125), (44, -0.065), (45, 0.078), (46, 0.145), (47, 0.007), (48, -0.027), (49, -0.078)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96603167 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation


2 0.65480471 78 jmlr-2006-Rearrangement Clustering: Pitfalls, Remedies, and Applications


3 0.29682612 50 jmlr-2006-Learning Recursive Control Programs from Problem Solving     (Special Topic on Inductive Programming)


4 0.23454277 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation


5 0.20821051 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

Author: Ting Liu, Andrew W. Moore, Alexander Gray

Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include data sets with up to 10^6 dimensions and 10^5 records, and demonstrate non-trivial speed-ups while giving exact answers. Keywords: ball-tree, k-NN classification

6 0.18944624 88 jmlr-2006-Streamwise Feature Selection

7 0.18777779 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

8 0.15804887 31 jmlr-2006-Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization     (Special Topic on Machine Learning and Optimization)

9 0.15597895 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting     (Special Topic on Inductive Programming)

10 0.15099564 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research     (Special Topic on Machine Learning and Optimization)

11 0.14903983 43 jmlr-2006-Large Scale Multiple Kernel Learning     (Special Topic on Machine Learning and Optimization)

12 0.14650138 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

13 0.14552227 87 jmlr-2006-Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation

14 0.14302377 12 jmlr-2006-Active Learning with Feedback on Features and Instances

15 0.13313472 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

16 0.12972334 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints     (Special Topic on Machine Learning and Optimization)

17 0.12631026 7 jmlr-2006-A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events

18 0.12592381 38 jmlr-2006-Incremental Support Vector Learning: Analysis, Implementation and Applications     (Special Topic on Machine Learning and Optimization)

19 0.11802438 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests

20 0.1167886 36 jmlr-2006-In Search of Non-Gaussian Components of a High-Dimensional Distribution


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.028), (32, 0.369), (35, 0.015), (36, 0.084), (45, 0.021), (50, 0.057), (55, 0.022), (63, 0.049), (68, 0.018), (76, 0.023), (78, 0.021), (81, 0.033), (84, 0.015), (90, 0.017), (91, 0.02), (96, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79189044 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification


same-paper 2 0.71749777 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation


3 0.38117105 44 jmlr-2006-Large Scale Transductive SVMs

Author: Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou

Abstract: We show how the concave-convex procedure can be applied to transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction.html. Keywords: transduction, transductive SVMs, semi-supervised learning, CCCP

4 0.38014752 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation


5 0.37697798 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor

Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs

6 0.37293822 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error

7 0.36779565 61 jmlr-2006-Maximum-Gain Working Set Selection for SVMs     (Special Topic on Machine Learning and Optimization)

8 0.36750963 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

9 0.36665919 51 jmlr-2006-Learning Sparse Representations by Non-Negative Matrix Factorization and Sequential Cone Programming     (Special Topic on Machine Learning and Optimization)

10 0.36484492 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms

11 0.36464155 53 jmlr-2006-Learning a Hidden Hypergraph

12 0.36258543 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation

13 0.36170202 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

14 0.36090285 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification

15 0.35964257 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix

16 0.35837469 14 jmlr-2006-An Efficient Implementation of an Active Set Method for SVMs    (Special Topic on Machine Learning and Optimization)

17 0.3571921 88 jmlr-2006-Streamwise Feature Selection

18 0.35703516 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

19 0.35622728 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

20 0.35380757 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting     (Special Topic on Inductive Programming)