
89 jmlr-2013-QuantMiner for Mining Quantitative Association Rules


Source: pdf

Author: Ansaf Salleb-Aouissi, Christel Vrain, Cyril Nortet, Xiangrong Kong, Vivek Rathod, Daniel Cassard

Abstract: In this paper, we propose QuantMiner, a system for mining quantitative association rules. This system is based on a genetic algorithm that dynamically discovers “good” intervals in association rules by optimizing both the support and the confidence. The experiments on real and artificial databases have shown the usefulness of QuantMiner as an interactive, exploratory data mining tool. Keywords: association rules, numerical and categorical attributes, unsupervised discretization, genetic algorithm, simulated annealing

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 FR Laboratoire d’Informatique Fondamentale d’Orléans (LIFO) Université d’Orléans BP 6759, 45067 Orléans Cedex 2, France Xiangrong Kong Vivek Rathod SHARONXKONG @ GMAIL . [sent-7, score-0.468]

2 FR French Geological Survey (BRGM) 3, avenue Claude Guillemin BP 6009, Orléans Cedex 2, France Editor: Mikio Braun Abstract In this paper, we propose QuantMiner, a system for mining quantitative association rules. [sent-11, score-0.762]

3 This system is based on a genetic algorithm that dynamically discovers “good” intervals in association rules by optimizing both the support and the confidence. [sent-12, score-0.557]

4 The experiments on real and artificial databases have shown the usefulness of QuantMiner as an interactive, exploratory data mining tool. [sent-13, score-0.172]

5 Keywords: association rules, numerical and categorical attributes, unsupervised discretization, genetic algorithm, simulated annealing [sent-14, score-0.505]

6 Introduction In this paper, we propose software for mining quantitative and categorical rules that implements the work proposed in Salleb-Aouissi et al. [sent-15, score-0.52]

7 Given a set of categorical and quantitative attributes and a data set, the aim is to find rules built on these attributes that optimize given criteria. [sent-17, score-1.23]

8 Expressions occurring in the rules are either A = v for a categorical attribute, or A ∈ [l, u] for a quantitative one. [sent-18, score-0.551]
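The two item forms above, and the support and confidence of a rule built from them, can be sketched as follows. This is a minimal illustration and not QuantMiner's own code: the class names `CategoricalItem` and `IntervalItem`, the dictionary-per-row data layout, and the iris-like sample rows are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union

Row = Dict[str, Any]  # assumed layout: one dict per tuple of the data set


@dataclass(frozen=True)
class CategoricalItem:
    """Item of the form A = v (hypothetical class name)."""
    attribute: str
    value: Any

    def covers(self, row: Row) -> bool:
        return row[self.attribute] == self.value


@dataclass(frozen=True)
class IntervalItem:
    """Item of the form A in [l, u] (hypothetical class name)."""
    attribute: str
    lower: float
    upper: float

    def covers(self, row: Row) -> bool:
        return self.lower <= row[self.attribute] <= self.upper


Item = Union[CategoricalItem, IntervalItem]


def support(items: List[Item], rows: List[Row]) -> float:
    """Fraction of rows satisfying every item in `items`."""
    if not rows:
        return 0.0
    hits = sum(all(item.covers(row) for item in items) for row in rows)
    return hits / len(rows)


def confidence(lhs: List[Item], rhs: List[Item], rows: List[Row]) -> float:
    """supp(lhs and rhs) / supp(lhs), or 0.0 when the left side covers nothing."""
    lhs_support = support(lhs, rows)
    if lhs_support == 0.0:
        return 0.0
    return support(lhs + rhs, rows) / lhs_support


# Illustrative data only; the values are made up for the example.
rows = [
    {"species": "setosa", "petal_length": 1.4},
    {"species": "setosa", "petal_length": 1.6},
    {"species": "versicolor", "petal_length": 4.5},
    {"species": "setosa", "petal_length": 4.0},
]
lhs = [CategoricalItem("species", "setosa")]
rhs = [IntervalItem("petal_length", 1.0, 2.0)]
print(support(lhs + rhs, rows), confidence(lhs, rhs, rows))
```

Closed intervals are used here because the text writes A ∈ [l, u]; how boundary ties are handled in the actual tool is not stated in the extract.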

9 Mining quantitative association rules cannot be considered as a direct extension of mining categorical rules. [sent-20, score-0.799]

10 While this task has received less attention than mining Boolean association rules (Agrawal et al. [sent-21, score-0.451]

11 SALLEB-AOUISSI, VRAIN, NORTET, KONG, RATHOD AND CASSARD numeric attributes into intervals is performed before the mining task in Srikant and Agrawal (1996). [sent-25, score-0.952]

12 , Aumann and Lindell, 1999) restrict learning to a special kind of rules: the right-hand side of a rule expresses the distribution (e. [sent-28, score-0.18]

13 , mean, variance) of numeric attributes, the left-hand side is composed either of a set of categorical attributes or of a single discretized numeric attribute. [sent-30, score-1.216]

14 Optimization-based approaches handle numeric attributes during the mining process. [sent-31, score-0.851]

15 (2003) but the forms of the rules remain restricted to one or two numeric attributes. [sent-35, score-0.469]

16 A genetic algorithm is also proposed in Mata et al. [sent-36, score-0.146]

17 (2002) to optimize the support of itemsets defined on uninstantiated intervals of numeric attributes. [sent-37, score-0.475]

18 This approach is limited to numeric attributes and optimizes only the support before mining association rules. [sent-38, score-1.021]

19 QuantMiner handles both categorical and numerical attributes and optimizes the intervals of numeric attributes during the process of mining association rules. [sent-39, score-1.679]

20 It is based on a genetic algorithm whose fitness function aims to maximize the gain of an association rule while penalizing attributes with large intervals. [sent-40, score-0.882]

21 , 2010) show the interest of evolutionary algorithms for such a task. [sent-44, score-0.048]

22 QuantMiner In the following, an item is either an expression A = v, where A is a categorical (also called qualitative) attribute and v is a value from its domain, or an expression A ∈ [l, u] where A is a numerical (also called quantitative) attribute. [sent-46, score-0.387]

23 QuantMiner optimizes a set of rule patterns produced from a user-specified rule template. [sent-47, score-0.347]

24 Rule templates: A rule template is a preset format of a quantitative association rule used as a starting point for the quantitative mining process. [sent-48, score-1.113]

25 It is defined by the set of attributes occurring in the left-hand side, the right-hand side, or both sides of the rule. [sent-49, score-0.471]

26 Categorical attributes may or may not have specific values. [sent-50, score-0.364]

27 Furthermore, an attribute may be mandatory or optional, which allows rules of different lengths to be generated from the same rule template. [sent-51, score-0.527]

28 Given a rule template, the frequent values are first computed for each unspecified categorical attribute in the template. [sent-52, score-0.57]

29 Then, a set of rule patterns verifying the specifications (position of the attributes, mandatory/optional presence of the attributes, values for categorical attributes either provided or computed as frequent) is built. [sent-53, score-0.708]
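The expansion of a template into rule patterns described above might look like the sketch below. It rests on several assumptions that the extract does not confirm: each attribute is pinned to one side of the rule, "frequent" means a relative frequency at or above a hypothetical `min_support` threshold, and a pattern needs at least one attribute on each side. The names `TemplateAttr`, `frequent_values` and `rule_patterns` are invented for the illustration.

```python
from collections import Counter
from itertools import product
from typing import List, NamedTuple, Optional, Tuple


class TemplateAttr(NamedTuple):
    """One attribute of a rule template (hypothetical structure)."""
    name: str
    side: str                    # "lhs" or "rhs"; fixed side is an assumption
    mandatory: bool
    values: Optional[List[str]]  # candidate values if categorical, None if numeric


def frequent_values(column: List[str], min_support: float) -> List[str]:
    """Values whose relative frequency reaches min_support (assumed criterion)."""
    counts = Counter(column)
    return [v for v, c in counts.items() if c / len(column) >= min_support]


Slot = Tuple[str, Optional[str]]  # (attribute name, value or None for an interval)


def rule_patterns(template: List[TemplateAttr]) -> List[Tuple[List[Slot], List[Slot]]]:
    """Enumerate (lhs, rhs) patterns allowed by the template."""
    per_attr = []
    for attr in template:
        if attr.values is None:
            present = [(attr.name, attr.side, None)]  # interval left to the optimizer
        else:
            present = [(attr.name, attr.side, v) for v in attr.values]
        # An optional attribute may also be absent from the pattern.
        per_attr.append(present if attr.mandatory else present + [None])

    patterns = []
    for combo in product(*per_attr):
        chosen = [c for c in combo if c is not None]
        lhs = [(n, v) for n, side, v in chosen if side == "lhs"]
        rhs = [(n, v) for n, side, v in chosen if side == "rhs"]
        if lhs and rhs:  # assumed: a rule needs something on both sides
            patterns.append((lhs, rhs))
    return patterns


template = [
    TemplateAttr("species", "lhs", True, ["setosa", "versicolor"]),
    TemplateAttr("petal_length", "rhs", True, None),
    TemplateAttr("petal_width", "rhs", False, None),
]
for lhs, rhs in rule_patterns(template):
    print(lhs, "=>", rhs)
```

Each numeric slot carries `None` in place of an interval, reflecting that the bounds are only fixed later by the optimization step.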

30 For each rule pattern, the algorithm looks for the “best” intervals for the numeric attributes occurring in that template, relying on a genetic algorithm. [sent-54, score-1.126]

31 An example of a rule template and a specific example of a rule are given in Figure 1 and Figure 2, respectively. [sent-56, score-0.454]

32 Population: An individual is a set of items of the form attribute_i ∈ [l_i, u_i], where attribute_i is the i-th numeric attribute in the rule template, from left to right. [sent-57, score-0.992]

33 The process for generating the population is described in Salleb-Aouissi et al. [sent-58, score-0.027]

34 Genetic operators: Mutation and crossover are both used in QuantMiner. [sent-60, score-0.062]

35 For the crossover operator, for each attribute the interval is either inherited from one of the parents or formed by mixing the bounds of the two parents. [sent-61, score-0.308]

36 For an individual, mutation increases or decreases the lower or upper bound of its intervals. [sent-62, score-0.052]

37 Moving interval bounds is done so as to discard/involve no more than 10% of tuples already covered by the interval. [sent-63, score-0.052]
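The two operators can be sketched as below, under stated assumptions. An individual is taken to be a list of `(lower, upper)` pairs, one per numeric attribute. The way bounds are "mixed" (lower bound from one parent, upper bound from the other) and the way the 10% limit is enforced (a bound moves by a few positions in the sorted attribute values, and the move is rejected if ties would change more tuples than the budget) are illustrative choices, not details taken from the paper.

```python
import bisect
import random
from typing import List, Tuple

Interval = Tuple[float, float]


def crossover(parent_a: List[Interval], parent_b: List[Interval],
              rng: random.Random) -> List[Interval]:
    """Per attribute: inherit one parent's interval or mix the parents' bounds."""
    child = []
    for (la, ua), (lb, ub) in zip(parent_a, parent_b):
        mode = rng.choice(["a", "b", "mix_ab", "mix_ba"])
        if mode == "a":
            lower, upper = la, ua
        elif mode == "b":
            lower, upper = lb, ub
        elif mode == "mix_ab":
            lower, upper = la, ub
        else:
            lower, upper = lb, ua
        # Mixed bounds may cross; reorder so the interval stays valid.
        child.append((min(lower, upper), max(lower, upper)))
    return child


def mutate_interval(interval: Interval, sorted_values: List[float],
                    rng: random.Random, max_fraction: float = 0.10) -> Interval:
    """Move one bound so that at most max_fraction of covered tuples change."""
    lower, upper = interval
    first = bisect.bisect_left(sorted_values, lower)      # first covered index
    last = bisect.bisect_right(sorted_values, upper) - 1  # last covered index
    covered = last - first + 1
    if covered <= 0:
        return interval
    # Assumed: allow at least one tuple so that small intervals can still move.
    budget = max(1, int(max_fraction * covered))
    step = rng.randint(1, budget)
    widen = rng.random() < 0.5
    if rng.random() < 0.5:  # mutate the lower bound
        idx = first - step if widen else first + step
        idx = min(max(idx, 0), last)
        new_lower = sorted_values[idx]
        changed = abs(bisect.bisect_left(sorted_values, new_lower) - first)
        return (new_lower, upper) if changed <= budget else interval
    idx = last + step if widen else last - step  # mutate the upper bound
    idx = max(min(idx, len(sorted_values) - 1), first)
    new_upper = sorted_values[idx]
    changed = abs(bisect.bisect_right(sorted_values, new_upper) - 1 - last)
    return (lower, new_upper) if changed <= budget else interval


rng = random.Random(0)
values = [float(v) for v in range(100)]  # made-up attribute column, already sorted
print(mutate_interval((20.0, 59.0), values, rng))
print(crossover([(0.0, 1.0), (5.0, 9.0)], [(0.5, 2.0), (6.0, 7.0)], rng))
```

Working on positions in the sorted column rather than on raw numeric offsets keeps the mutation tied to how many tuples enter or leave the interval, which is the quantity the 10% rule is expressed in.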

38 Figure 1: An example of rule template exploring petal attributes for 2 categories of iris. [sent-69, score-0.79]

39 petal length and petal width are chosen to be on the right side of the rule template. [sent-70, score-0.371]

40 (1) Let A_num be the set of numerical attributes present in the rule A ⇒ B. [sent-73, score-0.515]

41 Let I_a denote the interval of the attribute a ∈ A_num. [sent-74, score-0.246]

42 In the following, size(a) denotes the length of the smallest interval which contains all the data for the attribute a and size(I_a) denotes the length of the interval I_a. [sent-75, score-0.298]

43 If Gain(A ⇒ B) is negative, then the fitness of the rule is set to the gain. [sent-76, score-0.151]

44 If it is positive (the confidence of the rule exceeds the minimum confidence threshold), the proportions of the intervals (defined as the ratios between the sizes and the domains) are taken into account, so as to favor those with small sizes as shown in Equation 2. [sent-77, score-0.252]

45 Moreover, rules with low supports are penalized by drastically decreasing their fitness values. [sent-78, score-0.154]
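Equations 1 and 2 did not survive in this extract, so the sketch below fills them with assumed forms that are merely consistent with the surrounding sentences. The gain is taken as supp(A ∧ B) − minconf × supp(A), which is positive exactly when the confidence exceeds the minimum confidence threshold. For a positive gain, the fitness is taken as the gain multiplied by (1 − size(I_a)/size(a))² for each numeric attribute, so that narrow intervals score higher. The low-support penalty (a fixed subtraction controlled by the hypothetical parameter `low_support_penalty`) is likewise an assumption.

```python
from typing import List


def gain(supp_rule: float, supp_lhs: float, min_conf: float) -> float:
    """Assumed form of Equation 1: positive iff confidence > min_conf."""
    return supp_rule - min_conf * supp_lhs


def fitness(supp_rule: float, supp_lhs: float, proportions: List[float],
            min_conf: float, min_supp: float,
            low_support_penalty: float = 1.0) -> float:
    """Assumed form of Equation 2 plus a low-support penalty.

    `proportions` holds size(I_a) / size(a) for each numeric attribute a.
    """
    score = gain(supp_rule, supp_lhs, min_conf)
    if score > 0:
        # Favor small intervals: each wide interval shrinks the fitness.
        for proportion in proportions:
            score *= (1.0 - proportion) ** 2
    # A negative gain is used as the fitness unchanged.
    if supp_rule < min_supp:
        # Hypothetical penalty size; the extract only says "drastically".
        score -= low_support_penalty
    return score


# Made-up numbers: same rule statistics, two interval widths.
print(fitness(0.3, 0.4, [0.5], min_conf=0.6, min_supp=0.1))
print(fitness(0.3, 0.4, [0.2], min_conf=0.6, min_supp=0.1))
```

With supports expressed as fractions the gain lies in [−1, 1], so a penalty of 1.0 is enough to push any low-support rule below every rule that meets the support threshold with a positive gain.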

46 Implementation We developed QuantMiner in Java as a 5-step GUI wizard allowing an interactive mining process. [sent-81, score-0.215]

47 After opening a data set, the user can choose attributes, a rule template, the optimization technique and set its parameters, launch the process, and finally display the rules with various sorting options: support, confidence or rule length. [sent-82, score-0.482]

48 Figure 2: Example of rule visualization in QuantMiner. [sent-87, score-0.151]

49 The top part shows filtering criteria that facilitate exploring the rules. [sent-88, score-0.029]

50 The bottom part shows a specific rule followed by the proportion of each interval in its corresponding domain. [sent-89, score-0.203]

51 More measures are given to assess the quality of the rule, for example, confidence(¬A ⇒ B). [sent-90, score-0.028]

52 previous steps, change the method, parameters, templates and restart the learning. [sent-91, score-0.08]

53 Note that simulated annealing is implemented in QuantMiner as an alternative optimization method. [sent-92, score-0.041]

54 A tentative approach to mining rules with disjunctive intervals is also implemented. [sent-93, score-0.455]

55 Acknowledgments This project has been supported by the BRGM-French Geological Survey, LIFO-Université d’Orléans and CCLS-Columbia University. [sent-95, score-0.156]

56 Many thanks to everyone who contributed to QuantMiner with support, ideas, data and feedback. [sent-96, score-0.031]

57 Mining association rules between sets of items in large databases. [sent-102, score-0.313]

58 Analysis of the effectiveness of the genetic algorithms based on extraction of association rules. [sent-109, score-0.271]

59 Quantminer: A genetic algorithm for mining quantitative association rules. [sent-148, score-0.598]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('attributes', 0.364), ('numeric', 0.315), ('quantminer', 0.255), ('attribute', 0.194), ('categorical', 0.193), ('iner', 0.182), ('orl', 0.182), ('uant', 0.182), ('mining', 0.172), ('ans', 0.156), ('quantitative', 0.155), ('rules', 0.154), ('template', 0.152), ('rule', 0.151), ('genetic', 0.146), ('association', 0.125), ('ia', 0.125), ('anum', 0.109), ('ccls', 0.109), ('fukuda', 0.109), ('ortet', 0.109), ('vrain', 0.109), ('intervals', 0.101), ('kong', 0.097), ('gain', 0.096), ('petal', 0.094), ('rain', 0.094), ('ansaf', 0.084), ('alcal', 0.073), ('alleb', 0.073), ('assard', 0.073), ('athod', 0.073), ('attributei', 0.073), ('aumann', 0.073), ('brgm', 0.073), ('brin', 0.073), ('cassard', 0.073), ('christel', 0.073), ('cyril', 0.073), ('fitness', 0.073), ('geological', 0.073), ('mata', 0.073), ('nortet', 0.073), ('rathod', 0.073), ('vivek', 0.073), ('xiangrong', 0.073), ('tness', 0.073), ('agrawal', 0.065), ('columbia', 0.065), ('orleans', 0.062), ('srikant', 0.062), ('riverside', 0.062), ('crossover', 0.062), ('templates', 0.052), ('sigmod', 0.052), ('mutation', 0.052), ('interval', 0.052), ('occurring', 0.049), ('bp', 0.048), ('cedex', 0.048), ('evolutionary', 0.048), ('optimizes', 0.045), ('interactive', 0.043), ('annealing', 0.041), ('dence', 0.04), ('fr', 0.04), ('daniel', 0.039), ('supp', 0.038), ('drive', 0.034), ('gmail', 0.034), ('items', 0.034), ('frequent', 0.032), ('mikio', 0.031), ('discovers', 0.031), ('everyone', 0.031), ('uninstantiated', 0.031), ('alvarez', 0.031), ('claude', 0.031), ('binning', 0.031), ('gui', 0.031), ('france', 0.031), ('exploring', 0.029), ('side', 0.029), ('con', 0.028), ('braun', 0.028), ('pods', 0.028), ('laboratoire', 0.028), ('preset', 0.028), ('disjunctive', 0.028), ('itemsets', 0.028), ('mandatory', 0.028), ('optional', 0.028), ('restart', 0.028), ('acm', 0.028), ('population', 0.027), ('functionality', 0.026), ('opening', 0.026), ('unspeci', 0.026), ('relational', 0.024), ('format', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 89 jmlr-2013-QuantMiner for Mining Quantitative Association Rules

Author: Ansaf Salleb-Aouissi, Christel Vrain, Cyril Nortet, Xiangrong Kong, Vivek Rathod, Daniel Cassard

Abstract: In this paper, we propose QuantMiner, a system for mining quantitative association rules. This system is based on a genetic algorithm that dynamically discovers “good” intervals in association rules by optimizing both the support and the confidence. The experiments on real and artificial databases have shown the usefulness of QuantMiner as an interactive, exploratory data mining tool. Keywords: association rules, numerical and categorical attributes, unsupervised discretization, genetic algorithm, simulated annealing

2 0.12692004 42 jmlr-2013-Fast Generalized Subset Scan for Anomalous Pattern Detection

Author: Edward McFowland III, Skyler Speakman, Daniel B. Neill

Abstract: We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimization over the exponentially many subsets of the data without an exhaustive search, enabling FGSS to scale to massive and high-dimensional data sets. We evaluate the performance of FGSS in three real-world application domains (customs monitoring, disease surveillance, and network intrusion detection), and demonstrate that FGSS can successfully detect and characterize relevant patterns in each domain. As compared to three other recently proposed detection algorithms, FGSS substantially decreased run time and improved detection power for massive multivariate data sets. Keywords: pattern detection, anomaly detection, knowledge discovery, Bayesian networks, scan statistics

3 0.086174041 12 jmlr-2013-Alleviating Naive Bayes Attribute Independence Assumption by Attribute Weighting

Author: Nayyar A. Zaidi, Jesús Cerquides, Mark J. Carman, Geoffrey I. Webb

Abstract: Despite the simplicity of the Naive Bayes classifier, it has continued to perform well against more sophisticated newcomers and has remained, therefore, of great interest to the machine learning community. Of numerous approaches to refining the naive Bayes classifier, attribute weighting has received less attention than it warrants. Most approaches, perhaps influenced by attribute weighting in other machine learning algorithms, use weighting to place more emphasis on highly predictive attributes than those that are less predictive. In this paper, we argue that for naive Bayes attribute weighting should instead be used to alleviate the conditional independence assumption. Based on this premise, we propose a weighted naive Bayes algorithm, called WANBIA, that selects weights to minimize either the negative conditional log likelihood or the mean squared error objective functions. We perform extensive evaluations and find that WANBIA is a competitive alternative to state of the art classifiers like Random Forest, Logistic Regression and A1DE. Keywords: classification, naive Bayes, attribute independence assumption, weighted naive Bayes classification

4 0.068960376 61 jmlr-2013-Learning Theory Analysis for Association Rules and Sequential Event Prediction

Author: Cynthia Rudin, Benjamin Letham, David Madigan

Abstract: We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called “sequential event prediction.” In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer’s online shopping cart, given his/her past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the “cold start” problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an “adjusted confidence” measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis. Keywords: statistical learning theory, algorithmic stability, association rules, sequence prediction, associative classification © 2013 Cynthia Rudin, Benjamin Letham and David Madigan.

5 0.034244105 95 jmlr-2013-Ranking Forests

Author: Stéphan Clémençon, Marine Depecker, Nicolas Vayatis

Abstract: The present paper examines how the aggregation and feature randomization principles underlying the algorithm RANDOM FOREST (Breiman, 2001) can be adapted to bipartite ranking. The approach taken here is based on nonparametric scoring and ROC curve optimization in the sense of the AUC criterion. In this problem, aggregation is used to increase the performance of scoring rules produced by ranking trees, as those developed in Clémençon and Vayatis (2009c). The present work describes the principles for building median scoring rules based on concepts from rank aggregation. Consistency results are derived for these aggregated scoring rules and an algorithm called RANKING FOREST is presented. Furthermore, various strategies for feature randomization are explored through a series of numerical experiments on artificial data sets. Keywords: bipartite ranking, nonparametric scoring, classification data, ROC optimization, AUC criterion, tree-based ranking rules, bootstrap, bagging, rank aggregation, median ranking, feature randomization

6 0.026915144 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java

7 0.023896327 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising

8 0.022952773 83 jmlr-2013-Orange: Data Mining Toolbox in Python

9 0.020296123 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

10 0.019899946 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks

11 0.019644029 1 jmlr-2013-AC++Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics

12 0.018706897 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

13 0.018195007 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library

14 0.017862262 81 jmlr-2013-Optimal Discovery with Probabilistic Expert Advice: Finite Time Analysis and Macroscopic Optimality

15 0.016313061 54 jmlr-2013-JKernelMachines: A Simple Framework for Kernel Machines

16 0.015940866 120 jmlr-2013-Variational Algorithms for Marginal MAP

17 0.014795123 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

18 0.014002223 22 jmlr-2013-Classifying With Confidence From Incomplete Information

19 0.013217756 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

20 0.013162604 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.065), (1, 0.006), (2, -0.025), (3, 0.025), (4, 0.015), (5, 0.038), (6, 0.004), (7, 0.038), (8, -0.04), (9, 0.005), (10, 0.007), (11, -0.231), (12, 0.018), (13, -0.315), (14, 0.05), (15, -0.073), (16, -0.404), (17, 0.083), (18, -0.026), (19, -0.265), (20, -0.129), (21, 0.086), (22, 0.007), (23, -0.079), (24, -0.029), (25, 0.036), (26, -0.089), (27, -0.09), (28, 0.016), (29, -0.027), (30, -0.058), (31, 0.075), (32, 0.109), (33, -0.038), (34, 0.061), (35, 0.039), (36, -0.04), (37, 0.021), (38, 0.024), (39, -0.093), (40, -0.023), (41, -0.002), (42, -0.083), (43, -0.044), (44, -0.024), (45, -0.058), (46, 0.009), (47, -0.004), (48, -0.03), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9863807 89 jmlr-2013-QuantMiner for Mining Quantitative Association Rules

Author: Ansaf Salleb-Aouissi, Christel Vrain, Cyril Nortet, Xiangrong Kong, Vivek Rathod, Daniel Cassard

Abstract: In this paper, we propose QuantMiner, a system for mining quantitative association rules. This system is based on a genetic algorithm that dynamically discovers “good” intervals in association rules by optimizing both the support and the confidence. The experiments on real and artificial databases have shown the usefulness of QuantMiner as an interactive, exploratory data mining tool. Keywords: association rules, numerical and categorical attributes, unsupervised discretization, genetic algorithm, simulated annealing

2 0.79997838 42 jmlr-2013-Fast Generalized Subset Scan for Anomalous Pattern Detection

Author: Edward McFowland III, Skyler Speakman, Daniel B. Neill

Abstract: We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimization over the exponentially many subsets of the data without an exhaustive search, enabling FGSS to scale to massive and high-dimensional data sets. We evaluate the performance of FGSS in three real-world application domains (customs monitoring, disease surveillance, and network intrusion detection), and demonstrate that FGSS can successfully detect and characterize relevant patterns in each domain. As compared to three other recently proposed detection algorithms, FGSS substantially decreased run time and improved detection power for massive multivariate data sets. Keywords: pattern detection, anomaly detection, knowledge discovery, Bayesian networks, scan statistics

3 0.46372253 12 jmlr-2013-Alleviating Naive Bayes Attribute Independence Assumption by Attribute Weighting

Author: Nayyar A. Zaidi, Jesús Cerquides, Mark J. Carman, Geoffrey I. Webb

Abstract: Despite the simplicity of the Naive Bayes classifier, it has continued to perform well against more sophisticated newcomers and has remained, therefore, of great interest to the machine learning community. Of numerous approaches to refining the naive Bayes classifier, attribute weighting has received less attention than it warrants. Most approaches, perhaps influenced by attribute weighting in other machine learning algorithms, use weighting to place more emphasis on highly predictive attributes than those that are less predictive. In this paper, we argue that for naive Bayes attribute weighting should instead be used to alleviate the conditional independence assumption. Based on this premise, we propose a weighted naive Bayes algorithm, called WANBIA, that selects weights to minimize either the negative conditional log likelihood or the mean squared error objective functions. We perform extensive evaluations and find that WANBIA is a competitive alternative to state of the art classifiers like Random Forest, Logistic Regression and A1DE. Keywords: classification, naive Bayes, attribute independence assumption, weighted naive Bayes classification

4 0.41894236 61 jmlr-2013-Learning Theory Analysis for Association Rules and Sequential Event Prediction

Author: Cynthia Rudin, Benjamin Letham, David Madigan

Abstract: We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called “sequential event prediction.” In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer’s online shopping cart, given his/her past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the “cold start” problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an “adjusted confidence” measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis. Keywords: statistical learning theory, algorithmic stability, association rules, sequence prediction, associative classification © 2013 Cynthia Rudin, Benjamin Letham and David Madigan.

5 0.26431078 95 jmlr-2013-Ranking Forests

Author: Stéphan Clémençon, Marine Depecker, Nicolas Vayatis

Abstract: The present paper examines how the aggregation and feature randomization principles underlying the algorithm RANDOM FOREST (Breiman, 2001) can be adapted to bipartite ranking. The approach taken here is based on nonparametric scoring and ROC curve optimization in the sense of the AUC criterion. In this problem, aggregation is used to increase the performance of scoring rules produced by ranking trees, as those developed in Clémençon and Vayatis (2009c). The present work describes the principles for building median scoring rules based on concepts from rank aggregation. Consistency results are derived for these aggregated scoring rules and an algorithm called RANKING FOREST is presented. Furthermore, various strategies for feature randomization are explored through a series of numerical experiments on artificial data sets. Keywords: bipartite ranking, nonparametric scoring, classification data, ROC optimization, AUC criterion, tree-based ranking rules, bootstrap, bagging, rank aggregation, median ranking, feature randomization

6 0.18384725 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java

7 0.15479365 1 jmlr-2013-AC++Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics

8 0.14905542 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks

9 0.13469294 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

10 0.12257242 83 jmlr-2013-Orange: Data Mining Toolbox in Python

11 0.11963557 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising

12 0.11035998 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming

13 0.10853873 24 jmlr-2013-Comment on "Robustness and Regularization of Support Vector Machines" by H. Xu et al. (Journal of Machine Learning Research, vol. 10, pp. 1485-1510, 2009)

14 0.10495848 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library

15 0.097056836 81 jmlr-2013-Optimal Discovery with Probabilistic Expert Advice: Finite Time Analysis and Macroscopic Optimality

16 0.088246919 22 jmlr-2013-Classifying With Confidence From Incomplete Information

17 0.084998608 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis

18 0.084196843 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

19 0.083874576 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning

20 0.082905874 37 jmlr-2013-Divvy: Fast and Intuitive Exploratory Data Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.048), (10, 0.018), (20, 0.783), (23, 0.011), (70, 0.014), (75, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95137268 89 jmlr-2013-QuantMiner for Mining Quantitative Association Rules

Author: Ansaf Salleb-Aouissi, Christel Vrain, Cyril Nortet, Xiangrong Kong, Vivek Rathod, Daniel Cassard

Abstract: In this paper, we propose QuantMiner, a system for mining quantitative association rules. This system is based on a genetic algorithm that dynamically discovers “good” intervals in association rules by optimizing both the support and the confidence. The experiments on real and artificial databases have shown the usefulness of QuantMiner as an interactive, exploratory data mining tool. Keywords: association rules, numerical and categorical attributes, unsupervised discretization, genetic algorithm, simulated annealing

2 0.73985153 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

Author: Manavender R. Malgireddy, Ifeoma Nwogu, Venu Govindaraju

Abstract: We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach. Keywords: dynamic hierarchical Bayesian networks, topic models, activity recognition, gesture spotting, generative models

3 0.51975846 51 jmlr-2013-Greedy Sparsity-Constrained Optimization

Author: Sohail Bahmani, Bhiksha Raj, Petros T. Boufounos

Abstract: Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and Compressed Sensing. A vast body of work has studied the sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models where the fidelity of the estimate is measured by the squared error. In contrast, relatively less effort has been made in the study of sparsityconstrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressed Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic and real data, where the algorithm is employed for sparse logistic regression with and without ℓ2 -regularization. Keywords: sparsity, optimization, compressed sensing, greedy algorithm

4 0.22412978 61 jmlr-2013-Learning Theory Analysis for Association Rules and Sequential Event Prediction

Author: Cynthia Rudin, Benjamin Letham, David Madigan

Abstract: We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called “sequential event prediction.” In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer’s online shopping cart, given his/her past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the “cold start” problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an “adjusted confidence” measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis. Keywords: statistical learning theory, algorithmic stability, association rules, sequence prediction, associative classification

5 0.21841209 80 jmlr-2013-One-shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

Author: Jun Wan, Qiuqi Ruan, Wei Li, Shuang Deng

Abstract: For one-shot learning gesture recognition, two important challenges are: how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT) is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation, and has more compact and richer visual representations. For learning a discriminative model, all features extracted from training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike the traditional bag of features (BoF) models using vector quantization (VQ) to map each feature into a certain visual codeword, a sparse coding method named simulation orthogonal matching pursuit (SOMP) is applied and thus each feature can be represented by some linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on the ChaLearn gesture database, and the result has been ranked among the top-performing techniques in the ChaLearn gesture challenge (round 2). Keywords: gesture recognition, bag of features (BoF) model, one-shot learning, 3D enhanced motion scale invariant feature transform (3D EMoSIFT), Simulation Orthogonal Matching Pursuit (SOMP)

6 0.2077805 42 jmlr-2013-Fast Generalized Subset Scan for Anomalous Pattern Detection

7 0.18499027 95 jmlr-2013-Ranking Forests

8 0.18455128 66 jmlr-2013-MAGIC Summoning: Towards Automatic Suggesting and Testing of Gestures With Low Probability of False Positives During Use

9 0.1662394 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

10 0.16435856 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

11 0.15782507 82 jmlr-2013-Optimally Fuzzy Temporal Memory

12 0.14956991 33 jmlr-2013-Dimension Independent Similarity Computation

13 0.14645861 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs

14 0.14285004 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

15 0.14144194 32 jmlr-2013-Differential Privacy for Functions and Functional Data

16 0.14140697 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning

17 0.14117102 19 jmlr-2013-BudgetedSVM: A Toolbox for Scalable SVM Approximations

18 0.14076252 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

19 0.13958958 22 jmlr-2013-Classifying With Confidence From Incomplete Information

20 0.1390039 59 jmlr-2013-Large-scale SVD and Manifold Learning