jmlr jmlr2006 jmlr2006-68 knowledge-graph by maker-knowledge-mining

68 jmlr-2006-On the Complexity of Learning Lexicographic Strategies


Source: pdf

Author: Michael Schmitt, Laura Martignon

Abstract: Fast and frugal heuristics are well studied models of bounded rationality. Psychological research has proposed the take-the-best heuristic as a successful strategy in decision making with limited resources. Take-the-best searches for a sufficiently good ordering of cues (or features) in a task where objects are to be compared lexicographically. We investigate the computational complexity of finding optimal cue permutations for lexicographic strategies and prove that the problem is NP-complete. It follows that no efficient (that is, polynomial-time) algorithm computes optimal solutions, unless P = NP. We further analyze the complexity of approximating optimal cue permutations for lexicographic strategies. We show that there is no efficient algorithm that approximates the optimum to within any constant factor, unless P = NP. The results have implications for the complexity of learning lexicographic strategies from examples. They show that learning them in polynomial time within the model of agnostic probably approximately correct (PAC) learning is impossible, unless RP = NP. We further consider greedy approaches for building lexicographic strategies and determine upper and lower bounds for the performance ratio of simple algorithms. Moreover, we present a greedy algorithm that performs provably better than take-the-best. Tight bounds on the sample complexity for learning lexicographic strategies are also given in this article. Keywords: bounded rationality, fast and frugal heuristic, PAC learning, NP-completeness, hardness of approximation, greedy method

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We investigate the computational complexity of finding optimal cue permutations for lexicographic strategies and prove that the problem is NP-complete. [sent-6, score-1.129]

2 We further analyze the complexity of approximating optimal cue permutations for lexicographic strategies. [sent-8, score-1.072]

3 TTB compares the cues one after the other and uses the first cue that discriminates as the one reason to yield the final decision. [sent-39, score-0.833]

4 If the cue values of both cities are equal, the algorithm passes on to the next cue. [sent-42, score-0.629]

5 The validity of a cue is a real number in the interval [0, 1] that is computed in terms of the known outcomes of paired comparisons. [sent-46, score-0.627]

6 It is defined as the number of pairs the cue discriminates correctly (i.e., in agreement with the given ordering), divided by the total number of pairs it discriminates. [sent-47, score-0.66]

7 TTB always chooses a cue with the highest validity, that is, it “takes the best” among those cues not yet considered. [sent-52, score-0.81]

8 Table 1 gives an example showing cue profiles and validities for three cities. [sent-53, score-0.638]

9 As an example of calculating the validity, the state-capital cue distinguishes the first and the third pair but is correct only on the latter. [sent-63, score-0.657]

10 In the example of Düsseldorf and Hamburg, the car-license-plate cue would yield that Düsseldorf (represented by the letter “D”) is larger than Hamburg (represented by the two letters “HH”), whereas the soccer-team cue would favor Hamburg, which is correct. [sent-66, score-1.266]

11 Thus, how successful a lexicographic strategy is in a comparison task consisting of a partial ordering of cue profiles depends on how well the cue ranking minimizes the number of incorrect comparisons. [sent-67, score-1.866]

12 Specifically, the accuracy of TTB relies on the degree of optimality achieved by the ranking according to decreasing cue validities. [sent-68, score-0.635]

13 Given a list of ordered pairs, it computes all cue validities in a number of computing steps that is linear in the size of the list. [sent-79, score-0.665]
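
A minimal sketch of this computation (Python; the function names and the convention that a pair (a, b) is listed so that the correct inference is "a < b" or "=" are assumptions for illustration): validity is the fraction of discriminated pairs a cue gets right, and TTB simply ranks the cues by decreasing validity.

    def cue_validity(pairs, i):
        # Validity of cue i: pairs discriminated correctly, divided by pairs discriminated at all.
        right = wrong = 0
        for a, b in pairs:          # convention: the correct inference for (a, b) is "a < b" (or "=")
            if a[i] != b[i]:
                if a[i] < b[i]:
                    right += 1
                else:
                    wrong += 1
        return right / (right + wrong) if right + wrong else 0.0

    def ttb_cue_order(pairs, n):
        # Take-the-best: rank the n cues by decreasing validity, one linear pass per cue.
        return sorted(range(n), key=lambda i: cue_validity(pairs, i), reverse=True)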

14 The recognition cue is always queried first and, hence, not relevant for the problem of finding optimal cue permutations considered here. [sent-83, score-1.269]

15 It follows that TTB is not an algorithm for computing optimal cue permutations and, even more, that no polynomial-time algorithm exists for solving this task, unless the complexity classes P and NP are equal. [sent-89, score-0.685]

16 The fact that finding optimal cue permutations turns out to be practically intractable, however, does not exclude the possibility that the optimum can be efficiently approximated. [sent-90, score-0.668]

17 The second main topic of this article is an optimization problem called MINIMUM INCORRECT LEXICOGRAPHIC STRATEGY, denoting the task of minimizing the number of incorrect inferences for the lexicographic strategy on a given list of pairs. [sent-91, score-0.75]

18 Moreover, we show that TTB does not have this property, concluding that the greedy method of constructing cue permutations performs provably better than TTB. [sent-102, score-0.696]

19 While the results mentioned so far deal with lexicographic strategies based on cue permutations, we further consider the possibility of building them by also inverting cues. [sent-103, score-1.062]

20 We present an algorithm that greedily constructs cue inversions that are always correct on a number of pairs that is at least half the optimum. [sent-104, score-0.71]

21 Given a set of pairs, the question is whether a cue permutation can be found that keeps the number of incorrect comparisons, or disagreements, of the lexicographic strategy below some prescribed value. [sent-109, score-1.329]

22 Adopting the framework of agnostic PAC learning, we assume that pairs of cue profiles are drawn randomly according to some unknown distribution. [sent-122, score-0.669]

23 The task of the learner is to find a cue permutation that, with high probability, is close to an optimal one, where closeness means that the probability of differing inferences is small. [sent-123, score-0.845]

24 In detail, we show that the class of lexicographic strategies obtained by cue permutations and inversions has a VC dimension equal to the number of cues. [sent-134, score-1.172]

25 On the other hand, an algorithm that finds optimal cue permutations might not be good in constructing 2-decision lists. [sent-148, score-0.668]

26 We shall show later that the problem of finding cue permutations for the lexicographic strategy can be formulated as such an ordering problem. [sent-153, score-1.15]

27 However, we shall also argue that the two problems are different, since the cue permutation problem requires the total order to be implemented as a lexicographic strategy and not every total order can be represented this way. [sent-154, score-1.215]

28 We then draw comparisons with decision lists and discuss the relationship of the problem of finding optimal cue permutations with the ordering problem studied by Cohen et al. [sent-157, score-0.751]

29 We obtain that the problem remains NP-complete under constraints that require the cue profiles to be sparse, impose a bound on the number of pairs, or suppose the pairs to satisfy some simple properties of orderings. [sent-161, score-0.637]

30 Section 5 introduces the greedy algorithm for constructing cue permutations. [sent-167, score-0.629]

31 The result implies that the greedy method always finds a correct cue permutation if one exists. [sent-169, score-0.792]

32 We determine the number of cues as the exact value for the VC dimension of the class of lexicographic strategies obtained from cue permutations and inversions. [sent-175, score-1.338]
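
For orientation, plugging a VC dimension of n into the generic agnostic PAC bound gives a sample size of the following order (this is the standard textbook form, stated here as a rough guide rather than as the article's own tight bound):

    m(\varepsilon, \delta) \;=\; O\!\left(\frac{n + \log(1/\delta)}{\varepsilon^{2}}\right)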

33 Given two cue profiles (a1, . . . , an) and (b1, . . . , bn), the lexicographic strategy searches for the smallest cue index i ∈ {1, . . . , n} on which the profiles differ. [sent-202, score-1.048]

34 If no such cue exists, the strategy returns “ = ”. [sent-207, score-0.644]

35 The lexicographic strategy under cue permutation π passes through the cues in the order π(1), . . . , π(n). [sent-233, score-1.39]

36 The problem we study is that of finding a cue permutation that minimizes the number of incorrect comparisons in a given list of element pairs using the lexicographic strategy. [sent-237, score-1.39]

37 Given a cue permutation π, we say that the lexicographic strategy under π infers the pair a, b correctly if Sπ (a, b) ∈ {“ < ”, “ = ”}, otherwise the inference is incorrect. [sent-240, score-1.207]
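
A minimal Python sketch of the strategy Sπ and of this notion of a correct inference (identifier names are illustrative, not from the paper):

    def lex_compare(a, b, pi):
        # Pass through the cues in the order pi(1), ..., pi(n);
        # the first cue that discriminates decides the outcome.
        for i in pi:
            if a[i] != b[i]:
                return "<" if a[i] < b[i] else ">"
        return "="

    def inferred_correctly(a, b, pi):
        # The pair (a, b) is inferred correctly if S_pi(a, b) is "<" or "=".
        return lex_compare(a, b, pi) in ("<", "=")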

38 Given some cue permutation π, consider a relation that is satisfied by a pair a, b if and only if Sπ (a, b) ∈ {“ < ”, “ = ”}. [sent-246, score-0.76]

39 A question that arises immediately is whether every total order has some cue permutation that represents this order using the lexicographic strategy. [sent-248, score-1.155]

40 Proposition 1 For every set B ⊆ {0, 1}^n and every cue permutation π, the lexicographic strategy under cue permutation π defines a total order on B. [sent-250, score-1.932]

41 Obviously, the lexicographic strategy applied to a pair a, a is always correct, independently of the cue permutation. [sent-268, score-1.074]

42 Thus, finding a cue permutation for the lexicographic strategy amounts to constructing a 2-decision list with some restrictions concerning the structure of the monomials, the pattern of the output values, and the length of the list. [sent-304, score-1.235]

43 We conclude that cue permutations are not necessarily found using algorithms for constructing 2-decision lists. [sent-306, score-0.668]

44 Further, an optimal cue permutation might not be an optimal 2-decision list. [sent-307, score-0.734]

45 It is not hard to see that the instances of the cue permutation problem are particular instances of the above problem. [sent-321, score-0.776]

46 The question is, therefore, whether this hardness result has any implications on the complexity of finding a cue permutation that minimizes the number of incorrect inferences. [sent-327, score-0.882]

47 However, the ranking problem is different from the cue permutation problem not only in that its instances are more general. [sent-328, score-0.789]

48 While the ranking problem accepts any total order that maximizes the agreement with the preference function, the cue permutation problem requires that the total order can be implemented by a lexicographic strategy. [sent-330, score-1.222]

49 Thus, the space taken by the solutions of the cue permutation problem is narrower than the solution space for the ranking problem described above. [sent-332, score-0.768]

50 Moreover, we show in Section 3 that the cue permutation problem remains NP-complete even when the instances are known to have a total order. [sent-333, score-0.772]

51 Then the cue permutation problem is a minimization problem while the ranking problem is a maximization problem. [sent-336, score-0.768]

52 Consequently, despite the apparent similarity of the cue permutation problem and the ranking problem, the complexities of the two problems are obviously not related. [sent-340, score-0.768]

53 Complexity of Finding Optimal Cue Permutations We consider the complexity of the problem to minimize the number of incorrect inferences under the lexicographic strategy. [sent-342, score-0.663]

54 The question is to decide whether the cues can be permuted such that the number of incorrect inferences made by the lexicographic strategy when applied with this cue permutation to the list of pairs is not larger than the given bound. [sent-345, score-1.712]

55 Question: Is there a permutation of the cues of B such that the number of incorrect inferences in L under the lexicographic strategy is at most k? [sent-348, score-1.048]
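
The decision problem can be checked by exhaustive search over all n! permutations, as in the sketch below (Python; purely illustrative, since the NP-completeness result says precisely that nothing substantially better than such exponential search is expected unless P = NP):

    from itertools import permutations

    def incorrect(pi, pairs):
        # INCORRECT(pi, L): number of pairs in L on which the strategy under pi infers ">".
        def lex(a, b):
            for i in pi:
                if a[i] != b[i]:
                    return "<" if a[i] < b[i] else ">"
            return "="
        return sum(1 for a, b in pairs if lex(a, b) == ">")

    def lexicographic_strategy_decision(pairs, n, k):
        # Is there a cue permutation with at most k incorrect inferences on the list of pairs?
        return any(incorrect(pi, pairs) <= k for pi in permutations(range(n)))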

56 We establish the correctness of the reduction by proving that the graph G has a vertex cover of cardinality at most k if and only if the associated instance of LEXICOGRAPHIC STRATEGY has a cue permutation that results in no more than k incorrect inferences. [sent-380, score-1.001]

57 We claim that this cue ranking causes no more than k incorrect inferences in L. [sent-397, score-0.894]

58 This implies that the first cue that distinguishes this pair will have value 0 in (1_{i,j}, 1) and value 1 in (1, 0). [sent-403, score-0.627]

59 In this case, cue n + 1 distinguishes this pair with the correct outcome. [sent-406, score-0.657]

60 Finally, each vertex pair (1, 0), (1_i, 1) with v_i ∈ V is distinguished by cue i with a result different from the ordering given by L. [sent-407, score-0.731]

61 The fact that the edge pair is inferred correctly implies that π must rank cue i or j before cue n + 1. [sent-419, score-1.228]

62 Finally, the existence of such a total ordering is equivalent to the assertion that B has a cue permutation with no incorrect comparisons in L = { 1i , 1 j : (vi , v j ) ∈ A \ A }. [sent-466, score-0.957]

63 Corollary 6 The problem of finding a cue permutation with a minimal number of incorrect comparisons under the lexicographic strategy is solvable in linear time for instances where B contains at most one 0 and L is a subset of some partial order. [sent-471, score-1.373]

64 Approximability of Optimal Cue Permutations In the previous section, we have shown that there is no polynomial-time algorithm that computes optimal cue permutations for the lexicographic strategy, unless P = NP. [sent-479, score-1.089]

65 In this section, we show that the problem of approximating the optimal cue permutation is harder than any problem in the class APX. [sent-484, score-0.734]

66 Measure: The number of incorrect inferences in L for the lexicographic strategy under cue permutation π, that is, INCORRECT(π, L). [sent-490, score-1.44]

67 We make use of this fact when we establish the lower bound for the approximability of the optimal cue permutation. [sent-505, score-0.667]

68 If C has a hitting set of cardinality k or less then f (C) has a cue permutation π where INCORRECT(π, L) ≤ k. [sent-557, score-0.832]

69 Hence, the first cue that distinguishes this pair has value 0 in (1i1 ,. [sent-584, score-0.627]

70 This pair is distinguished correctly by cue n + 1. [sent-589, score-0.627]

71 Finally, each element pair (1, 0), (1i , 1) with ui ∈ U is distinguished by cue i with a result that disagrees with the ordering given by L. [sent-590, score-0.704]

72 Next, we define a polynomial-time computable function g that maps each collection C of subsets of a finite set U and each cue permutation π for f (C) to a subset of U. [sent-593, score-0.734]

73 After that the algorithm removes those pairs that are distinguished by the selected cue, which is reasonable as the distinctions drawn by this cue cannot be undone by later cues. [sent-672, score-0.637]

74 In particular, it always finds a cue permutation with no incorrect inferences if one exists. [sent-682, score-0.993]
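
The greedy construction sketched below follows this description (the exact bookkeeping is an assumption; names are illustrative): in each round, pick an unused cue with the fewest incorrect inferences on the remaining pairs, then drop the pairs that cue distinguishes.

    def greedy_cue_permutation(pairs, n):
        remaining = list(pairs)
        unused = set(range(n))
        order = []
        while unused:
            # INCORRECT(i, remaining): pairs on which cue i alone would infer ">".
            best = min(unused, key=lambda i: sum(1 for a, b in remaining if a[i] > b[i]))
            order.append(best)
            unused.remove(best)
            # Distinctions drawn by the selected cue cannot be undone by later cues.
            remaining = [(a, b) for a, b in remaining if a[best] == b[best]]
        return order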

75 If n = 1, the optimal cue permutation is definitely found. [sent-684, score-0.734]

76 Clearly, as the incorrect inferences of a cue cannot be reversed by other cues, there is a cue j with INCORRECT( j, L) ≤ opt(L). [sent-686, score-1.461]

77 By the induction hypothesis, rounds 2 to n of the loop determine a cue permutation π with INCORRECT(π , L ) ≤ (n − 1) · optI (L ). [sent-701, score-0.734]

78 Figure 1: A set of lexicographically ordered pairs (001, 010), (010, 100), (010, 101), (100, 111) with nondecreasing cue validities (1, 1/2, and 2/3). [sent-702, score-0.674]

79 The cue ordering of TTB (1, 3, 2) causes an incorrect inference on the first pair. [sent-703, score-0.784]
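
Checking the Figure 1 instance directly (a small self-contained script; the pair convention is as in the sketches above) reproduces this: TTB's validity order (1, 3, 2) makes one incorrect inference, while the order (1, 2, 3) makes none, which is also what the greedy construction finds on this instance.

    pairs = [((0, 0, 1), (0, 1, 0)), ((0, 1, 0), (1, 0, 0)),
             ((0, 1, 0), (1, 0, 1)), ((1, 0, 0), (1, 1, 1))]

    def lex(a, b, order):
        for i in order:
            if a[i] != b[i]:
                return "<" if a[i] < b[i] else ">"
        return "="

    ttb_order = [0, 2, 1]   # cue order (1, 3, 2), zero-indexed
    alt_order = [0, 1, 2]   # cue order (1, 2, 3), zero-indexed
    print(sum(lex(a, b, ttb_order) == ">" for a, b in pairs))  # 1 incorrect inference
    print(sum(lex(a, b, alt_order) == ">" for a, b in pairs))  # 0 incorrect inferences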

80 Corollary 10 On inputs that have a cue ordering without incorrect comparisons under the lexicographic strategy, GREEDY CUE PERMUTATION can be better than TTB. [sent-707, score-1.211]

81 As can be seen, cue 1 is correct on all pairs, cue n is incorrect on two pairs, and every cue j ∈ {2, . [sent-735, score-1.981]

82 The resulting permutation π has cue 1 in its first position, cues from {2, . [sent-754, score-0.943]

83 On the other hand, the optimal value is 2, which is attained by any permutation that has cue n as the first cue. [sent-762, score-0.734]

84 Lexicographic Strategies With Cue Inversion. While in the previous sections the problem was to optimize lexicographic strategies by permuting the cues, we now introduce an additional degree of freedom for building lexicographic strategies. [sent-778, score-0.865]

85 Algorithm 2 (GREEDY CUE INVERSION). Input: a set B ⊆ {0, 1}^n and a set L ⊆ B × B. Output: a cue inversion q for the n cues. [sent-780, score-1.453]

86 Given a set B ⊆ {0, 1}^n, the lexicographic strategy under cue inversion q is the function S^q : B × B → {“ < ”, “ = ”, “ > ”} with S^q(a, b) = S(q(a), q(b)). [sent-791, score-1.09]

87 Combining permutation and inversion, we obtain the lexicographic strategy under cue permutation π and cue inversion q, denoted by S_π^q and defined as S_π^q(a, b) = S(π(q(a)), π(q(b))). [sent-792, score-1.957]

88 In particular, we require that the cue inversion is applied before the permutation. [sent-793, score-0.643]

89 The idea is to pass through the cues and to select either the cue or its inverse, depending on which makes a larger number of correct inferences. [sent-795, score-0.84]

90 The pairs that are distinguished by this cue are then removed. [sent-796, score-0.637]

91 We show that the cue inversion returned by this algorithm yields a number of correct inferences that is at least half the maximum over all cue inversions and permutations. [sent-798, score-1.428]

92 Theorem 13 The algorithm GREEDY CUE INVERSION always returns a cue inversion q such that S^q is correct on at least opt(L)/2 pairs, where opt(L) is the maximum number of correct inferences achievable by the lexicographic strategy under any cue permutation and any cue inversion. [sent-799, score-2.596]
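
A sketch consistent with this description (the details of Algorithm 2 beyond what is quoted above are assumptions): for each cue in turn, keep it or invert it, whichever is correct on more of the still-undistinguished pairs, then remove the pairs that cue distinguishes.

    def greedy_cue_inversion(pairs, n):
        remaining = list(pairs)
        invert = [False] * n            # the cue inversion q: one flag per cue
        for i in range(n):
            correct_plain = sum(1 for a, b in remaining if a[i] < b[i])
            correct_flipped = sum(1 for a, b in remaining if a[i] > b[i])
            invert[i] = correct_flipped > correct_plain
            # Pairs distinguished by cue i are decided now and removed.
            remaining = [(a, b) for a, b in remaining if a[i] == b[i]]
        return invert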

93 It seems, at first glance, that the method of cue inversion leads much more easily to a good performance guarantee than the permutation of the cues. [sent-813, score-0.776]

94 Given a cue inversion q, consider the lexicographic strategy S^q ∈ S_n (that is, the strategy S_π^q where π is the identity function). [sent-834, score-1.133]

95 The lower bound in the previous result was obtained by choosing a suitable cue inversion and leaving the order of the cues unchanged. [sent-866, score-0.852]

96 In Section 6 we have introduced cue inversion as an additional feature to build lexicographic strategies. [sent-901, score-1.047]

97 While this question is meant to consider only cue permutations and not inversions for constructing lexicographic strategies, the objective of minimization is combined with both these features in a second approximation problem emerging from Section 6. [sent-907, score-1.115]

98 One could also generalize lexicographic strategies to the effect that more than two outcomes, correct or incorrect, of a lexicographic comparison are possible. [sent-916, score-0.895]

99 For the learning of lexicographic strategies using cue inversions we have provided a simple and efficient algorithm that approximates the maximum number of correct inferences to within a constant factor. [sent-932, score-1.276]

100 Thus, it seems that cue inversions lead much more easily to good performance bounds than cue permutations. [sent-933, score-1.245]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cue', 0.601), ('lexicographic', 0.404), ('exicographic', 0.31), ('cues', 0.209), ('trategy', 0.198), ('inimum', 0.195), ('incorrect', 0.148), ('permutation', 0.133), ('ncorrect', 0.118), ('ttb', 0.118), ('reedy', 0.113), ('ue', 0.113), ('inferences', 0.111), ('ermutation', 0.096), ('itting', 0.096), ('gigerenzer', 0.086), ('opt', 0.078), ('artignon', 0.075), ('chmitt', 0.075), ('trategies', 0.075), ('permutations', 0.067), ('martignon', 0.064), ('hitting', 0.059), ('strategies', 0.057), ('boolean', 0.052), ('vc', 0.05), ('frugal', 0.049), ('approximability', 0.048), ('goldstein', 0.043), ('inversions', 0.043), ('schmitt', 0.043), ('strategy', 0.043), ('inversion', 0.042), ('vertex', 0.041), ('cardinality', 0.039), ('ausiello', 0.037), ('hamburg', 0.037), ('validities', 0.037), ('sq', 0.037), ('pairs', 0.036), ('rationality', 0.036), ('np', 0.036), ('ordering', 0.035), ('ranking', 0.034), ('agnostic', 0.032), ('diff', 0.032), ('shattered', 0.032), ('sseldorf', 0.032), ('laura', 0.032), ('correct', 0.03), ('approximates', 0.03), ('cities', 0.028), ('greedy', 0.028), ('vi', 0.028), ('todd', 0.028), ('heuristics', 0.028), ('garey', 0.027), ('list', 0.027), ('pref', 0.027), ('ratio', 0.027), ('restrictions', 0.027), ('validity', 0.026), ('pair', 0.026), ('decision', 0.025), ('pac', 0.025), ('ui', 0.024), ('sn', 0.024), ('comparisons', 0.023), ('br', 0.023), ('eedback', 0.023), ('discriminates', 0.023), ('earning', 0.022), ('ertex', 0.021), ('gerd', 0.021), ('hoffrage', 0.021), ('irre', 0.021), ('ludwigsburg', 0.021), ('nversion', 0.021), ('opti', 0.021), ('instances', 0.021), ('reduction', 0.021), ('psychology', 0.021), ('dichotomy', 0.02), ('polynomial', 0.02), ('city', 0.019), ('rc', 0.019), ('bi', 0.018), ('establish', 0.018), ('newell', 0.018), ('element', 0.018), ('article', 0.017), ('total', 0.017), ('anthony', 0.017), ('pro', 0.017), ('unless', 0.017), ('cohen', 0.016), ('preference', 0.016), ('amaldi', 0.016), ('apx', 0.016), ('arndt', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 68 jmlr-2006-On the Complexity of Learning Lexicographic Strategies

Author: Michael Schmitt, Laura Martignon

Abstract: Fast and frugal heuristics are well studied models of bounded rationality. Psychological research has proposed the take-the-best heuristic as a successful strategy in decision making with limited resources. Take-the-best searches for a sufficiently good ordering of cues (or features) in a task where objects are to be compared lexicographically. We investigate the computational complexity of finding optimal cue permutations for lexicographic strategies and prove that the problem is NP-complete. It follows that no efficient (that is, polynomial-time) algorithm computes optimal solutions, unless P = NP. We further analyze the complexity of approximating optimal cue permutations for lexicographic strategies. We show that there is no efficient algorithm that approximates the optimum to within any constant factor, unless P = NP. The results have implications for the complexity of learning lexicographic strategies from examples. They show that learning them in polynomial time within the model of agnostic probably approximately correct (PAC) learning is impossible, unless RP = NP. We further consider greedy approaches for building lexicographic strategies and determine upper and lower bounds for the performance ratio of simple algorithms. Moreover, we present a greedy algorithm that performs provably better than take-the-best. Tight bounds on the sample complexity for learning lexicographic strategies are also given in this article. Keywords: bounded rationality, fast and frugal heuristic, PAC learning, NP-completeness, hardness of approximation, greedy method

2 0.066431835 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor

Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs

3 0.042784642 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

Author: Francis R. Bach, Michael I. Jordan

Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled data sets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram. Keywords: spectral clustering, blind source separation, computational auditory scene analysis

4 0.042016063 25 jmlr-2006-Distance Patterns in Structural Similarity

Author: Thomas Kämpke

Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base

5 0.038045559 48 jmlr-2006-Learning Minimum Volume Sets

Author: Clayton D. Scott, Robert D. Nowak

Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency, an oracle inequality, and rates of convergence. The proposed estimators are illustrated with histogram and decision tree set estimation rules. Keywords: minimum volume sets, anomaly detection, statistical learning theory, uniform deviation bounds, sample complexity, universal consistency

6 0.037871782 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities

7 0.032568384 81 jmlr-2006-Some Discriminant-Based PAC Algorithms

8 0.023171807 26 jmlr-2006-Efficient Learning of Label Ranking by Soft Projections onto Polyhedra     (Special Topic on Machine Learning and Optimization)

9 0.020610258 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research     (Special Topic on Machine Learning and Optimization)

10 0.02017995 53 jmlr-2006-Learning a Hidden Hypergraph

11 0.019957228 76 jmlr-2006-QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines

12 0.019773189 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

13 0.018755006 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann

14 0.018279852 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

15 0.018133253 4 jmlr-2006-A Linear Non-Gaussian Acyclic Model for Causal Discovery

16 0.017867159 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes

17 0.017628187 12 jmlr-2006-Active Learning with Feedback on Features and Instances

18 0.01729664 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting

19 0.016992789 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition

20 0.015985847 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.096), (1, -0.036), (2, -0.034), (3, 0.022), (4, -0.045), (5, 0.037), (6, 0.094), (7, -0.024), (8, -0.082), (9, 0.064), (10, -0.006), (11, -0.011), (12, 0.086), (13, 0.071), (14, -0.04), (15, 0.002), (16, -0.006), (17, 0.066), (18, 0.03), (19, -0.05), (20, -0.228), (21, -0.025), (22, -0.075), (23, -0.121), (24, 0.027), (25, 0.021), (26, 0.055), (27, 0.196), (28, -0.091), (29, 0.147), (30, 0.225), (31, -0.229), (32, -0.004), (33, -0.358), (34, -0.149), (35, -0.092), (36, 0.096), (37, -0.075), (38, 0.244), (39, 0.107), (40, -0.073), (41, 0.016), (42, 0.186), (43, 0.013), (44, 0.099), (45, 0.32), (46, -0.029), (47, -0.309), (48, 0.034), (49, 0.181)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97416216 68 jmlr-2006-On the Complexity of Learning Lexicographic Strategies

Author: Michael Schmitt, Laura Martignon

Abstract: Fast and frugal heuristics are well studied models of bounded rationality. Psychological research has proposed the take-the-best heuristic as a successful strategy in decision making with limited resources. Take-the-best searches for a sufficiently good ordering of cues (or features) in a task where objects are to be compared lexicographically. We investigate the computational complexity of finding optimal cue permutations for lexicographic strategies and prove that the problem is NP-complete. It follows that no efficient (that is, polynomial-time) algorithm computes optimal solutions, unless P = NP. We further analyze the complexity of approximating optimal cue permutations for lexicographic strategies. We show that there is no efficient algorithm that approximates the optimum to within any constant factor, unless P = NP. The results have implications for the complexity of learning lexicographic strategies from examples. They show that learning them in polynomial time within the model of agnostic probably approximately correct (PAC) learning is impossible, unless RP = NP. We further consider greedy approaches for building lexicographic strategies and determine upper and lower bounds for the performance ratio of simple algorithms. Moreover, we present a greedy algorithm that performs provably better than take-the-best. Tight bounds on the sample complexity for learning lexicographic strategies are also given in this article. Keywords: bounded rationality, fast and frugal heuristic, PAC learning, NP-completeness, hardness of approximation, greedy method

2 0.24359213 48 jmlr-2006-Learning Minimum Volume Sets

Author: Clayton D. Scott, Robert D. Nowak

Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency, an oracle inequality, and rates of convergence. The proposed estimators are illustrated with histogram and decision tree set estimation rules. Keywords: minimum volume sets, anomaly detection, statistical learning theory, uniform deviation bounds, sample complexity, universal consistency

3 0.18837339 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor

Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs

4 0.16683605 25 jmlr-2006-Distance Patterns in Structural Similarity

Author: Thomas Kämpke

Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base

5 0.15752603 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities

Author: Adam R. Klivans, Rocco A. Servedio

Abstract: We consider two well-studied problems regarding attribute efficient learning: learning decision lists and learning parity functions. First, we give an algorithm for learning decision lists of length ˜ 1/3 ˜ 1/3 k over n variables using 2O(k ) log n examples and time nO(k ) . This is the first algorithm for learning decision lists that has both subexponential sample complexity and subexponential running time in the relevant parameters. Our approach establishes a relationship between attribute efficient learning and polynomial threshold functions and is based on a new construction of low degree, low weight polynomial threshold functions for decision lists. For a wide range of parameters our construction matches a lower bound due to Beigel for decision lists and gives an essentially optimal tradeoff between polynomial threshold function degree and weight. Second, we give an algorithm for learning an unknown parity function on k out of n variables using O(n1−1/k ) examples in poly(n) time. For k = o(log n) this yields a polynomial time algorithm with sample complexity o(n); this is the first polynomial time algorithm for learning parity on a superconstant number of variables with sublinear sample complexity. We also give a simple algorithm for learning an unknown length-k parity using O(k log n) examples in nk/2 time, which improves on the naive nk time bound of exhaustive search. Keywords: PAC learning, attribute efficiency, learning parity, decision lists, Winnow

6 0.11812915 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

7 0.11371294 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

8 0.10192289 53 jmlr-2006-Learning a Hidden Hypergraph

9 0.097779021 67 jmlr-2006-On Representing and Generating Kernels by Fuzzy Equivalence Relations

10 0.092682786 87 jmlr-2006-Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation

11 0.092197739 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests

12 0.090617411 78 jmlr-2006-Rearrangement Clustering: Pitfalls, Remedies, and Applications

13 0.084141478 10 jmlr-2006-Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

14 0.083457194 76 jmlr-2006-QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines

15 0.083237559 61 jmlr-2006-Maximum-Gain Working Set Selection for SVMs     (Special Topic on Machine Learning and Optimization)

16 0.083100162 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

17 0.081883162 49 jmlr-2006-Learning Parts-Based Representations of Data

18 0.081109449 4 jmlr-2006-A Linear Non-Gaussian Acyclic Model for Causal Discovery

19 0.074213237 83 jmlr-2006-Sparse Boosting

20 0.072550848 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.011), (36, 0.067), (45, 0.015), (50, 0.043), (61, 0.012), (63, 0.569), (76, 0.021), (78, 0.011), (79, 0.011), (81, 0.042), (90, 0.018), (91, 0.03), (96, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95163584 38 jmlr-2006-Incremental Support Vector Learning: Analysis, Implementation and Applications     (Special Topic on Machine Learning and Optimization)

Author: Pavel Laskov, Christian Gehl, Stefan Krüger, Klaus-Robert Müller

Abstract: Incremental Support Vector Machines (SVM) are instrumental in practical applications of online learning. This work focuses on the design and analysis of efficient incremental SVM learning, with the aim of providing a fast, numerically stable and robust implementation. A detailed analysis of convergence and of algorithmic complexity of incremental SVM learning is carried out. Based on this analysis, a new design of storage and numerical operations is proposed, which speeds up the training of an incremental SVM by a factor of 5 to 20. The performance of the new algorithm is demonstrated in two scenarios: learning with limited resources and active learning. Various applications of the algorithm, such as in drug discovery, online monitoring of industrial devices and and surveillance of network traffic, can be foreseen. Keywords: incremental SVM, online learning, drug discovery, intrusion detection

same-paper 2 0.93027049 68 jmlr-2006-On the Complexity of Learning Lexicographic Strategies

Author: Michael Schmitt, Laura Martignon

Abstract: Fast and frugal heuristics are well studied models of bounded rationality. Psychological research has proposed the take-the-best heuristic as a successful strategy in decision making with limited resources. Take-the-best searches for a sufficiently good ordering of cues (or features) in a task where objects are to be compared lexicographically. We investigate the computational complexity of finding optimal cue permutations for lexicographic strategies and prove that the problem is NP-complete. It follows that no efficient (that is, polynomial-time) algorithm computes optimal solutions, unless P = NP. We further analyze the complexity of approximating optimal cue permutations for lexicographic strategies. We show that there is no efficient algorithm that approximates the optimum to within any constant factor, unless P = NP. The results have implications for the complexity of learning lexicographic strategies from examples. They show that learning them in polynomial time within the model of agnostic probably approximately correct (PAC) learning is impossible, unless RP = NP. We further consider greedy approaches for building lexicographic strategies and determine upper and lower bounds for the performance ratio of simple algorithms. Moreover, we present a greedy algorithm that performs provably better than take-the-best. Tight bounds on the sample complexity for learning lexicographic strategies are also given in this article. Keywords: bounded rationality, fast and frugal heuristic, PAC learning, NP-completeness, hardness of approximation, greedy method

3 0.78885925 57 jmlr-2006-Linear State-Space Models for Blind Source Separation

Author: Rasmus Kongsgaard Olsson, Lars Kai Hansen

Abstract: We apply a type of generative modelling to the problem of blind source separation in which prior knowledge about the latent source signals, such as time-varying auto-correlation and quasiperiodicity, are incorporated into a linear state-space model. In simulations, we show that in terms of signal-to-error ratio, the sources are inferred more accurately as a result of the inclusion of strong prior knowledge. We explore different schemes of maximum-likelihood optimization for the purpose of learning the model parameters. The Expectation Maximization algorithm, which is often considered the standard optimization method in this context, results in slow convergence when the noise variance is small. In such scenarios, quasi-Newton optimization yields substantial improvements in a range of signal to noise ratios. We analyze the performance of the methods on convolutive mixtures of speech signals. Keywords: blind source separation, state-space model, independent component analysis, convolutive model, EM, speech modelling

4 0.60360962 61 jmlr-2006-Maximum-Gain Working Set Selection for SVMs     (Special Topic on Machine Learning and Optimization)

Author: Tobias Glasmachers, Christian Igel

Abstract: Support vector machines are trained by solving constrained quadratic optimization problems. This is usually done with an iterative decomposition algorithm operating on a small working set of variables in every iteration. The training time strongly depends on the selection of these variables. We propose the maximum-gain working set selection algorithm for large scale quadratic programming. It is based on the idea to greedily maximize the progress in each single iteration. The algorithm takes second order information from cached kernel matrix entries into account. We prove the convergence to an optimal solution of a variant termed hybrid maximum-gain working set selection. This method is empirically compared to the prominent most violating pair selection and the latest algorithm using second order information. For large training sets our new selection scheme is significantly faster. Keywords: working set selection, sequential minimal optimization, quadratic programming, support vector machines, large scale optimization

5 0.53054601 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation

Author: Francis R. Bach, Michael I. Jordan

Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled data sets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram. Keywords: spectral clustering, blind source separation, computational auditory scene analysis

6 0.5181188 14 jmlr-2006-An Efficient Implementation of an Active Set Method for SVMs    (Special Topic on Machine Learning and Optimization)

7 0.47618377 44 jmlr-2006-Large Scale Transductive SVMs

8 0.47293627 53 jmlr-2006-Learning a Hidden Hypergraph

9 0.46524683 21 jmlr-2006-Computational and Theoretical Analysis of Null Space and Orthogonal Linear Discriminant Analysis

10 0.45676494 47 jmlr-2006-Learning Image Components for Object Recognition

11 0.45561671 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis

12 0.45537302 72 jmlr-2006-Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems     (Special Topic on Machine Learning and Optimization)

13 0.45059738 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities

14 0.43975025 56 jmlr-2006-Linear Programs for Hypotheses Selection in Probabilistic Inference Models     (Special Topic on Machine Learning and Optimization)

15 0.4363856 41 jmlr-2006-Kernel-Based Learning of Hierarchical Multilabel Classification Models     (Special Topic on Machine Learning and Optimization)

16 0.43140176 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification

17 0.43080828 70 jmlr-2006-Online Passive-Aggressive Algorithms

18 0.43075842 51 jmlr-2006-Learning Sparse Representations by Non-Negative Matrix Factorization and Sequential Cone Programming     (Special Topic on Machine Learning and Optimization)

19 0.42764395 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms

20 0.42516986 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting