nips nips2013 nips2013-176 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Özgür Şimşek
Abstract: Several attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. This paper develops these ideas further and examines their empirical relevance in 51 natural environments. The results show that all three structures are prevalent, making it possible for simple rules to reach, and occasionally exceed, the accuracy of the linear decision rule, using less information and less computation. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Several attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. [sent-3, score-1.041]
2 This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. [sent-4, score-0.215]
3 The results show that all three structures are prevalent, making it possible for simple rules to reach, and occasionally exceed, the accuracy of the linear decision rule, using less information and less computation. [sent-6, score-0.523]
4 Typically, some attributes of the objects are available as input to the decision. [sent-8, score-0.225]
5 This estimate leads to a decision between objects A and B as follows, where ∆xi is used to denote the difference in attribute values between the two objects: ŷA − ŷB = w1 (x1A − x1B) + w2 (x2A − x2B) + . . . + wk ∆xk. (2) [sent-25, score-0.595]
6 Decision rule (3): choose object A if w1 ∆x1 + . . . + wk ∆xk > 0; choose object B if w1 ∆x1 + . . . + wk ∆xk < 0; choose randomly if w1 ∆x1 + . . . + wk ∆xk = 0. [sent-31, score-0.258]
7 This decision rule does not need the linear estimator in its entirety. [sent-40, score-0.549]
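To make Decision Rule 3 concrete, here is a minimal Python sketch of the weighted-sum comparison; the function name and the example attribute vectors and weights are illustrative assumptions, not from the paper.

```python
# Minimal sketch of Decision Rule 3; names and example values are
# illustrative assumptions, not from the paper.
import numpy as np

def linear_decision(x_a, x_b, w):
    """Choose 'A', 'B', or 'tie' from the sign of sum_i w_i * dx_i."""
    score = np.dot(w, np.asarray(x_a) - np.asarray(x_b))
    if score > 0:
        return "A"
    if score < 0:
        return "B"
    return "tie"  # the rule chooses randomly on ties

print(linear_decision([1, 0, 1], [0, 1, 1], [0.5, 0.3, 0.2]))  # -> A
```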
8 The literature on simple decision heuristics [1, 2] has identified several environmental structures that allow simple rules to make decisions identical to those of the linear decision rule using less information [3]. [sent-46, score-1.216]
9 These are dominance [4], cumulative dominance [5, 6], and noncompensatoriness [7, 8, 9, 10, 11]. [sent-47, score-1.532]
10 I refer to attributes also as cues and to the signs of the weights as cue directions, as in the heuristics literature. [sent-49, score-0.606]
11 A heuristic that corresponds to a particular linear decision rule is one whose cue directions, and cue order if it needs them, are identical to those of the linear decision rule. [sent-51, score-1.186]
12 The first is unit weighting [12, 13, 14, 15, 16, 17], which uses a linear decision rule with weights of +1 or −1. [sent-53, score-0.594]
13 The second is the family of lexicographic heuristics [18, 19], which examine cues one at a time, in a specified order, until a cue is found that discriminates between the objects. [sent-54, score-0.592]
14 1 Dominance: If all terms wi ∆xi in Decision Rule 3 are nonnegative, and at least one of them is positive, then object A dominates object B. [sent-58, score-0.305]
15 If all terms wi ∆xi are zero, then objects A and B are dominance equivalent. [sent-59, score-0.837]
16 It is easy to see that the linear decision rule chooses the dominant object if there is one. [sent-60, score-0.611]
17 If objects are dominance equivalent, the decision rule chooses randomly. [sent-61, score-1.23]
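A dominance check of this kind needs only the signs of the weights, not their magnitudes. The sketch below illustrates this under that assumption; the function and argument names are hypothetical.

```python
# Sketch of a simple-dominance check that uses only cue directions
# (signs of the weights); function and argument names are illustrative.
import numpy as np

def simple_dominance(x_a, x_b, cue_directions):
    # sign(w_i * dx_i) = direction_i * dx_i, because |w_i| > 0
    terms = np.asarray(cue_directions) * (np.asarray(x_a) - np.asarray(x_b))
    if (terms >= 0).all() and (terms > 0).any():
        return "A dominates B"
    if (terms <= 0).all() and (terms < 0).any():
        return "B dominates A"
    if (terms == 0).all():
        return "dominance equivalent"
    return "no dominance"

print(simple_dominance([3, 2, 5], [1, 2, 4], [+1, +1, +1]))  # A dominates B
```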
18 When dominance is present, most decision heuristics choose identically to the linear decision rule if their cue directions match those of the linear rule. [sent-63, score-1.218]
19 These include unit weighting and lexicographic heuristics, with any ordering of the cues. [sent-64, score-0.224]
20 I occasionally refer to dominance as simple dominance to differentiate it from cumulative dominance, which I discuss next. [sent-66, score-1.496]
21 2 Cumulative dominance: The linear sum in Equation 2 may be written alternatively as follows: ŷA − ŷB = (w1 − w2)∆x1 + (w2 − w3)(∆x1 + ∆x2) + (w3 − w4)(∆x1 + ∆x2 + ∆x3) + . . . + wk(∆x1 + . . . + ∆xk) = w′1 ∆x′1 + . . . + w′k ∆x′k. (4) [sent-68, score-0.687]
22 Here ∆x′i = ∆x1 + . . . + ∆xi for all i, and w′i = wi − wi+1 for i = 1, 2, . . . , k − 1, with w′k = wk. [sent-76, score-0.224]
23 To this alternative linear sum in Equation 4, we can apply the earlier dominance result, obtaining a new dominance relationship called cumulative dominance. [sent-79, score-1.48]
24 Cumulative dominance uses an additional piece of information on the weights: their relative ordering. [sent-80, score-0.678]
25 Object A cumulatively dominates object B if all terms w′i ∆x′i are nonnegative and at least one of them is positive. [sent-81, score-0.247]
26 The linear decision rule chooses the cumulative-dominant object if there is one. [sent-83, score-0.586]
27 If objects are cumulative-dominance equivalent, the linear decision rule chooses randomly. [sent-84, score-0.599]
28 If the weights w1, . . . , wk are positive and decreasing, it suffices to examine the cumulative differences ∆x′i to check for cumulative dominance (because w′i > 0, ∀i). [sent-87, score-0.924]
29 The attributes would be the number of each type of coin in the pile, and the weights would be the financial value of each type of coin. [sent-89, score-0.216]
30 A pile that contains 6 one-dollar coins, 4 fifty-cent coins, and 2 ten-cent coins cumulatively dominates (but not simply dominates) a pile containing 3 one-dollar coins, 5 fifty-cent coins, and 1 ten-cent coin: 6 > 3, 6 + 4 > 3 + 5, 6 + 4 + 2 > 3 + 5 + 1. [sent-90, score-0.249]
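The coin example can be verified directly. The sketch below assumes the attributes are already ordered by nonincreasing weight (coin value), as the text requires; the function name is illustrative.

```python
# Cumulative-dominance check for the coin piles in the text, assuming
# attributes are ordered by nonincreasing weight (coin value).
import numpy as np

def cumulative_dominance(x_a, x_b):
    """All cumulative differences nonnegative, at least one positive."""
    cum = np.cumsum(np.asarray(x_a) - np.asarray(x_b))
    return bool((cum >= 0).all() and (cum > 0).any())

pile_a = [6, 4, 2]  # one-dollar, fifty-cent, ten-cent coins
pile_b = [3, 5, 1]
print(cumulative_dominance(pile_a, pile_b))   # True: 6>3, 10>8, 12>9
diff = np.asarray(pile_a) - np.asarray(pile_b)
print(bool((diff >= 0).all()))                # False: no simple dominance
```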
31 Cumulative dominance is therefore more likely to hold than simple dominance. [sent-92, score-0.684]
32 When a cumulative-dominance relationship holds, the linear decision rule, the corresponding lexicographic decision rule, and the corresponding unit-weighting rule decide identically, with one exception: unit weighting may find a tie where the linear decision rule does not [5]. [sent-93, score-1.474]
33 Consider the linear decision rule as a sequential process, where the terms wi ∆xi are added one by one, in order of nonincreasing weights. [sent-98, score-0.571]
34 If we were to stop after the first discriminating attribute, would our decision be identical to the one we would make by processing all attributes? [sent-99, score-0.368]
35 The answer is no in general, but subsequent attributes cannot reverse the early decision if the attributes are binary, taking values of 0 or 1, and the weights satisfy the set of constraints wi > wi+1 + . . . + wk, i = 1, 2, . . . , k − 1. [sent-101, score-0.388]
36 With binary attributes and noncompensatory weights, the linear decision rule and the corresponding lexicographic decision rule decide identically [7, 8]. [sent-109, score-1.483]
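A small simulation can illustrate this equivalence. The sketch below assumes binary attributes, positive cue directions, and weights constructed to satisfy the noncompensatory constraints; all names and values are illustrative.

```python
# Spot-check that the lexicographic rule matches the linear rule for
# binary attributes and noncompensatory weights; names are illustrative.
import numpy as np

def lexicographic(x_a, x_b, order):
    for i in order:                       # first discriminating cue decides
        if x_a[i] != x_b[i]:
            return "A" if x_a[i] > x_b[i] else "B"
    return "tie"

w = np.array([8.0, 4.0, 2.0, 1.0])        # w_i > w_{i+1} + ... + w_k
order = np.argsort(-w)                    # nonincreasing weight order
rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = rng.integers(0, 2, 4), rng.integers(0, 2, 4)
    s = float(np.dot(w, a - b))
    linear = "A" if s > 0 else "B" if s < 0 else "tie"
    assert lexicographic(a, b, order) == linear
print("agreement on all sampled pairs")
```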
37 3 A probabilistic approach to dominance: To choose between two objects, the linear decision rule examines whether w1 ∆x1 + . . . + wk ∆xk is above, below, or equal to zero. [sent-112, score-1.247]
38 This comparison can be made with certainty, without knowing the exact values of the weights, if a dominance relationship exists. [sent-113, score-0.659]
39 As a motivating example, consider the case where 9 out of 10 attributes favor object A against object B. [sent-116, score-0.303]
40 Although we cannot be certain that the linear decision rule will select object A, that would be a very good bet. [sent-117, score-0.564]
41 This yields the following estimate of the probability PA that the linear decision rule will select object A: PA ≈ P(X > 0), where X ∼ N((p − n)/2, (p² + n²)/12). [sent-122, score-0.564]
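A sketch of this estimate follows; the normal parameters are the reconstruction above (the extraction was garbled at this point, so treat them as an assumption), and the function name is illustrative.

```python
# Sketch of the probabilistic dominance estimate; the normal parameters
# follow the reconstruction above and should be treated as an assumption.
from math import sqrt
from scipy.stats import norm

def prob_linear_rule_picks_A(p, n):
    """P(X > 0) with X ~ N((p - n)/2, (p**2 + n**2)/12)."""
    mean = (p - n) / 2.0
    sd = sqrt((p**2 + n**2) / 12.0)
    return 1.0 - norm.cdf(0.0, loc=mean, scale=sd)

print(round(prob_linear_rule_picks_A(9, 1), 3))  # 9 of 10 cues favor A
```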
42 4 An empirical analysis of relevance: I now turn to the question of whether dominance and noncompensatoriness exist in our environment in any substantial amount. [sent-125, score-0.758]
43 When binary versions of 20 natural datasets were used to train a multiple linear regression model, at least 3 of the 20 models were found to have noncompensatory weights [8]. [sent-127, score-0.445]
44 In the same 20 datasets, with a restriction of 5 on the maximum number of attributes, the proportion of object pairs that exhibited simple dominance ranged from 13% to 75% [4]. [sent-128, score-0.921]
45 I present two sets of results: on the original datasets and on binary versions where numeric attributes were dichotomized by splitting around the median (assigning the median value to the category with fewer objects). [sent-134, score-0.52]
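One possible reading of this dichotomization in code; the exact tie-handling in the paper may differ, so treat this as an assumption-laden sketch.

```python
# One possible reading of the median split; exact tie-handling in the
# paper may differ, so treat this as an assumption-laden sketch.
import numpy as np

def dichotomize(column):
    column = np.asarray(column, dtype=float)
    med = np.median(column)
    high, low = column > med, column < med
    # The median value joins whichever category has fewer objects.
    median_goes_high = high.sum() < low.sum()
    return np.where(high | ((column == med) & median_goes_high), 1, 0)

print(dichotomize([1, 2, 2, 2, 5, 7]))  # -> [0 0 0 0 1 1]
```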
46 I refer to the original datasets as numeric datasets but it should be noted that one dataset had only binary attributes and many datasets had at least one binary attribute. [sent-135, score-0.697]
47 A decision was considered to be accurate if it selected an object whose criterion value was equal to the maximum of the criterion values of the objects being compared. [sent-139, score-0.555]
48 Cumulative dominance and noncompensatoriness are sensitive to the units of measurement of the attributes. [sent-140, score-0.758]
49 The linear decision rule was obtained using multiple linear regression with elastic net regularization [21], which contains both a ridge penalty and a lasso penalty. [sent-142, score-0.541]
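One way to reproduce such a base rule is scikit-learn's ElasticNetCV, which combines the ridge and lasso penalties. The paper does not specify an implementation, so the library choice, the synthetic data, and the l1_ratio grid here are assumptions.

```python
# Sketch of fitting a base decision rule with elastic net regularization;
# library choice, synthetic data, and the l1_ratio grid are assumptions.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))                       # attribute values
true_w = np.array([3.0, 2.0, 1.0, 0.5, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)    # criterion values

model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
print(np.round(model.coef_, 2))                     # weights of the rule
```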
50 I refer to the linear decision rule learned in this manner as the base decision rule. [sent-151, score-0.856]
51 On datasets with fewer than 1000 pairs of objects, a separate linear decision rule was learned for every pair of objects, using all other objects as the training set. [sent-152, score-0.707]
52 On larger datasets, the pairs of objects were randomly placed in 1000 folds and a separate model was learned for each fold, training with all objects not contained in that fold. [sent-153, score-0.232]
53 Performance of the base decision rule: The accuracy of the base decision rule differed substantially across datasets, ranging from barely above chance to near-perfect. [sent-155, score-1.124]
54 Dominance: Figure 1 shows the prevalence of dominance, measured by the proportion of object pairs in which one object dominates the other or the two objects are equivalent. [sent-168, score-0.493]
55 The figure shows four types of dominance in each of the datasets. [sent-169, score-0.659]
56 Simple and cumulative dominance are displayed as lines; approximate simple and cumulative dominance are displayed as filled circles. (Footnote 1: The authors found 3 datasets in which the weights were noncompensatory and the order of the weights was identical to the cue order of the take-the-best heuristic [19]. [sent-170, score-1.454]
57 It is possible that additional datasets had noncompensatory weights but did not match the take-the-best cue order.) [sent-171, score-0.526]
58 (Footnote 2: The datasets included the 20 datasets in Czerlinski, Gigerenzer & Goldstein [20], which were used to obtain the two sets of earlier results discussed above [8, 4].) [sent-172, score-0.231]
59 Blue lines show simple dominance, red lines show cumulative dominance, blue-filled circles show approximate simple dominance, and red-filled circles show approximate cumulative dominance. [sent-186, score-0.436]
60 Recall that simple dominance implies cumulative dominance, so the blue lines show pairs with both simple- and cumulative-dominance relationships. [sent-188, score-0.868]
61 Approximate simple and cumulative dominance are displayed as blue- and red-filled circles, respectively. [sent-189, score-0.832]
62 Table 1: The mean, median, minimum, and maximum prevalence of each type of dominance (Dom and Dom approx, c = 0.99) across the numeric and binary datasets. [Table values not recoverable from extraction.] [sent-191, score-0.875]
63 Accuracy is shown as a percentage of the accuracy of the base decision rule. [sent-280, score-0.439]
64 Blue lines show simple dominance, red lines show cumulative dominance, blue-filled circles show approximate simple dominance, and red-filled circles show approximate cumulative dominance. [sent-294, score-0.436]
65 Green circles show the accuracy of the base decision rule for comparison. [sent-295, score-0.634]
66 The approximation made a difference in 27–33 of 51 datasets, depending on the type of dominance and data (numeric/binary). [sent-297, score-0.659]
67 Figure 2 shows the accuracy of decisions guided by dominance: choose the dominant object when there is one; choose randomly otherwise. [sent-300, score-0.292]
68 This accuracy can be higher than the accuracy of the base decision rule, which happens if choosing randomly is more accurate than the base decision rule on pairs that exhibit no dominance relationship. [sent-301, score-1.702]
69 Table 1 shows the mean, median, minimum, and maximum accuracies across the datasets measured as a percentage of the accuracy of the base decision rule. [sent-302, score-0.583]
70 It is worth pointing out that the accuracy of approximate cumulative dominance in binary datasets never fell below 93% of the accuracy of the base decision rule. [sent-304, score-1.053]
71 In the results discussed so far, approximate dominance was computed by setting c = 0.99. [sent-307, score-0.659]
72 This value was selected prior to the analysis based on what this parameter means: 1 − c is the expected error rate of the approximation, where error rate is the proportion of approximately dominant objects that are not selected by the linear decision rule. [sent-309, score-0.523]
73 [Figure 3 residue: legend entries Dom binary, Cum dom binary, Dom numeric, Cum dom numeric; error-rate axis ticks at 0.05; plotted points not recoverable.] [sent-324, score-0.644]
74 Figure 3: Left: Error rates of approximate dominance with various values of the approximation parameter c. [sent-332, score-0.765]
75 Right: Proportion of linear models with noncompensatory weights in each of the datasets. [sent-333, score-0.282]
76 Noncompensatoriness: Let noncompensation be a logical variable that equals TRUE if the decision of the first discriminating cue, when cues are processed in nonincreasing magnitude of the weights, is identical to the decision of the linear decision rule. [sent-334, score-1.236]
77 With binary cues and noncompensatory weights, noncompensation is TRUE with probability 1. [sent-335, score-0.447]
78 If noncompensation is TRUE, the linear decision rule and the corresponding lexicographic rule make identical decisions. [sent-337, score-0.98]
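The definition of noncompensation translates directly into code. The sketch below, with illustrative names, returns True exactly when the first discriminating cue decides as the full linear rule does.

```python
# Direct translation of the noncompensation definition; names are
# illustrative. Returns True when the first discriminating cue decides
# as the full linear rule does.
import numpy as np

def noncompensation(x_a, x_b, w):
    w = np.asarray(w, dtype=float)
    terms = w * (np.asarray(x_a, dtype=float) - np.asarray(x_b, dtype=float))
    full_sign = np.sign(terms.sum())
    for i in np.argsort(-np.abs(w)):     # nonincreasing weight magnitude
        if terms[i] != 0:                # first discriminating cue
            return bool(np.sign(terms[i]) == full_sign)
    return True                          # no cue discriminates: both tie

print(noncompensation([1, 0, 1], [0, 1, 1], [0.5, 0.3, 0.2]))  # True
```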
79 Figure 3, right panel, shows the proportion of base decision rules with noncompensatory weights in binary datasets. [sent-338, score-0.792]
80 Recall that a large number of base decision rules were learned on each dataset, using different training sets and random seeds. [sent-339, score-0.419]
81 The proportion of base decision rules with noncompensatory weights ranged from 0 to 1, with a mean of 0. [sent-340, score-0.795]
82 Figure 4 shows noncompensation in each dataset, together with the accuracies of the base decision rule and the corresponding lexicographic rule. [sent-346, score-0.902]
83 Consequently, the accuracy of the lexicographic rule was very close to that of the linear decision rule: its median accuracy relative to the base decision rule was 96% in numeric datasets and 100% in binary datasets. [sent-352, score-1.656]
84 In summary, although noncompensatory weights were not particularly prevalent in the datasets, actual levels of noncompensation were very high. [sent-353, score-0.46]
85 5 Discussion: It is fair to conclude that all three environmental structures are prevalent in natural environments to such a high degree that decisions guided by these structures approach, and occasionally exceed, the base decision model in predictive accuracy. [sent-354, score-0.727]
86 We have not examined the performance of any particular decision heuristic, which depends on the cue directions and cue order it uses. [sent-355, score-0.689]
87 These will not necessarily match those of the linear decision rule. [sent-356, score-0.332]
88 The results here show that it is possible for decision heuristics to succeed in natural environments by imitating the decisions of the linear model using less information and less computation, because the conditions that make it possible are prevalent, but not that they necessarily do so. [sent-357, score-0.606]
89 (Footnote 3: When this is the case, it should be noted, the decision heuristic may have a higher predictive accuracy than the linear model.) [sent-358, score-0.411]
90 [Figure 4 residue: accuracy scatter for the binary datasets; individual points not recoverable from extraction.] [sent-366, score-0.368]
91 For each dataset, the proportion of decisions in which noncompensation took place is plotted against the accuracy of the base decision rule (displayed in green circles) and the accuracy of the corresponding lexicographic rule (displayed in blue plus signs). [sent-377, score-1.252]
92 When decision heuristics are examined through the lens of bias-variance decomposition [23, 24, 25], the three environmental structures examined here are particularly relevant for the bias component of the prediction error. [sent-379, score-0.639]
93 The results presented here suggest that while simple decision heuristics examine a tiny fraction of the set of linear models, in natural environments, they may do so without introducing much additional bias. [sent-380, score-0.532]
94 The probabilistic approximations of dominance and of cumulative dominance introduced in this paper can be used as decision heuristics themselves, combined with any method of estimating cue directions and cue order. [sent-387, score-2.267]
95 Finally, I hope that these results will stimulate further research into the statistical properties of decision environments, as well as into cognitive models that exploit them, for further insights into higher cognition. [sent-389, score-0.33]
96 Cumulative dominance and heuristic performance in binary multiattribute choice. [sent-422, score-0.772]
97 Tight upper bounds for the expected loss of lexicographic heuristics in binary multi-attribute choice. [sent-428, score-0.407]
98 Naïve heuristics for paired comparisons: Some results on their relative accuracy. [sent-454, score-0.233]
99 Why do simple heuristics perform well in choices with binary attributes? [sent-459, score-0.257]
100 The robust beauty of improper linear models in decision making. [sent-480, score-0.332]
wordName wordTfidf (topN-words)
[('dominance', 0.659), ('decision', 0.304), ('noncompensatory', 0.182), ('lexicographic', 0.175), ('heuristics', 0.175), ('cue', 0.166), ('noncompensation', 0.165), ('gigerenzer', 0.149), ('numeric', 0.144), ('rule', 0.141), ('attributes', 0.121), ('dom', 0.121), ('cumulative', 0.115), ('datasets', 0.106), ('objects', 0.104), ('noncompensatoriness', 0.099), ('qq', 0.092), ('object', 0.091), ('cum', 0.088), ('base', 0.079), ('wk', 0.076), ('wi', 0.074), ('prevalence', 0.072), ('weights', 0.072), ('coins', 0.067), ('environmental', 0.064), ('proportion', 0.062), ('ranged', 0.06), ('binary', 0.057), ('accuracy', 0.056), ('circles', 0.054), ('decisions', 0.052), ('pile', 0.05), ('weighting', 0.049), ('dominates', 0.049), ('lled', 0.048), ('environments', 0.047), ('attribute', 0.046), ('median', 0.046), ('xk', 0.044), ('todd', 0.044), ('cues', 0.043), ('prevalent', 0.041), ('paired', 0.039), ('approx', 0.038), ('occasionally', 0.038), ('accuracies', 0.038), ('abc', 0.038), ('discriminating', 0.038), ('rue', 0.038), ('rules', 0.036), ('structures', 0.036), ('psychological', 0.034), ('carrasco', 0.033), ('cumulatively', 0.033), ('czerlinski', 0.033), ('discriminates', 0.033), ('frugal', 0.033), ('hogarth', 0.033), ('katsikopoulos', 0.033), ('martignon', 0.033), ('multiattribute', 0.033), ('ozgur', 0.033), ('simsek', 0.033), ('displayed', 0.033), ('oxford', 0.033), ('guided', 0.03), ('examined', 0.03), ('identically', 0.03), ('signs', 0.029), ('criterion', 0.028), ('linear', 0.028), ('cognitive', 0.026), ('xi', 0.026), ('identical', 0.026), ('qqqq', 0.025), ('intercept', 0.025), ('dominant', 0.025), ('simple', 0.025), ('lines', 0.024), ('yb', 0.024), ('psychometrika', 0.024), ('nonincreasing', 0.024), ('pairs', 0.024), ('ces', 0.023), ('directions', 0.023), ('mimic', 0.023), ('coin', 0.023), ('heuristic', 0.023), ('examines', 0.022), ('smart', 0.022), ('chooses', 0.022), ('ya', 0.021), ('blue', 0.021), ('regularization', 0.02), ('elastic', 0.02), ('ranging', 0.02), ('earlier', 0.019), ('relative', 0.019), ('choose', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics
Author: Özgür Şimşek
Abstract: Several attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. This paper develops these ideas further and examines their empirical relevance in 51 natural environments. The results show that all three structures are prevalent, making it possible for simple rules to reach, and occasionally exceed, the accuracy of the linear decision rule, using less information and less computation. 1
2 0.088527113 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
Author: Fang Han, Han Liu
Abstract: In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. The major contributions are two folds. First, in low dimensions and under the Gaussian model, by borrowing the strength from recent development in minimax optimal principal component estimation, we first time sharply characterize the potential advantage of classical principal component regression over least square estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients in the optimal parametric rate and therefore is a good alternative to the Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method. 1
3 0.080228552 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification
Author: Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi
Abstract: Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization. 1
4 0.072360784 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
Author: Leonid Boytsov, Bilegsaikhan Naidan
Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality. 1
5 0.070427269 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
Author: Vibhav Vineet, Carsten Rother, Philip Torr
Abstract: Many methods have been proposed to solve the problems of recovering intrinsic scene properties such as shape, reflectance and illumination from a single image, and object class segmentation separately. While these two problems are mutually informative, in the past not many papers have addressed this topic. In this work we explore such joint estimation of intrinsic scene properties recovered from an image, together with the estimation of the objects and attributes present in the scene. In this way, our unified framework is able to capture the correlations between intrinsic properties (reflectance, shape, illumination), objects (table, tv-monitor), and materials (wooden, plastic) in a given scene. For example, our model is able to enforce the condition that if a set of pixels take same object label, e.g. table, most likely those pixels would receive similar reflectance values. We cast the problem in an energy minimization framework and demonstrate the qualitative and quantitative improvement in the overall accuracy on the NYU and Pascal datasets. 1
6 0.063815765 5 nips-2013-A Deep Architecture for Matching Short Texts
7 0.062667415 211 nips-2013-Non-Linear Domain Adaptation with Boosting
8 0.058698867 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively
9 0.054265108 347 nips-2013-Variational Planning for Graph-based MDPs
10 0.053994942 335 nips-2013-Transfer Learning in a Transductive Setting
11 0.049192462 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
12 0.049185872 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
13 0.048999883 318 nips-2013-Structured Learning via Logistic Regression
14 0.048513904 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
15 0.046874754 193 nips-2013-Mixed Optimization for Smooth Functions
16 0.045449417 84 nips-2013-Deep Neural Networks for Object Detection
17 0.044530131 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
18 0.044355165 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
19 0.038492255 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes
20 0.035674576 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
topicId topicWeight
[(0, 0.112), (1, 0.007), (2, -0.045), (3, -0.018), (4, 0.028), (5, 0.008), (6, -0.033), (7, -0.015), (8, -0.025), (9, 0.047), (10, -0.019), (11, -0.005), (12, 0.04), (13, -0.041), (14, -0.016), (15, 0.014), (16, -0.009), (17, 0.006), (18, -0.02), (19, 0.049), (20, 0.052), (21, 0.024), (22, -0.068), (23, 0.121), (24, 0.012), (25, 0.003), (26, 0.009), (27, -0.075), (28, -0.01), (29, 0.047), (30, -0.003), (31, 0.076), (32, -0.096), (33, -0.04), (34, -0.039), (35, -0.009), (36, -0.038), (37, -0.103), (38, -0.019), (39, 0.087), (40, -0.068), (41, -0.006), (42, -0.054), (43, 0.008), (44, 0.054), (45, -0.026), (46, 0.003), (47, 0.036), (48, -0.051), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.93199664 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics
Author: Özgür Şimşek
Abstract: Several attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. This paper develops these ideas further and examines their empirical relevance in 51 natural environments. The results show that all three structures are prevalent, making it possible for simple rules to reach, and occasionally exceed, the accuracy of the linear decision rule, using less information and less computation. 1
2 0.58426648 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
Author: Leonid Boytsov, Bilegsaikhan Naidan
Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality. 1
3 0.56191903 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
Author: Fang Han, Han Liu
Abstract: In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. The major contributions are two folds. First, in low dimensions and under the Gaussian model, by borrowing the strength from recent development in minimax optimal principal component estimation, we first time sharply characterize the potential advantage of classical principal component regression over least square estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients in the optimal parametric rate and therefore is a good alternative to the Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method. 1
4 0.48024333 181 nips-2013-Machine Teaching for Bayesian Learners in the Exponential Family
Author: Xiaojin Zhu
Abstract: What if there is a teacher who knows the learning goal and wants to design good training data for a machine learner? We propose an optimal teaching framework aimed at learners who employ Bayesian models. Our framework is expressed as an optimization problem over teaching examples that balance the future loss of the learner and the effort of the teacher. This optimization problem is in general hard. In the case where the learner employs conjugate exponential family models, we present an approximate algorithm for finding the optimal teaching set. Our algorithm optimizes the aggregate sufficient statistics, then unpacks them into actual teaching examples. We give several examples to illustrate our framework. 1
5 0.47912061 183 nips-2013-Mapping paradigm ontologies to and from the brain
Author: Yannick Schwartz, Bertrand Thirion, Gael Varoquaux
Abstract: Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies. Due to the nature of the individual experiments, based on eliciting neural response from a small number of stimuli, this link is incomplete, and unidirectional from the causal point of view. To come to conclusions on the function implied by the activation of brain regions, it is necessary to combine a wide exploration of the various brain functions and some inversion of the statistical inference. Here we introduce a methodology for accumulating knowledge towards a bidirectional link between observed brain activity and the corresponding function. We rely on a large corpus of imaging studies and a predictive engine. Technically, the challenges are to find commonality between the studies without denaturing the richness of the corpus. The key elements that we contribute are labeling the tasks performed with a cognitive ontology, and modeling the long tail of rare paradigms in the corpus. To our knowledge, our approach is the first demonstration of predicting the cognitive content of completely new brain images. To that end, we propose a method that predicts the experimental paradigms across different studies. 1
6 0.46956974 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
7 0.46364188 211 nips-2013-Non-Linear Domain Adaptation with Boosting
8 0.46107134 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting
9 0.45558733 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
10 0.45303518 161 nips-2013-Learning Stochastic Inverses
11 0.45140654 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
12 0.44895029 244 nips-2013-Parametric Task Learning
13 0.44799843 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification
14 0.43956926 134 nips-2013-Graphical Models for Inference with Missing Data
15 0.43692049 318 nips-2013-Structured Learning via Logistic Regression
16 0.42559779 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
17 0.4215191 326 nips-2013-The Power of Asymmetry in Binary Hashing
18 0.41926351 335 nips-2013-Transfer Learning in a Transductive Setting
19 0.4119508 226 nips-2013-One-shot learning by inverting a compositional causal process
20 0.40698519 223 nips-2013-On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation
topicId topicWeight
[(2, 0.014), (16, 0.03), (33, 0.13), (34, 0.069), (41, 0.043), (49, 0.05), (56, 0.086), (70, 0.038), (73, 0.304), (85, 0.048), (89, 0.028), (93, 0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.73521852 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics
Author: Özgür Şimşek
Abstract: Several attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. This paper develops these ideas further and examines their empirical relevance in 51 natural environments. The results show that all three structures are prevalent, making it possible for simple rules to reach, and occasionally exceed, the accuracy of the linear decision rule, using less information and less computation. 1
2 0.71331644 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
Author: Naiyan Wang, Dit-Yan Yeung
Abstract: In this paper, we study the challenging problem of tracking the trajectory of a moving object in a video with possibly very complex background. In contrast to most existing trackers which only learn the appearance of the tracked object online, we take a different approach, inspired by recent advances in deep learning architectures, by putting more emphasis on the (unsupervised) feature learning problem. Specifically, by using auxiliary natural images, we train a stacked denoising autoencoder offline to learn generic image features that are more robust against variations. This is then followed by knowledge transfer from offline training to the online tracking process. Online tracking involves a classification neural network which is constructed from the encoder part of the trained autoencoder as a feature extractor and an additional classification layer. Both the feature extractor and the classifier can be further tuned to adapt to appearance changes of the moving object. Comparison with the state-of-the-art trackers on some challenging benchmark video sequences shows that our deep learning tracker is more accurate while maintaining low computational cost with real-time performance when our MATLAB implementation of the tracker is used with a modest graphics processing unit (GPU). 1
3 0.64394963 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
Author: Marius Pachitariu, Adam M. Packer, Noah Pettit, Henry Dalgleish, Michael Hausser, Maneesh Sahani
Abstract: Biological tissue is often composed of cells with similar morphologies replicated throughout large volumes and many biological applications rely on the accurate identification of these cells and their locations from image data. Here we develop a generative model that captures the regularities present in images composed of repeating elements of a few different types. Formally, the model can be described as convolutional sparse block coding. For inference we use a variant of convolutional matching pursuit adapted to block-based representations. We extend the KSVD learning algorithm to subspaces by retaining several principal vectors from the SVD decomposition instead of just one. Good models with little cross-talk between subspaces can be obtained by learning the blocks incrementally. We perform extensive experiments on simulated images and the inference algorithm consistently recovers a large proportion of the cells with a small number of false positives. We fit the convolutional model to noisy GCaMP6 two-photon images of spiking neurons and to Nissl-stained slices of cortical tissue and show that it recovers cell body locations without supervision. The flexibility of the block-based representation is reflected in the variability of the recovered cell shapes. 1
4 0.62445271 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
Author: Andreas Ruttor, Philipp Batz, Manfred Opper
Abstract: We introduce a nonparametric approach for estimating drift functions in systems of stochastic differential equations from sparse observations of the state vector. Using a Gaussian process prior over the drift as a function of the state vector, we develop an approximate EM algorithm to deal with the unobserved, latent dynamics between observations. The posterior over states is approximated by a piecewise linearized process of the Ornstein-Uhlenbeck type and the MAP estimation of the drift is facilitated by a sparse Gaussian process regression. 1
5 0.56691504 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries
Author: Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir
Abstract: We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we characterize —in a nearly complete manner— the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switch� ing costs, the attainable rate with bandit feedback is Θ(T 2/3 ). Interestingly, this √ rate is significantly worse than the Θ( T ) rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also � show that a bounded memory adversary can force Θ(T 2/3 ) regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies. 1
6 0.54034859 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
7 0.53514618 331 nips-2013-Top-Down Regularization of Deep Belief Networks
8 0.5345552 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
9 0.5342592 301 nips-2013-Sparse Additive Text Models with Low Rank Background
10 0.53401792 99 nips-2013-Dropout Training as Adaptive Regularization
11 0.53356713 251 nips-2013-Predicting Parameters in Deep Learning
12 0.53336591 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
13 0.53287244 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
14 0.5320698 30 nips-2013-Adaptive dropout for training deep neural networks
15 0.53140253 121 nips-2013-Firing rate predictions in optimal balanced networks
16 0.53107566 64 nips-2013-Compete to Compute
17 0.53057158 5 nips-2013-A Deep Architecture for Matching Short Texts
18 0.53051597 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
19 0.53004974 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
20 0.52972972 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent