nips nips2012 nips2012-288 knowledge-graph by maker-knowledge-mining

288 nips-2012-Rational inference of relative preferences


Source: pdf

Author: Nisheeth Srivastava, Paul R. Schrater

Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them option-specific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. [sent-2, score-0.487]

2 In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. [sent-3, score-1.34]

3 Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function. [sent-4, score-0.988]

4 1 Introduction. Normative theories of human choice behavior have long been based on how economic theory has postulated that choices should be made. [sent-5, score-0.289]

5 The standard version of the theory states that consumers seek to maximize innate, stable preferences over the options they consume. [sent-6, score-0.374]

6 The most difficult part of this theory is that preferences must exist before decisions can be made. [sent-8, score-0.21]

7 The standard response, in both economics and decision theory, to the basic question “Where do preferences come from? [sent-9, score-0.239]

8 Behavioral experiments in the last half century have conclusively demonstrated (see [18] for a comprehensive review) that human choice strongly violates the key axioms that the existence of stable utility values depends on. [sent-13, score-0.317]

9 A particular subset of these violations, called context effects, wound the utility maximization program the most deeply, since such violations cannot be explained away as systematic distortions of underlying utility and/or probability representations [22]. [sent-14, score-0.537]

10 No possible algebraic reformulation of option-specific utility functions can possibly explain preference reversals of the type exhibited in the frog legs example. [sent-16, score-0.754]

11 Preference reversals elicited through choice set variation have been observed in multiple empirical studies, using a variety of experimental tasks, and comprise one of the most powerful criticisms of the use of expected utility as a normative standard in various economic programs, e. [sent-17, score-0.509]

12 However, for all its problems, the mathematical simplicity of the utility framework and lack of principled alternatives has allowed it to retain its central role in microeconomics [12], machine learning [1], computational cognitive science [7] and neuroscience [11]. [sent-20, score-0.21]

13 Figure 1: Illustration of Luce’s ‘frog legs’ thought experiment. (a) When asked to select between just salmon and steak, the diner picks salmon, indicating salmon ≻ steak. (b) When presented with an additional third menu item, the diner picks steak, indicating by his choice steak ≻ salmon. [sent-21, score-2.106]

14 No possible absolute utility assignation to individual items can account for the choice behavior exhibited by the diner in this experiment. [sent-22, score-0.557]

15 The frog legs example is illustrative of reversals in preference occurring solely through variation in the set of options a subject has to choose from. [sent-23, score-0.771]

16 Our contribution in this paper is the development of a rational model that infers preferences from limited information about the relative value of options. [sent-24, score-0.377]

17 We postulate that there is a value inference process that predicts the relative goodness of items in enabling the agent to achieve its homeostatic and other longer-range needs (e. [sent-25, score-0.264]

18 However, we show that we only have to postulate that feedback from decisions provides limited information about the relative worth of options within the choice set for a decision to retrieve an inductive representation of value that is equivalent to traditional preference relations. [sent-29, score-0.66]

19 Thus, instead of assuming utilities as being present in the environment, we learn an equivalent sense of option desirability from information in a limited format that depends on the set of options in the decision set. [sent-30, score-1.017]

20 We show how to formalize the idea of relative value inference, and that it provides a new rational foundation for understanding the origins of human preferences. [sent-32, score-0.264]

21 An agent selects between possibilities x in the world represented by the set X . [sent-34, score-0.385]

22 The decision-making problem can be formulated as one wherein the agent forms a belief b(x), x ∈ X about the relative desirability of different possibilities in X and uses this belief to choose an element or subset X ∗ ⊂ X . [sent-35, score-1.022]

23 When these beliefs satisfy the axioms of utility, the belief function is simply the expected utility associated with individual possibilities u(x), u : X → R. [sent-36, score-0.48]

24 The agent’s belief about the relative desirability of the world is constantly updated by information that it receives about the desirability of options in terms of value signals r(x). [sent-38, score-1.363]

25 Given a sequence of choices, the normative expectation is for agents to select possibilities in a way that maximizes their infinite-horizon cumulative discounted desirability, $\arg\max_{x(t)} \sum_{t}^{\infty} \gamma^{t} b_t(x)$ (1). [sent-40, score-0.249]

26 The sequence of choices selected describes the agent’s expected desirability maximizing behavior in a belief MDP-world. [sent-41, score-0.577]

27 From a Bayesian standpoint, it is critical to describe the belief updating about the desirability of different states. [sent-42, score-0.536]

28 Since the agent learns this distribution from observing r(x) signals from the environment, an update of the form $p(x \mid r^{(t)}) = p(r^{(t)} \mid x) \times p(x \mid \{r^{(1)}, r^{(2)}, \cdots, r^{(t-1)}\})$ (2) reflects the basic process of belief formation via value signals. [sent-44, score-0.28]
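
The update in (2) can be sketched for a finite set of possibilities as follows. This is a minimal illustration, not the authors' implementation: the function name `update_belief`, the dictionary representation, and all numbers are assumptions made here. Since (2) is stated without a normalizing constant, the sketch normalizes explicitly.

```python
def update_belief(prior, likelihood):
    """One step of the update in (2): posterior(x) is proportional to
    likelihood(x) * prior(x).

    prior:      dict mapping possibility x -> p(x | r(1), ..., r(t-1))
    likelihood: dict mapping possibility x -> p(r(t) | x)
    """
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("likelihood assigns zero mass to every possibility")
    # Explicit normalization (an assumption; (2) leaves it implicit).
    return {x: value / total for x, value in unnormalized.items()}


# Hypothetical numbers: a uniform prior and a value signal favoring salmon.
prior = {"salmon": 0.5, "steak": 0.5}
likelihood = {"salmon": 0.8, "steak": 0.2}
posterior = update_belief(prior, likelihood)
```

Feeding `posterior` back in as the next `prior` gives the sequential form of the update.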

29 Such separation between utilities and probabilities in statistical decision theory is called probabilistic sophistication, an axiom that underlies almost all existing computational decision theory models [11]. [sent-46, score-0.364]

30 Instead, we assume we get partial information about the value of one or more options within the set of options c available in the decision instance t. [sent-48, score-0.522]

31 In this case value signals are hidden for most options x. [sent-49, score-0.236]

32 However, the set of options c ∈ C ⊆ P(X) observed can now potentially be used as auxiliary information to impute values for options whose value has not been observed. [sent-50, score-0.4]

33 Importantly, we concentrate on understanding the meaning of utility in this framework. [sent-52, score-0.21]

34 As in the case of value observability for all options, a probabilistic representation of utility under indirect observability must be equivalent to $p(r \mid x) = \frac{p(r, x)}{p(x)} = \frac{\sum_c p(r, x, c)}{\sum_c p(x \mid c)\,p(c)} = \frac{\sum_c p(r \mid x, c)\,p(x \mid c)\,p(c)}{\sum_c p(x \mid c)\,p(c)}$ (3). [sent-53, score-0.37]

35 The resulting prediction of value of an option couples value signals received across decision instances with different option sets, or contexts. [sent-54, score-0.392]

36 The intuition behind this approach is contained in the frog legs example: the set of options becomes informative about the hidden state of the world, like whether the restaurant has a good chef. [sent-55, score-0.395]

37 To see this, let us assume that we have defined a measure of utility u(x, c) that is sensitive to the context c of observing possibility x. [sent-58, score-0.368]

38 Now, for such a utility measure, if it is true that for any two possibilities $\{x_i, x_j\}$ and any two contexts $\{c_k, c_l\}$, $u(x_i, c_k) > u(x_j, c_k) \Rightarrow u(x_i, c_l) > u(x_j, c_l)$, then the choice behavior of an agent maximizing u(x, c) would be equivalent to one maximizing u(x). [sent-59, score-1.04]

39 Thus, for the inclusion of context to have any effect, there must exist at least some $\{x_i, x_j, c_k, c_l\}$ for which the propositions $u(x_i, c_k) > u(x_j, c_k)$ and $u(x_i, c_l) < u(x_j, c_l)$ can hold simultaneously. [sent-60, score-0.443]
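
The condition above can be checked mechanically on a small table of context-indexed scores. The sketch below is only an illustration of that condition: the name `find_reversal`, the nested-dictionary layout, and the numbers are hypothetical and do not come from the paper.

```python
from itertools import combinations


def find_reversal(scores):
    """Search for possibilities (xi, xj) and contexts (ck, cl) such that
    xi scores above xj in one context and below it in the other.

    scores: dict mapping context name -> dict mapping possibility -> score
    Returns a tuple (xi, xj, ck, cl), or None if the ordering of every
    shared pair is the same in every pair of contexts.
    """
    for ck, cl in combinations(scores, 2):
        # Only possibilities present in both contexts can be compared.
        shared = sorted(set(scores[ck]) & set(scores[cl]))
        for xi, xj in combinations(shared, 2):
            diff_k = scores[ck][xi] - scores[ck][xj]
            diff_l = scores[cl][xi] - scores[cl][xj]
            if diff_k * diff_l < 0:  # strict orderings with opposite signs
                return (xi, xj, ck, cl)
    return None


# Hypothetical scores: context has an effect in the first table only.
context_sensitive = {
    "c1": {"salmon": 0.6, "steak": 0.4},
    "c2": {"salmon": 0.3, "steak": 0.5, "frog legs": 0.2},
}
context_free = {
    "c1": {"salmon": 0.6, "steak": 0.4},
    "c2": {"salmon": 0.5, "steak": 0.3, "frog legs": 0.2},
}
reversal = find_reversal(context_sensitive)
no_reversal = find_reversal(context_free)
```

When `find_reversal` returns `None`, the context index adds nothing to the ordering of the shared possibilities, which is the equivalence with a context-free u(x) described above.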

40 Such a measure could assign absolute numbers to each of the possibilities, but any such static assignment would make it impossible for the propositions u(x1 , X ) > u(x2 , X ) and u(x1 , X ) < u(x2 , X ) to hold simultaneously, as is desired of a context-sensitive utility measure. [sent-64, score-0.27]

41 Thus, we see that it is impossible to design a utility function u such that u : X × C → R. [sent-65, score-0.21]

42 If we wish to incorporate the effects of context variation on the desirability of a particular world possibility, we must abandon a foundational premise of existing statistical decision theory - the representational validity of absolute utility. [sent-66, score-0.886]

43 3 Rational decisions without utilities. In place of the traditional utility framework, we define an alternative conceptual partitioning of the world X as a discrete choice problem. [sent-68, score-0.482]

44 In this new formulation, at any decision instant t, agents observe the feasibility of a subset o(t) ⊆ X of all the possibilities in the world. [sent-69, score-0.33]

45 An intelligent agent will encode its understanding of partial observability as a belief over which possibilities of the world likely co-occur. [sent-71, score-0.564]

46 We call an agent’s belief about the co-occurrence of possibilities in the world its understanding of the context of its observation. [sent-72, score-0.391]

47 We instantiate contexts c as subsets of X that the agent believes will co-occur based on its history of partial observations of the world and index them with an indicator function z on X, so that for context $c^{(t)}$, $z^{t}(x) = \sum_{i \in c^{(t)}} \delta(x - i)$. [sent-73, score-0.529]

48 Instead of computing absolute utilities on all x ∈ X, a context-aware agent evaluates the comparable desirability of only those possibilities considered feasible in a particular context c. [sent-74, score-1.03]

49 Hence, instead of using scalar values to indicate which possibility is more preferable, we introduce preference information into our system via a desirability function d that simply ‘points’ to the best option in a given context, i. [sent-75, score-0.832]

50 The desirability indicated by d(c) can be remapped on to the larger set of options by defining a relative desirability across all possibilities r(x) = m, x ∈ c and zero otherwise. [sent-78, score-1.374]

51 New evidence for the desirability of outcomes observed in context c(t) is incorporated using p(r(t) |x, c(t) ), a distribution encoding the relative desirability information obtained from the environment at the current time step, conditioned on the context in which the information is obtained. [sent-87, score-1.223]

52 Defining a choice function to select the mode of the posterior belief completes a rational context-sensitive decision theory. [sent-89, score-0.377]

53 1, we describe situations in which the influence of context shifting significantly affects human preference behavior in ways that utility-based decision theories have historically been hard-pressed to explain. [sent-91, score-0.476]

54 2 we characterize conditions under which the relative desirability framework yields predictions of choice behavior equivalent to that predicted by ordinal utility theories, and hence, is an equivalent representation for encoding preferences. [sent-93, score-0.916]

55 In this section, we show how our inductive theory of context-sensitive value inference leads, not surprisingly, to a simple explanation for the major varieties of context effects seen in behavioral experiments. [sent-98, score-0.242]

56 Interestingly, we find that each of these effects can be described as a special case of the frog legs example, with the specialization arising out of additional assumptions made about the relationship of the new option added to the choice set. [sent-100, score-0.589]

57 (c) indicates that the preference in question holds only in some observation contexts. [sent-104, score-0.211]

58 We use available space to completely describe how the most general version of preference reversal, as seen in the frog legs example, emerges from our framework and provide a brief overview of the other results. [sent-108, score-0.503]

59 In the frog legs example, the reversal in preferences is anecdotally explained by the diner originally forming a low opinion of the restaurant’s chef, given the paucity of choices on the menu, and deciding to pick the safe salmon over a possibly burnt steak. [sent-110, score-1.022]

60 However, the waiter’s presenting frog legs as the daily special suddenly raises the diner’s opinion of the chef’s abilities, causing him to favor steak. [sent-111, score-0.316]

61 This intuition maps very easily into our framework of choice selection, wherein the diner’s partial menu observations o1 = {steak, salmon} and o2 = {steak, salmon, frog legs} are associated with two separate contexts c1 and c2 of observing the menu X. [sent-112, score-0.591]

62 Now, when the waiter first offers the diner a choice between steak or salmon, the diner computes relative desirabilities using (4), where the only context for the observation is {salmon, steak}. [sent-120, score-0.986]

63 Hence, the relative desirabilities of steak and salmon are computed over a single context, and are simply R(salmon) = 0. [sent-121, score-0.723]

64 When the diner is next presented with the possibility of ordering frog legs, he now has two possible contexts to evaluate the desirability of his menu options: {salmon, steak} and {salmon, steak, frog legs}. [sent-124, score-1.244]

65 Based on the sequence of his history of experience with both contexts, the diner will have some posterior belief p(c) = {p, 1 − p} on the two contexts. [sent-125, score-0.35]

66 Then, the relative desirability of salmon, after having observed frog legs on the menu, can be calculated using (4) as $\frac{p(r \mid \text{salmon}, c_1)\,p(\text{salmon} \mid c_1)\,p(c_1) + p(r \mid \text{salmon}, c_2)\,p(\text{salmon} \mid c_2)\,p(c_2)}{p(\text{salmon} \mid c_1)\,p(c_1) + p(\text{salmon} \mid c_2)\,p(c_2)}$, 0. [sent-126, score-0.94]

67 Clearly, for 1 − p > p, R(steak) > R(salmon), and the diner would be rational in switching his preference. [sent-134, score-0.37]
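
The frog legs computation can be sketched numerically. The extracted text truncates the paper's actual values, so every number below is an assumption chosen for illustration: p(x|c) is set to 1 for each item on the menu in c, and the desirability signal is assumed to point to salmon in c1 and to steak in c2. Under these assumptions the context-weighted ratio reduces to R(salmon) = p and R(steak) = 1 − p. The names `desirability` and `frog_legs` are hypothetical.

```python
def desirability(item, p_r, p_x, p_c):
    """Context-weighted desirability of `item`:
    sum_c p(r|item,c) p(item|c) p(c) / sum_c p(item|c) p(c)."""
    numerator = sum(
        p_r[c].get(item, 0.0) * p_x[c].get(item, 0.0) * p_c[c] for c in p_c
    )
    denominator = sum(p_x[c].get(item, 0.0) * p_c[c] for c in p_c)
    return numerator / denominator


def frog_legs(p):
    """Desirability of salmon and steak given the belief p(c) = {p, 1 - p}."""
    p_c = {"c1": p, "c2": 1.0 - p}
    # Assumed: each item on a context's menu is observed with probability 1.
    p_x = {
        "c1": {"salmon": 1.0, "steak": 1.0},
        "c2": {"salmon": 1.0, "steak": 1.0, "frog legs": 1.0},
    }
    # Assumed: the signal points to salmon in c1 and to steak in c2.
    p_r = {
        "c1": {"salmon": 1.0, "steak": 0.0},
        "c2": {"salmon": 0.0, "steak": 1.0, "frog legs": 0.0},
    }
    return {x: desirability(x, p_r, p_x, p_c) for x in ("salmon", "steak")}


mostly_c2 = frog_legs(0.3)  # belief concentrated on the frog legs context
mostly_c1 = frog_legs(0.7)  # belief concentrated on the two-item context
```

With these assumed inputs the preference flips exactly when 1 − p exceeds p, which is the switch described in the text.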

68 Along identical lines, making reasonable assumptions about the contexts of past observations, our decision framework accommodates parsimonious explanations for each of the other effects detailed in Table 1. [sent-136, score-0.282]

69 The introduction of a third item that is clearly inferior to one of the two earlier options leads the consumer towards preferring that particular earlier option. [sent-138, score-0.344]

70 Our framework elicits this behavior through the introduction of additional evidence of the desirability of one of the options from a new context, causing the relative desirability of this particular option to rise. [sent-139, score-1.37]

71 Similarity effects arise when, given that a consumer prefers one item to another, giving him further options that resemble his preferred item causes him to subsequently prefer the item he earlier considered inferior. [sent-140, score-0.571]

72 Compromise effects arise when the introduction of a third option to a choice set where the consumer already prefers one item to another causes the consumer to consider the previously inferior option as a compromise between the formerly superior option and the new option, and hence prefer it. [sent-142, score-0.787]

73 Reference point effects have typically not been associated with explicit studies of context variation, and may in fact be used to reference a number of behavior patterns that do not satisfy the definition we provide in Table 1. [sent-144, score-0.226]

74 Our definition of the reference point effect is particularized to explain data on pain perception collected by [23], demonstrating relativity in evaluation of objectively identical pain conditions depending on the magnitude of alternatively experienced pain conditions. [sent-145, score-0.276]

75 In concord with empirical observation, we show that the relative (un)desirability of an intermediate pain option reduces upon the experience of greater pain, a simple demonstration of prospect relativity that utility-based accounts of value cannot match. [sent-146, score-0.324]

76 extended discrete choice models ( [13] provides a recent review), componential context theory [21], quantum cognition [8]) or descriptive and dynamic, (specifically, decision field theory [3]). [sent-149, score-0.33]

77 In contrast, our approach not only takes a dynamic inductive view of value elicitation, it retains a normativity criterion (Bayes rationality) for falsifying observed predictions, a standard that is expected of any rational model of decision-making [6]. [sent-150, score-0.231]

78 To show that a utility function u completely represents a preference relation ≻ on X it is sufficient [12] to show that, $\forall x_1, x_2 \in X$, $x_1 \succ x_2 \Leftrightarrow u(x_1) > u(x_2)$. [sent-159, score-0.397]
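
For a finite set of possibilities, the representation condition (x1 strictly preferred to x2 exactly when u(x1) > u(x2)) can be verified by enumerating ordered pairs. The sketch below is an illustration only; the function name `represents`, the ranking, and the utility values are hypothetical.

```python
from itertools import permutations


def represents(utility, strictly_prefers):
    """Return True if, for every ordered pair (a, b) of distinct items,
    a is strictly preferred to b exactly when utility[a] > utility[b].

    utility:          dict mapping possibility -> real-valued utility
    strictly_prefers: callable (a, b) -> bool encoding the strict relation
    """
    return all(
        strictly_prefers(a, b) == (utility[a] > utility[b])
        for a, b in permutations(list(utility), 2)
    )


# Hypothetical strict ranking, best first.
ranking = ["steak", "salmon", "frog legs"]


def prefers(a, b):
    return ranking.index(a) < ranking.index(b)


matching_utility = {"steak": 3.0, "salmon": 2.0, "frog legs": 1.0}
mismatched_utility = {"steak": 3.0, "salmon": 4.0, "frog legs": 1.0}
is_represented = represents(matching_utility, prefers)
is_not_represented = represents(mismatched_utility, prefers)
```

Any strictly increasing transformation of `matching_utility` passes the same check, which is why such a representation is ordinal rather than cardinal.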

79 Hence, equivalently, to show that our measure of relative desirability R also completely represents preference information, it should be sufficient to show that, for any two possibilities $x_i, x_j \in X$, and for any observation context c, $x_i \succ x_j \Leftrightarrow R(x_i) > R(x_j)$. [sent-160, score-1.17]

80 xi (II) Transitivity between contexts: if xi xj ⇒ xi xj ∀c ∈ Cij , {xi , xj } ∈ Cij ⊆ C. [sent-163, score-0.252]

81 Of the three assumptions we need to prove this equivalence result, (I) and (II) simply define a stable preference relation across observation contexts and find exact counterparts in the completeness and transitivity assumptions necessary for representing preferences using ordinal utility functions. [sent-167, score-0.805]

82 The restriction of infinite data observability, while stringent and putatively implausible, actually uncovers an underlying epistemological assumption of utility theory, viz. [sent-169, score-0.257]

83 Any inference based preference elicitation procedure will therefore necessarily need infinite data to attain formal equivalence with the utility representation. [sent-171, score-0.477]

84 Finally, we point out that our equivalence result does not require us to assume continuity or the equivalent Archimedean property to encode preferences, as required in ordinal utility definitions. [sent-172, score-0.267]

85 This is because the continuity assumption is required as a technical condition in mapping a discrete mathematical object (a preference relation) to a continuous utility function. [sent-173, score-0.397]

86 Since relative desirability is defined constructively on Q ⊆ Q, |Q| < ∞, a continuity assumption is not needed. [sent-174, score-0.53]

87 5 Discussion Throughout this exegesis, we have encountered three different representations of choice preferences: relative (ordinal) utilities, absolute (cardinal) utilities and our own proposal, viz. [sent-175, score-0.267]

88 y x, predominantly used in human preference modeling in neoclassical economics [12]], e. [sent-179, score-0.221]

89 The term {H} here is shorthand for {o1 , o2 , · · · , ot−1 }, {r1 , r2 , · · · rt−1 }, the entire history of choice set and relative desirability observations made by an agent leading up to the current decision instance. [sent-190, score-0.867]

90 Bayes rationality simply claims that value inference with the same history of partial observations will lead to a consistent preference for a particular option in discrete choice settings. [sent-192, score-0.573]

91 Bayes-rationality specializes to economic rationality once we instantiate the underlying intuitions behind the completeness and transitivity assumptions in a context-sensitive preference inference theory. [sent-196, score-0.507]

92 Therefore, rational value inference in the form we propose can formally replace static assumptions about preference orderings in microeconomic models that currently exclusively use ordinal utilities [12]. [sent-197, score-0.651]

93 As such, context-sensitive preference elicitation is immediately useful for the nascent agent-based economic modeling paradigm as well as in dynamic stochastic general equilibrium models of economic behavior. [sent-198, score-0.438]

94 In such a synthesis, our model could generate a preference relation sensitive to action set observability, which inverse planning models could use along with additional information from the environment to generate absolute utilities that account for observational biases in the agent’s history. [sent-201, score-0.371]

95 How can a desirability representation that assumes that observers maintain probabilistic preferences over all possible states of the world be more epistemologically realistic than one that assumes that observers maintain scalar utility values over the same state space? [sent-204, score-0.935]

96 Another straightforward extension of our framework would imbue observable world possibilities with attributes, resulting in the possibility of deriving a more general definition of contexts as clusters in the space of attributes. [sent-209, score-0.382]

97 Such an extension would result in the possibility of transferring preferences to entirely new possibilities, allowing the set X to be modified dynamically, which would further address the epistemological criticism above. [sent-210, score-0.265]

98 In conclusion, it has long been recognized that state-specific utility representations of the desirability of options are insufficient to capture the rich variety of systematic behavior patterns that humans exhibit. [sent-213, score-0.918]

99 In this paper, we show that reformulating the atomic unit of desirability as a context-sensitive ‘pointer’ to the best option in the observed set recovers a rational way of representing desirability in a manner sufficiently powerful to describe a broad range of context effects in decisions. [sent-214, score-1.386]

100 A rational model of preference learning and choice prediction by children. [sent-302, score-0.403]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('desirability', 0.467), ('salmon', 0.331), ('steak', 0.282), ('utility', 0.21), ('diner', 0.203), ('options', 0.2), ('preference', 0.187), ('possibilities', 0.177), ('frog', 0.168), ('rational', 0.167), ('legs', 0.148), ('agent', 0.147), ('preferences', 0.147), ('option', 0.132), ('utilities', 0.126), ('economic', 0.1), ('contexts', 0.098), ('rationality', 0.097), ('menu', 0.094), ('decision', 0.092), ('context', 0.084), ('observability', 0.08), ('item', 0.079), ('pain', 0.072), ('effects', 0.069), ('belief', 0.069), ('consumer', 0.065), ('relative', 0.063), ('world', 0.061), ('cl', 0.061), ('ordinal', 0.057), ('xj', 0.053), ('elicitation', 0.051), ('cognition', 0.051), ('history', 0.049), ('choice', 0.049), ('normative', 0.048), ('desirabilities', 0.047), ('epistemological', 0.047), ('possibility', 0.046), ('reinforcement', 0.044), ('restaurants', 0.041), ('reversals', 0.041), ('behavior', 0.041), ('ck', 0.041), ('compromise', 0.039), ('theories', 0.038), ('instant', 0.037), ('decisions', 0.036), ('luce', 0.036), ('transitivity', 0.036), ('signals', 0.036), ('instantiate', 0.035), ('elicited', 0.034), ('human', 0.034), ('ci', 0.033), ('inductive', 0.033), ('zi', 0.033), ('violations', 0.033), ('experiences', 0.033), ('reference', 0.032), ('attraction', 0.031), ('busemeyer', 0.031), ('canini', 0.031), ('chef', 0.031), ('complementarily', 0.031), ('microeconomic', 0.031), ('normativity', 0.031), ('vlaev', 0.031), ('waiter', 0.031), ('static', 0.031), ('xi', 0.031), ('representational', 0.03), ('wherein', 0.03), ('partial', 0.03), ('absolute', 0.029), ('experience', 0.029), ('environment', 0.029), ('inference', 0.029), ('encoding', 0.029), ('mismatch', 0.029), ('observing', 0.028), ('iia', 0.028), ('relativity', 0.028), ('restaurant', 0.027), ('theory', 0.027), ('variation', 0.027), ('behaviors', 0.026), ('believes', 0.025), ('formerly', 0.025), ('reversal', 0.025), ('criticism', 0.025), ('observers', 0.025), ('items', 0.025), ('observation', 0.024), ('lucas', 0.024), ('inferential', 0.024), ('axioms', 0.024), ('agents', 0.024), ('assumptions', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 288 nips-2012-Rational inference of relative preferences

Author: Nisheeth Srivastava, Paul R. Schrater

Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them option-specific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function.

2 0.13195331 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-Lobato

Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms.

3 0.10739545 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David Parkes, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave log-likelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce.

4 0.081043392 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach.

5 0.079923376 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books.

6 0.078733407 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

7 0.072162509 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

8 0.06763614 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

9 0.064684644 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

10 0.064621255 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

11 0.064577378 330 nips-2012-Supervised Learning with Similarity Functions

12 0.061208844 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

13 0.058433212 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

14 0.058262687 75 nips-2012-Collaborative Ranking With 17 Parameters

15 0.056397732 296 nips-2012-Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

16 0.054560468 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

17 0.054233298 165 nips-2012-Iterative ranking from pair-wise comparisons

18 0.050935641 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

19 0.049651068 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

20 0.048884723 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.129), (1, -0.058), (2, -0.011), (3, -0.008), (4, -0.053), (5, -0.041), (6, 0.002), (7, 0.075), (8, 0.017), (9, 0.079), (10, -0.112), (11, 0.017), (12, -0.04), (13, 0.018), (14, 0.032), (15, -0.003), (16, -0.013), (17, -0.003), (18, -0.026), (19, 0.044), (20, 0.028), (21, -0.004), (22, 0.01), (23, 0.032), (24, 0.04), (25, -0.004), (26, 0.048), (27, -0.022), (28, 0.099), (29, 0.085), (30, -0.005), (31, -0.03), (32, 0.004), (33, 0.005), (34, -0.004), (35, -0.001), (36, -0.002), (37, 0.017), (38, 0.04), (39, -0.007), (40, -0.014), (41, 0.049), (42, 0.025), (43, 0.04), (44, 0.03), (45, -0.03), (46, 0.075), (47, -0.131), (48, -0.047), (49, -0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93129367 288 nips-2012-Rational inference of relative preferences

Author: Nisheeth Srivastava, Paul R. Schrater

Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them option-specific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function.

2 0.66589791 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David Parks, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave log-likelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce.

3 0.62815207 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-Lobato

Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms.

4 0.6110155 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborative filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from a very different item domain, without any retraining.

5 0.52970177 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach.

6 0.49792129 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

7 0.49726501 60 nips-2012-Bayesian nonparametric models for ranked data

8 0.48663184 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

9 0.48257348 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

10 0.47889411 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

11 0.47690398 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

12 0.45883864 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

13 0.45472723 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

14 0.45117676 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

15 0.44455248 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

16 0.44451332 155 nips-2012-Human memory search as a random walk in a semantic network

17 0.43808106 204 nips-2012-MAP Inference in Chains using Column Generation

18 0.43571085 165 nips-2012-Iterative ranking from pair-wise comparisons

19 0.4349674 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

20 0.42364329 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (17, 0.017), (21, 0.026), (36, 0.323), (38, 0.098), (42, 0.019), (54, 0.042), (55, 0.029), (74, 0.042), (76, 0.176), (80, 0.074), (92, 0.032)]
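
Unlike the LSI vector, this LDA vector is sparse: only 12 topic ids appear. An LDA document-topic distribution sums to 1, while the weights listed here sum to about 0.908, so the remainder is presumably spread over topics whose weight fell below some reporting threshold (an assumption; the threshold is not stated on this page). A minimal sketch that checks this and picks out the dominant topic, with variable names chosen for illustration:

```python
# Sparse LDA topic vector copied from the listing above.
lda_vector = [(0, 0.03), (17, 0.017), (21, 0.026), (36, 0.323), (38, 0.098),
              (42, 0.019), (54, 0.042), (55, 0.029), (74, 0.042), (76, 0.176),
              (80, 0.074), (92, 0.032)]

# Probability mass carried by the topics that are actually listed.
listed_mass = sum(weight for _, weight in lda_vector)
# Mass left over for topics omitted from the listing (assumed to be low-weight topics).
unlisted_mass = 1.0 - listed_mass
# Topic with the largest weight for this paper.
dominant_topic, dominant_weight = max(lda_vector, key=lambda pair: pair[1])

print(f"listed mass:    {listed_mass:.3f}")
print(f"unlisted mass:  {unlisted_mass:.3f}")
print(f"dominant topic: {dominant_topic} (weight {dominant_weight})")
```

Topics 36 and 76 together account for roughly half of the total mass, which is why papers sharing those two topics are likely to dominate the list that follows.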

similar papers list:

simIndex simValue paperId paperTitle

1 0.90887171 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy.

2 0.81441098 295 nips-2012-Risk-Aversion in Multi-armed Bandits

Author: Amir Sani, Alessandro Lazaric, Rémi Munos

Abstract: Stochastic multi-armed bandits solve the Exploration-Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results.

same-paper 3 0.80975997 288 nips-2012-Rational inference of relative preferences

Author: Nisheeth Srivastava, Paul R. Schrater

Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them option-specific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function.

4 0.67740643 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms.

5 0.67462182 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results.

6 0.66238981 69 nips-2012-Clustering Sparse Graphs

7 0.59567952 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

8 0.57187361 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

9 0.57034278 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

10 0.57028425 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

11 0.57014322 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

12 0.56897211 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

13 0.56842136 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

14 0.56826812 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

15 0.56647009 188 nips-2012-Learning from Distributions via Support Measure Machines

16 0.56457269 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

17 0.56329846 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

18 0.56260043 126 nips-2012-FastEx: Hash Clustering with Exponential Families

19 0.56236857 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

20 0.56210887 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes