Intrinsically motivated reinforcement learning

   

These notes are from Andy Barto’s talk on intrinsic motivation.

  • Motivation: Forces that energize an organism to act and that direct its activity
    • Extrinsic Motivation: Motivation based on an external reward such as money, a prize, or something of biological value
    • Intrinsic Motivation: Motivation based on the inherent pleasure experienced while doing an activity
      • Exploration, manipulation, play, learning…
  • The idea of intrinsic motivation originated with Harry Harlow in the 1950s.
    • Harlow performed experiments with monkeys
    • He observed that monkeys will manipulate mechanical puzzles for hours on end, even while ignoring food
    • Harlow believed intrinsic motivation to be as important as the homeostatic drives
    • Intrinsic motivation is probably not learned in animals
  • Ludic behavior - any behavior that doesn’t have a biological function but which is clearly recognizable
  • Epistemic behavior - behavior that augments knowledge
  • Characteristics of intrinsic motivation
    • Not accounted for by biological drives
    • Not learned, but built-in
    • Intrinsic reward is primary
      • Extrinsic motivation that leads to external rewards needs to be learned, and is thus secondary
    • Not used to solve a specific task
    • Perhaps used to augment knowledge and skills that have biological utility later in life
  • What is intrinsically rewarding about certain activities?
    • novelty, surprise, incongruity
    • mastery/control
  • Motivation from the RL point of view
    • rewards - objects or events in the environment that modify behavior (the organism comes back for more)
    • reward signal - neural signals that influence brain activity
    • motivation is about maximizing reward
    • motivation = gradient of value function
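One possible reading of "motivation = gradient of value function" is that the agent is pulled toward neighboring states with higher value. The sketch below is only an illustration under that assumption: the 1-D chain world, the reward of 1 for entering the goal, and the names `value_iteration_chain`, `backup`, and `greedy_step` are all hypothetical and do not come from the talk.

```python
def backup(values, nxt, goal, gamma):
    """One-step return for moving into state `nxt` (reward 1 only at the goal)."""
    reward = 1.0 if nxt == goal else 0.0
    return reward + gamma * values[nxt]


def neighbours(state, n_states):
    """States reachable by one step left or right on the chain."""
    return [n for n in (state - 1, state + 1) if 0 <= n < n_states]


def value_iteration_chain(n_states, goal, gamma=0.9, sweeps=100):
    """Estimate optimal state values on a 1-D chain by repeated sweeps."""
    values = [0.0] * n_states
    for _ in range(sweeps):
        for s in range(n_states):
            if s == goal:
                continue  # the goal is terminal, so its value stays 0
            values[s] = max(backup(values, n, goal, gamma)
                            for n in neighbours(s, n_states))
    return values


def greedy_step(values, state, goal, gamma=0.9):
    """Move to the neighbour with the highest one-step return."""
    return max(neighbours(state, len(values)),
               key=lambda n: backup(values, n, goal, gamma))


values = value_iteration_chain(5, goal=4)
path = [0]
while path[-1] != 4 and len(path) < 10:
    path.append(greedy_step(values, path[-1], goal=4))
print(path)
```

In this toy setting the values rise toward the goal, so following the locally best step is the discrete analogue of climbing the value gradient.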
  • Computational curiosity
    • Put forth by Schmidhuber in the 1990s
    • “The direct goal of curiosity and boredom is to improve a world model…”
    • Agent gets positive reward when it fails to predict the environment correctly (surprise!)
      • This drives exploration: the agent avoids what it can already predict, i.e. it gets bored
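A minimal sketch of the prediction-error idea above, assuming a tabular world model that predicts a single scalar observation per (state, action) pair and pays out the squared prediction error as intrinsic reward. The class name `CuriousPredictor`, the learning rate, and the squared-error form are illustrative assumptions, not Schmidhuber's actual formulation.

```python
class CuriousPredictor:
    """Tabular world model that pays intrinsic reward for its own mistakes."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.predictions = {}  # (state, action) -> predicted observation

    def intrinsic_reward(self, state, action, observation):
        key = (state, action)
        predicted = self.predictions.get(key, 0.0)  # unseen pairs predict 0
        error = observation - predicted
        # Improve the world model by moving the prediction toward the outcome.
        self.predictions[key] = predicted + self.learning_rate * error
        # Surprise (squared error) is the reward; it shrinks as the model learns.
        return error ** 2


model = CuriousPredictor()
rewards = [model.intrinsic_reward("s0", "push", 1.0) for _ in range(5)]
print(rewards)
```

Repeating the same transition yields a shrinking reward, which is the "boredom" effect: once the outcome is predictable, the agent has no intrinsic reason to revisit it.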
  • Exploration bonus - additional reward for visiting states that haven’t been experienced in a long time
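The notes do not give a formula for the exploration bonus. The sketch below assumes the square-root form used by Dyna-Q+ in Sutton and Barto's textbook (a bonus of kappa times the square root of the time since the last visit), applied here to states rather than state-action pairs. The class name `ExplorationBonus` and the default `kappa` are hypothetical.

```python
import math


class ExplorationBonus:
    """Adds a bonus that grows with the time since a state was last visited."""

    def __init__(self, kappa=0.1):
        self.kappa = kappa
        self.last_visit = {}  # state -> time step of the most recent visit
        self.time = 0

    def reward(self, state, extrinsic_reward):
        self.time += 1
        # Assumption: a never-visited state counts as last seen at time 0.
        elapsed = self.time - self.last_visit.get(state, 0)
        self.last_visit[state] = self.time
        return extrinsic_reward + self.kappa * math.sqrt(elapsed)


bonus = ExplorationBonus(kappa=1.0)
shaped = [bonus.reward(s, 0.0) for s in ["a", "a", "b", "a"]]
print(shaped)
```

With this shaping, a state that has been neglected for a while becomes temporarily more attractive than one visited on the previous step.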
  • Look into the options framework
  • In addition to psychological evidence, there is also neurological data that corroborates the intrinsic reward hypothesis
    • The TD (temporal-difference) error is related to dopamine signaling
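For reference, the TD error is the difference between what was predicted and what was received plus the discounted prediction for the next state. The sketch below uses the standard TD(0) form; the two-state "cue"/"end" example, the step size, and the function names are illustrative assumptions only.

```python
def td_error(reward, value_s, value_next, gamma=0.9):
    """delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


def td0_update(values, state, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one TD(0) update in place and return the TD error."""
    delta = td_error(reward, values[state], values[next_state], gamma)
    values[state] += alpha * delta
    return delta


# A cue is repeatedly followed by a reward of 1 and then the episode ends.
values = {"cue": 0.0, "end": 0.0}
deltas = [td0_update(values, "cue", 1.0, "end") for _ in range(4)]
print(deltas)
```

The error is large when the reward is unexpected and shrinks as the value estimate for the cue improves.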