Is (Deep) Reinforcement Learning Barking Up The Wrong Tree?


These are some notes that I jotted down quickly while watching Chris Atkeson’s guest lecture in CMU’s Deep RL class. See here (part 1, part 2) for the original video.

The big question

Is what deep reinforcement learning is likely to provide anything like what we actually have to do to make the robots we work with perform tasks?

  • Deep RL seems to be about representing complexity
  • Atkeson’s Ideology: Learning = Optimization
    • I need to think more about the tradeoffs between function approximation (learning, optimal control, etc.) and control synthesis based on geometry, dynamics, and experimentation
  • Behaviors learned on one robot don’t necessarily work on identical robots
    • This suggests that model-based approaches aren’t enough
  • Hierarchical layers of control for DARPA Robotics challenge
    • Level 1: Use A* to solve the foot placement problem
    • Level 2: Use trajectory optimization to plan a path for the center of mass (COM)
    • Level 3: Receding horizon control (RHC)
    • Level 4: Full body control
    • Notice that simpler models are used to plan further into the future
  • The quest for the perfect model is doomed
    • Controlling Atlas is hard because it’s actually very difficult to locate the center of mass to within 2 cm
    • Structures bend
    • Joints have backlash, and sometimes they have play
    • Essentially, things deform
  • Why not learn a model?
    • The COM shifts drastically when a human runs
    • It’s hard or near impossible to completely model the human body in terms of positions, velocities, angular velocities, and torques
  • How do you tell your optimizer that you are unsure about your model?
    • Atkeson likes policy gradient for learning policies
    • Learning a parameterized policy rather than optimizing a single trajectory
    • My question: Why is random search a good strategy?
    • How do you learn controllers without overfitting to the model?
    • Adding noise isn’t the right way to train a model because the error due to noise doesn’t propagate
      • Is this due to noise being random? Random noise is not correlated.
    • If you mis-estimate the mass or there’s a delay, then the errors will correlate over time
  • How to find the easy way for robots to do things
    • For example, use hands to balance
    • A large part of human learning is from demonstration
    • Learning from demonstration (LfD) works well if the teacher and student have the same shape and kinematic structure
    • Cooking challenge
      • This will force robots to learn generalizable skills
    • Many things are easy to do with robots if state estimation is easy
      • Pixels to torques might be hard because we haven’t fully solved computer vision yet
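
Going back to the point about noise versus model error: one way I can make sense of it is with a toy integrator. The sketch below is my own illustration, not something from the lecture. It assumes a per-step error of fixed magnitude that gets summed over time (the way an acceleration error accumulates into a velocity error). The names `accumulated_error` and `compare_drift` are made up for this example. Independent zero-mean noise partly cancels itself out, while a constant bias (say, from a mis-estimated mass) keeps pushing in the same direction.

```python
import random


def accumulated_error(per_step_errors):
    """Sum per-step errors, the way an integrator would accumulate them."""
    total = 0.0
    for e in per_step_errors:
        total += e
    return total


def compare_drift(n_steps=1000, magnitude=0.01, n_trials=200, seed=0):
    """Return (average |drift| under random noise, |drift| under constant bias)."""
    rng = random.Random(seed)

    # Case 1: independent zero-mean Gaussian noise at every step.
    noise_drifts = []
    for _ in range(n_trials):
        errors = (rng.gauss(0.0, magnitude) for _ in range(n_steps))
        noise_drifts.append(abs(accumulated_error(errors)))
    mean_noise_drift = sum(noise_drifts) / n_trials

    # Case 2: the same magnitude, but perfectly correlated over time
    # (a constant bias, standing in for a model error such as wrong mass).
    bias_drift = abs(accumulated_error(magnitude for _ in range(n_steps)))

    return mean_noise_drift, bias_drift


if __name__ == "__main__":
    noise_drift, bias_drift = compare_drift()
    print("average drift from random noise:", noise_drift)
    print("drift from constant bias:       ", bias_drift)
```

In this toy setup the noise drift grows roughly with the square root of the number of steps, while the bias drift grows linearly. That would explain why a controller trained only against injected noise never sees the kind of error that a wrong model produces.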


What is a low-pass filter? What is bandwidth? Is this how we model actuators?
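
A partial answer to my own question, as far as I understand it: a low-pass filter lets slow changes through and attenuates fast ones, and its bandwidth is usually taken as the frequency where the output amplitude has dropped to about 1/√2 of the input (the −3 dB point). Treating an actuator as a first-order low-pass filter is one common simplification, and I am assuming here that this is the sense meant in the lecture. The sketch below uses a backward-Euler discretization; `low_pass` and `gain_at` are hypothetical helper names, and the 10 Hz cutoff is an arbitrary choice.

```python
import math


def low_pass(signal, dt, cutoff_hz):
    """First-order low-pass filter (backward-Euler discretization)."""
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for this cutoff
    alpha = dt / (tau + dt)                  # smoothing factor in (0, 1)
    y = 0.0
    out = []
    for x in signal:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


def gain_at(freq_hz, cutoff_hz, dt=0.0005, duration=2.0):
    """Estimate the output/input amplitude ratio for a unit sine input."""
    n = int(duration / dt)
    sine = [math.sin(2.0 * math.pi * freq_hz * k * dt) for k in range(n)]
    filtered = low_pass(sine, dt, cutoff_hz)
    # Ignore the first half so the start-up transient has died out.
    return max(abs(v) for v in filtered[n // 2:])


if __name__ == "__main__":
    for f in (1.0, 10.0, 100.0):
        print(f, "Hz -> gain", gain_at(f, cutoff_hz=10.0))
```

With a 10 Hz cutoff, a 1 Hz command passes almost unchanged, a 10 Hz command comes out at close to 1/√2 of its amplitude, and a 100 Hz command is mostly filtered away. Read as an actuator model, the bandwidth says how fast a command can change before the actuator stops following it.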