https://twimlai.com/dex-net-and-the-third-wave-of-robot-learning/
According to Ken, there are three fundamental elements of uncertainty that make robot grasping extremely difficult:
Perception. Working out the precise geometry of a scene, meaning exactly where everything is, can be a complex task. Depth sensors like LIDAR have improved, “but they still don’t completely solve this problem because if there’s anything reflective or transparent on the surface, that causes the light to react in unpredictable ways, it doesn’t register as a correct position of where that surface really is.” Adding more sensors doesn’t help much because their readings often contradict one another, so “[the agent] doesn’t know what to trust” when deciding how to act. Perception is especially important in grasping because “a millimeter or less can make the difference between holding something and dropping it.”
Control. The robot has to maintain control of its grasp. As Ken puts it, “The robot has to now get its gripper to the precise position in space, consistent with what it believes is happening from its sensors.” If the gripper shifts slightly or squeezes the object too tightly, the object can drop or break.
Physics. Choosing the right place to grasp an object depends on its physical properties, and friction and mass are significant unknowns. To demonstrate how difficult this is, Ken gives the example of pushing a pencil across a table with your finger. We can estimate the pencil’s center of mass, but we do not know the frictional properties at play. The trajectory is almost impossible to predict because even “one microscopic grain of sand, anything under there is going to cause it to behave extremely differently.”
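To make the friction point concrete, here is a minimal sketch of how uncertainty in friction and contact geometry turns a yes/no grasp check into a probability. It uses the standard Coulomb friction-cone test for a planar two-finger grasp: the contact holds only if the grasp axis lies within an angle of atan(mu) of the surface normal. Everything else is an assumption made for illustration and is not taken from Dex-Net: the function names, the uniform range for the friction coefficient, the Gaussian noise on the contact angle, and the simplification that both contacts see the same angle.

```python
import math
import random


def antipodal_ok(angle_deg, mu):
    """Coulomb friction-cone test for one contact.

    angle_deg: angle between the grasp axis and the inward surface normal.
    mu: friction coefficient; the cone half-angle is atan(mu).
    """
    return abs(angle_deg) <= math.degrees(math.atan(mu))


def grasp_success_rate(nominal_angle_deg, mu_low, mu_high,
                       angle_noise_deg, n=10_000, seed=0):
    """Monte Carlo estimate of how often the grasp passes the cone test.

    Hypothetical uncertainty model: mu is uniform on [mu_low, mu_high]
    and the contact angle has Gaussian noise (perception/control error).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        mu = rng.uniform(mu_low, mu_high)
        angle = nominal_angle_deg + rng.gauss(0.0, angle_noise_deg)
        hits += antipodal_ok(angle, mu)
    return hits / n


if __name__ == "__main__":
    # Compare a well-aligned grasp with a marginal one under the same noise.
    for nominal in (0.0, 20.0):
        rate = grasp_success_rate(nominal, 0.3, 0.7, 5.0)
        print(f"nominal angle {nominal:4.1f} deg -> success rate {rate:.2f}")
```

With a known friction coefficient, a grasp either passes or fails the cone test. Once friction is only known to lie in a range, a grasp near the edge of the cone succeeds only some of the time, which is why a grasp that looks fine on paper can still drop the object.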
https://berkeleyautomation.github.io/dex-net/
The first wave is the “classic physics” approach, which prioritizes a traditional understanding of physics in terms of forces, torques, friction, mass, all that good stuff. The second wave consists of the more modern “data-driven approaches that say: ‘Forget about the physics, let’s just learn it from observation purely’” and that assume the physics will be learned naturally in the process.
Then there’s what Ken advocates: a third wave of robot learning that combines the two schools of thought. The goal is to synthesize the knowledge from both perspectives to optimize performance. However, “figuring out where that combination is is the challenge. And that’s really the story of Dex-Net.”