Thinking About Thinking

Rich Sutton, a prominent Canadian AI researcher, on a lesson learned from a career in AI:

We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.

The Bitter Lesson

There is a fair amount of controversy over this point, but it contains a core of truth. We think we know how we think, and we think we can code it. But it is precisely the activities that we humans find so easy that has been difficult to replicate on computers.

The average human finds it trivial to parse objects from visual inputs, to recognize them easily when they are rotated, and to reach out and grasp them gently. And this lack of effort causes us to believe that it is easy. But these abilities are simply presented to us by our brains, punished and formed by millions of years of failure at these tasks. These tasks are not easy, but the difficulty has been hidden.

The bottom line is we don’t really understand how we think. This is counter-intuitive; after all we are in our own heads. But in some ways understanding the brain is like the eye that tries to look at itself. If the brain were simpler, it would be easier to understand. But then we’d be simpler as well.

It is somewhat irrational to believe that computers will find a truly different way to think than humans, at least if we expect them to complete tasks in our world. Evolution is a brutally efficient designer. But coding AIs to think like we think often fails, because we don’t know how we think. And coding AIs to contain knowledge that we know often fails, because we always know more than we can code. The only approach that scales is the machine learning by itself.