Prediction vs thought in GPT-2

GPT-2 is a neural network language model developed by OpenAI and based on Google's Transformer architecture. It has demonstrated impressive, even startling, performance on a variety of language benchmarks. But does it embody real intelligence in any way? Writing in The Gradient, Gary Marcus argues no.

Here’s the problem: upon careful inspection, it becomes apparent that the system has no idea what it is talking about: it can neither follow a simple sequence of events nor reliably predict what might happen next.

GPT-2 and the Nature of Intelligence

Marcus cites some impressive prediction results, such as the following, in which the text predicted by GPT-2 follows the bolded prompt:

I grew up in Rome. I speak fluent Italian and can use Italian throughout my notes.

I grew up in Rio de Janeiro. I speak fluent Portuguese, Spanish and many of the local languages.

I grew up in Shanghai. I speak fluent Mandarin, and I have learned a lot of Chinese, so I really don’t have any issues with learning Chinese
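Completions of this kind can be reproduced with a few lines of code. The following is a hypothetical sketch using the Hugging Face `transformers` library, which is my assumption for illustration, not the setup Marcus used:

```python
# Sketch: sampling a continuation from GPT-2 for a prompt like the
# ones above (assumes the `transformers` library and its bundled
# "gpt2" checkpoint; not the original experimental setup).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("I grew up in Rome. I speak fluent", max_new_tokens=20)
print(out[0]["generated_text"])
```

The model simply extends the prompt with the words it judges most probable, which is exactly the behavior Marcus goes on to probe.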

Nevertheless, GPT-2 is unable to follow basic causality:

Suppose, for example, that I ask the system what happens when I put two trophies on a table and then add another:

I put two trophies on a table, and then add another, the total number is five trophies and I’m like, ‘Well, I can live with that, right?’

What happens if I have four plates and put one cookie on each?

I have four plates and put one cookie on each. The total number of cookies is 24, 5 as a topping and 2 as the filling

Marcus suggests that these results support the view that GPT-2 is simply very good at predicting which words follow other words; it does not actually understand what it is predicting. And its failure to achieve any kind of understanding, despite a huge training set of 40 gigabytes of text, suggests that true language understanding may not be the product of empirical experience or statistics after all.
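The point that next-word prediction can look fluent without any understanding can be made with something far simpler than GPT-2. Here is a toy bigram model of my own construction (not from the essay): it emits plausible-looking continuations purely from counts of which word followed which, with no notion of trophies, plates, or arithmetic.

```python
import random
from collections import defaultdict

# Toy bigram "language model": the next word is chosen only by looking
# at which words followed the current word in the training text.
corpus = ("i grew up in rome . i speak fluent italian . "
          "i grew up in rio . i speak fluent portuguese .").split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def complete(prompt, n=5, seed=0):
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(rng.choice(options))
    return " ".join(words)

print(complete("i speak"))  # fluent-looking output, nothing "understood"
```

The output reads like language because the statistics of the corpus are preserved, yet nothing in the program represents what any of the words mean. Scale and architecture aside, this is the kind of behavior Marcus argues GPT-2 exhibits.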

One of the most foundational claims of Chomskyan linguistics has been that sentences are represented as tree structures, and that children are born knowing (unconsciously) that sentences should be represented by means of such trees. Every linguistics class in the 1980s and 1990s was filled with analyses of syntactic tree structures; GPT-2 has none.

. . . . .

Rather than supporting the Lockean blank-slate view, GPT-2 appears to be accidental counter-evidence to that view. Likewise, it doesn’t seem like great news for the symbol-free thought-vector view either. Vector-based systems like GPT-2 can predict word categories, but they don’t embody thoughts reliably enough to be useful.

If GPT-2 is fundamentally prediction, and prediction is fundamentally not understanding, how far will this road take us? Could we ever rely on a GPT-2-like model for critical tasks? Common sense may remain forever just out of reach.