Making a Science of the Ineffable
More than a year ago, Armen asked me "What's the next challenge for LLMs?" My answer was: 'Things that can't be specified in a few sentences.' This is still my answer, but it's a bad answer. Describing something negatively is lazy; it is really just gesturing vaguely and leaving it all as an exercise to the reader. Lame.
But it is still my answer, for a very simple reason: science has always operated by converting the ineffable into the effable. We invent new abstractions so that we can frame the universe we perceive with words. I think the recent popularity of *Inventing Temperature* is a recognition of the fact that we need to be inventing dozens of abstractions a year to catch up with all that is happening.
We are very bad at inventing new abstractions that stick, because we have largely been trained to invent abstractions for papers and grants.
To be fair, there is also the genuine reality that it is just very hard to decide how to bundle and name things when there are such vast swathes of novelty. No one knows why LLMs do what they do, or even if a satisfying theory is really possible. However, there is another reason we are bad at this: modern scientific culture is so obsessed with the ‘value’ of abstractions that if an abstraction isn't immediately exploitable in the same way previous abstractions were, it will be thrown away in favor of one that is easier to justify in a PowerPoint™️ presentation.
I didn't know how to tell Armen what I meant, because the science of generative modeling has consistently refused to acknowledge the notion that we are in the business of putting words to conceptual patterns that we didn't know about before. Instead, science about models is usually communicated in the form of 'this model shows signs of X thing we already understood.' This is wrong-headed, and since the world is changing so rapidly you can see it in action:
- People who, a year ago, would have argued all night that 'alignment' is a meaningless concept now say they study 'how to align LLMs' on their home pages.
- 'Reasoning', which everyone I've talked to still agrees is an unclear concept, is used to analyze the outputs of LLMs, as in 'showing greater depth of reasoning', as if this were a robustly communicable concept.
- 'Generalization' as a concept no longer holds water: people don't know what the training set looks like, don't know what distributions they're testing on, or any of the rest, yet they still reach for it to describe a model making analogies we wouldn't expect.
The hardcore purists (most people who are loud in academia) would say we simply shouldn't be talking about these things at all until we have rigorous definitions. This is a garbage take. How will we discover the future if we don't venture to describe new concepts before we define them? In the history of science, most important concepts are a vibe before they are a definition: heritability, temperature, complexity.
In spite of the purists and the hypemongers, we are slowly making progress defining such things as personas, hallucinations, and belief propagation in LLMs. So why am I complaining?
Because I think we are selecting for concepts that feel rigorous but are inherently hard to define. We are selecting ideas that make intuitive sense in the way we want to think about LLMs, rather than in the way LLMs actually work. It is not clear that LLMs distinguish between hallucinations and tail phenomena; indeed, it appears there are hard limits on this disentanglement. My guess is that most of the scientific terminology around LLMs is more projection than explanation.
How did we get here?
Peter convinced me this morning that the cultural reason is simple: everything one researcher says to another must be a value proposition or else it will be identified with something that is a value proposition and the original statement will be promptly forgotten. ‘Snap to grid’ for research hypotheses.
- If you tell people that LLMs may be memorizing strings that never occur in their entirety in the training set, they will lecture you about compositional generalization and be done with you.
- If you tell people that no one has studied the closure of LLMs under feedback loops, they will tell you about agents and chain-of-thought and feel very satisfied.
- If you tell people that LLMs are becoming a self-fulfilling prophecy of what people say about LLMs as we produce and train on more documents with 'LLM' in them, people will just say this is Model Collapse and forget that you used the word hyperstition.
In the current discourse, there is simply no room for exploratory science. Instead, there are hypotheses and deployment. The secret language of generative models will not reveal itself to such biased analysis, but many people will have to stop making money before that style of analysis becomes any less popular. For those who are interested in genuine exploration, now is the time to do deep thinking, so that more interesting flowers can bloom in the coming years.