ChatGPT is such a strange technology, because it simultaneously makes me want to point out how limited it is and gives me the feeling that there may be something deeper here.
I’ve spent a bit of time studying language models, and so far it just seems that GPT predicts the next word given some previous words as context. By applying a pretty basic approach to a vast, almost unimaginable amount of data, it manages to write plausible-looking prose that gives you the impression of intelligence. In reality, it’s essentially compressing a very large dataset and sampling from it in a way that hides its memorisation. So it can be easy to dismiss from a theoretical point of view. What does amaze me is the curation and utilisation of the dataset itself. It’s just an incredible feat, and there must be so much more to that pipeline than we know. But still, it doesn’t reason, it’s liable to be wildly inaccurate without warning, and there’s no way I can see to fix that within the approach itself. If I had to guess, I’d say it will be a dead end - for now - but in the long term a relative of our current language models will fit into a larger, cohesive system.
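To make “predict the next word given some previous words” concrete, here’s a toy sketch in Python: a bigram model that counts which word follows which in a tiny corpus and samples from those counts. It’s nothing like GPT in scale or mechanism - GPT uses a neural network over long contexts and a colossal dataset - but the flavour of the objective, next-word prediction, is the same:

```python
import random
from collections import defaultdict, Counter

# Toy next-word predictor: count which word follows which in a corpus,
# then sample the next word in proportion to those counts.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def next_word(word):
    """Sample a follower of `word`, weighted by how often it appeared."""
    counts = followers[word]
    if not counts:  # dead end: this word never had a follower in the corpus
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation from a seed word.
word, output = "the", ["the"]
for _ in range(8):
    word = next_word(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))
```

Even this toy version shows the trick: everything it “says” is stitched together from fragments it has memorised, and the sampling is what gives the output its slight air of novelty.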
But… what gets me is, well, how do you know humans don’t just use such a simple approach? As Sam Altman puts it - how do you know we’re not just stochastic parrots? Ultimately we hear a lot of spoken language, and anecdotally we have these repeating phrases we use - the linguistic equivalent of earworms that clearly meme their way around the world. I hear the phrase “betting the farm”, something I hadn’t heard until a few years ago; I start using it, and then I see someone for the first time in a while and notice they also use it. Our models are being extended based on our input.
There’s clearly a difference between us and GPT, because when I write I am intentionally conveying meaning, and usually I work hard to avoid making statements that are false. But what this tool is making me consider is - perhaps this metacognition, this reasoning and fact-checking, is sitting either “on top of” or “inside of” the language model. From meditating, it seems that I have feelings, and those feelings perhaps lead to a sample from my internal language model. Is it the case that I then censor the strings that contain untruths or that for other reasons should not be expressed? Do I take the output and then use it in combination with some other mental faculty to edit the string, perhaps passing it back into the language model? “Well, what I really meant there was X, please fix it.”
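To make that loop concrete, here’s a minimal sketch in Python. To be clear, `generate`, `contains_untruth`, and `speak` are made-up stand-ins - the first plays the role of the inner language model, the second the fact-checking faculty - so this is only the control flow of the idea, not a claim about how the mind (or GPT) actually works:

```python
def generate(prompt: str) -> str:
    """Stand-in for the inner language model (canned answers, obviously)."""
    if prompt.startswith("Revise"):
        return "because air scatters blue light more than red"
    return "maybe because of magic"

def contains_untruth(text: str) -> bool:
    """Stand-in for whatever faculty spots statements that shouldn't go out."""
    return "magic" in text

def speak(prompt: str, max_revisions: int = 3) -> str:
    """Sample from the 'language model', vet the string, re-prompt if needed."""
    candidate = generate(prompt)
    for _ in range(max_revisions):
        if not contains_untruth(candidate):
            break
        # "Well, what I really meant there was X, please fix it."
        candidate = generate(f"Revise this so it is true: {candidate}")
    return candidate

print(speak("why is the sky blue?"))
# -> "because air scatters blue light more than red"
```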
In this way I think the GPT model itself is actually theoretically fairly trivial and uninteresting, but the _observation that a simple predictive model combined with a gargantuan dataset can achieve so much_ is absolutely astounding. It may be the case that GPT is mostly useless in a practical sense, but in a philosophical or research sense it may prove to be a pivotal moment in our understanding of how the mind works.