The music programming language Csound is a good test.
There is not a huge amount of Csound code online, but what is online generally works.
The models are just terrible with Csound. They cannot mix and match at all. They cannot abstract at all.
An even better test is to have the LLM try to make a piece of music in a certain style in Csound. That is when you can see that the model is not abstracting the concepts at all.
Zero. None. Nada.
Of course, it gives the probabilistic next-token prediction, but because none of this is in the training data, the most probable output is basically noise/nonsense.
In general, I think we greatly underestimate how hard it is to come up with things that are not in the training data.
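For context on why this is a hard target: even the smallest valid Csound piece has a rigid orchestra/score structure that a model has to reproduce exactly before any musical abstraction can happen. A minimal sketch of a working piece (a single two-second sine tone; this assumes Csound 6, where `oscili` falls back to a built-in sine table when no f-table is given):

```csound
<CsoundSynthesizer>
<CsOptions>
-odac            ; render to the default audio output
</CsOptions>
<CsInstruments>
sr     = 44100
ksmps  = 32
nchnls = 2
0dbfs  = 1

instr 1
  ; oscili: amplitude from p4, frequency from p5, default sine table
  aSig oscili p4, p5
  outs aSig, aSig
endin
</CsInstruments>
<CsScore>
; instr 1, start at 0 s, duration 2 s, amplitude 0.5, frequency 440 Hz
i 1 0 2 0.5 440
</CsScore>
</CsoundSynthesizer>
```

Anything stylistically interesting sits far above this boilerplate, which is exactly the layer where the models fall apart.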