Hacker News

No ... try claude.ai or meta.ai (both behave the same) by asking them how many r's in the (made up) word ferrybridge. They'll both get it wrong and say 2.

Now ask them to spell ferrybridge. They both get it right.

gemini.google.com still fails on "strawberry" (the other two seem to have trained on that, which is why i used a made up word instead), but can correctly break it into a letter sequence if asked.



Yep, if by chance you hit a model that has seen the training data that happens to shove those tokens together in a way that it can guess, lucky you.

The point is, it would be trivial for an LLM to get it right all the time with character-level tokenization. The reason LLMs using current tokenization schemes (the best current tradeoff) find this difficult is that the tokens that make up "tree" don't include the token for "e".
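To make the "tree doesn't contain e" point concrete, here is a toy greedy longest-match subword tokenizer. The vocabulary below is made up for illustration (real models use BPE merges learned from data), but the effect is the same: once "straw" and "berry" are whole tokens, the letter "r" never appears as its own unit in the model's input.

```python
# Toy illustration only -- not any real model's vocabulary or algorithm.
VOCAB = {"straw", "berry", "ferry", "bridge", "tree", "st", "raw"}

def tokenize(word, vocab=VOCAB):
    """Greedily match the longest vocabulary entry at each position,
    falling back to single characters when nothing matches."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character as its own token
            i += 1
    return tokens
```

With this vocabulary, `tokenize("strawberry")` yields `["straw", "berry"]` and `tokenize("ferrybridge")` yields `["ferry", "bridge"]` -- no standalone "r" token ever reaches the model.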


No - you can give the LLM a list of letters and it STILL won't be able to count them reliably, so you are guessing wrong about where the difficulty lies.

Try asking Claude: how many 'r's are in this list (just give me a number as your response, nothing else) : s t r a w b e r r y


How many examples like that do you think it's seen? You can't give an example of something that is in effect a trick to get character-level tokenization and then expect it to do well when it's seen practically zero of such data in its training set.

Nobody who suggests methods like character or byte level 'tokenization' suggests a model trained on current tokenization schemes should be able to do what you are suggesting. They are suggesting actually train it on characters or bytes.
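For contrast with the subword case, a character-level scheme can be sketched in a few lines (a toy encoding, not any proposed model's actual vocabulary): each character maps to its own token ID, so "how many r's" reduces to counting occurrences of a single ID in the input sequence.

```python
# Toy character-level tokenization: one token ID per character.
# Using the Unicode code point as the ID is an arbitrary illustrative choice.
def char_tokenize(text):
    return [ord(c) for c in text]

ids = char_tokenize("strawberry")
r_count = ids.count(ord("r"))  # every 'r' is visible as its own token
```

Here `r_count` is 3: at the character level the information is present in the input, whereas under subword tokenization it has to be recovered from whatever the model memorized about each token's spelling.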

You say all this as though I'm suggesting something novel. I'm not. Appealing to authority is kinda lame, but maybe see Andrej's take: https://x.com/karpathy/status/1657949234535211009


So, one final appeal to logic from me here:

1) You must have tested and realized that these models can spell just fine - break a word into a letter sequence, regardless of how you believe they are doing it

2) As shown above, even when presented with a word already broken into a sequence of letters, the model STILL fails to always correctly count the number of a given letter. You can argue about WHY they fail (different discussion), but regardless they do (if only allowed to output a number).

Now, "how many r's in strawberry", unless memorized, is accomplished by breaking it into a sequence of letters (which it can do fine), then counting the letters in the sequence (which it fails at).
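The spell-then-count decomposition described above is trivial to express in code, which is exactly why the counting failure is notable -- a minimal sketch:

```python
def count_letter(word, letter):
    # Step 1: "spell" -- break the word into a letter sequence
    # (the step the models handle fine).
    letters = list(word)
    # Step 2: count the target letter in that sequence
    # (the step they still get wrong, per the example above).
    return sum(1 for c in letters if c == letter)
```

`count_letter("strawberry", "r")` gives 3; the argument in the thread is that the models fail at step 2 even when step 1 is done for them.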

So, you're still sticking to your belief that creating the letter sequence (which it can do fine) is the problem ?!!

Rhetorical question.


Tasks like reversing a list (Karpathy's example) or counting categories within it are far harder than simple prediction - the one thing LLMs are built to do.

Try it for yourself. Try it on a local model if you are paranoid that the cloud model is using a tool behind your back.



