The thing is, I see people use it for tricky niche knowledge all the time, treating it as an alternative to doing a Google search.
So I want to have a general idea of how good it is at this.
I found something that was niche, but not super niche; I could easily find a good, human-written answer in the top couple of results of a Google search.
But until now, all LLM answers I've gotten for it have been completely hallucinated gibberish.
Anyhow, this is a single data point and I need to expand my set of benchmark questions a bit now, but this is the first time that I've actually seen progress on this particular personal benchmark.
That’s riding the hype machine and throwing the baby out with the bathwater.
Get API access and try using it for classification of text or images. If you have an Excel file with 10k somewhat random-looking entries that you want to classify, or filter down to the 10 that matter to you, use an LLM.
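Something like this is all it takes - a rough sketch against the OpenAI Python client, where the model name, the "entries.xlsx" file and its "description" column are placeholders, and you'd want batching and retries before running it over 10k rows:

    # Sketch: label spreadsheet rows with an LLM and keep only the relevant ones.
    import pandas as pd
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    df = pd.read_excel("entries.xlsx")

    def classify(text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any cheap chat model is fine for bulk labelling
            messages=[
                {"role": "system",
                 "content": "Label the entry as one of: relevant, irrelevant, unsure. "
                            "Reply with the label only."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip().lower()

    df["label"] = df["description"].map(classify)
    df[df["label"] == "relevant"].to_excel("filtered.xlsx", index=False)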
Get it to do audio transcription. You can now just talk and it will take notes for you at a level that wasn't possible before; without being trained on a specific person's voice, it can handle anyone's voice.
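For example, roughly (same OpenAI client assumed; the file name is a placeholder):

    # Sketch: transcribe a voice memo with the hosted Whisper endpoint.
    from openai import OpenAI

    client = OpenAI()
    with open("note.m4a", "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    print(transcript.text)  # plain-text transcript of the recording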
Fixing up text is of course also big.
Data classification is easy for an LLM. Data transformation is a bit harder but still great. Creating new data is hard, so for things like answering questions where it has to generate stuff from thin air, it will hallucinate like a madman.
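To give an idea of the transformation case, a rough sketch (same client as above; the schema and the input line are invented):

    # Sketch: turn messy free text into structured JSON.
    import json
    from openai import OpenAI

    client = OpenAI()
    raw = "john smith, 42 main st apt 3, springfield IL 62704, moved in last june"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # keeps the reply parseable
        messages=[
            {"role": "system",
             "content": "Extract name, street, city, state and zip as a JSON object. "
                        "Use null for anything not present."},
            {"role": "user", "content": raw},
        ],
    )
    print(json.loads(resp.choices[0].message.content))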
The tasks LLMs are good at are used in the background by people building actually useful software on top of them, but those problems aren't visible to the general public, who only sees the chat box.
I also use niche questions a lot, but mostly to check how much the models tend to hallucinate. E.g. I start asking about rank badges in Star Trek, which they usually get right, and then I ask about specific (non-existent) rank badges shaped like strawberries or something like that. Or I ask about smaller German cities and what's famous about them.
I know that without the ability to search it's very unlikely the model actually has accurate "memories" of these things; I just hope one day they will actually know that their "memory" is bad or non-existent and tell me so instead of hallucinating something.
Or more likely Google couldn't give a rat's arse whether those AI summaries are good or not (except to the degree that people don't flee it); what it cares about is keeping users on Google itself instead of having them click off to other sources.
After all, it's the same search engine team that didn't care about its search results - its main draw - actively going to shit for over a decade.
Those summaries would be far more expensive to generate than the searches themselves, so they're probably caching the top 100k most common queries or something, maybe even pre-generating them.