
Currently, everyone's goal is to create one master model to rule them all, so we haven't seen much specialization. I wonder how much more efficient smaller models could be if we created language-specialized models.

It feels intuitively obvious (so maybe wrong?) that a 32B Java Coder would be far better at coding Java than a generalist 32B Coder.



I’ll take on the role of pushing back on your Java coder idea!

First, Java code tends to be written a certain way, and for certain goals and business domains.

Let’s say 90% of modern Java is a mix of:

* students learning to program and writing algorithms

* corporate legacy software from non-tech-focused companies

If you want to build something that is uncommon in that subset, the model will likely struggle due to a lack of training data. And if you wanted to build something like a game, the majority of your training data would be based on ancient versions of Java, from back when game development was more common in Java.

Comparatively, including C in your training data gives the model exposure to a whole separate set of domains, like IoT devices, kernels, etc.

Including Go will likely bring in a lot more networking and infrastructure code than Java alone would, which also gives the model more context about what networking services expect.

Code for those domains follows different patterns, but the concepts can still be useful when writing Java code.

Now, there may be a middle ground: a model that stays general across many coding languages but gets extra data and fine-tuning focused on domain-specific Java, more of a “32B CorporateJava Coder” built around the very specific architecture of Spring. And you’d be willing to accept that this model would fail at writing games in Java.
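
To make that concrete, here's a minimal sketch of how such a fine-tune might look with Hugging Face's trl and peft. The base model and dataset names are made-up placeholders, not real checkpoints:

    # Rough sketch: LoRA-fine-tune a general coder model on Spring-heavy Java.
    # pip install trl peft datasets
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    BASE = "some-org/general-32b-coder"    # hypothetical generalist base model
    DATA = "your-org/spring-java-corpus"   # hypothetical dataset with a "text" column of Java/Spring code

    # LoRA trains small adapter matrices while the base weights stay frozen,
    # so the model keeps its general coding ability but picks up Spring idioms.
    lora = LoraConfig(r=16, lora_alpha=32)

    trainer = SFTTrainer(
        model=BASE,
        train_dataset=load_dataset(DATA, split="train"),
        peft_config=lora,
        args=SFTConfig(output_dir="corporate-java-coder"),
    )
    trainer.train()

The appeal of adapters here is that you could keep one generalist base and swap in a "CorporateJava" adapter, a "Go networking" adapter, and so on, rather than maintaining separate full models.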

It’s interesting to think about, for sure, but I do feel like domain-specific models might be more useful than language-specific ones.


Don't we also find with natural languages that training on data from only a single language doesn't actually result in better writing in the target language?


JetBrains have done this with their Mellum models, which they use for autocompletion: https://ollama.com/JetBrains

Fine-tuned rather than created from scratch, though.
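
If you want to poke at one locally, something like this should work with the ollama Python client. Note the model tag below is my guess; check the linked page for the real one:

    # pip install ollama -- assumes the ollama daemon is already running locally
    import ollama

    # Model tag is an assumption; see https://ollama.com/JetBrains for actual tags.
    MODEL = "JetBrains/Mellum-4b-base"

    # Mellum is a completion model, so feed it a code prefix rather than a chat prompt.
    response = ollama.generate(
        model=MODEL,
        prompt="public static int fib(int n) {",
        options={"num_predict": 64, "temperature": 0.2},
    )
    print(response["response"])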



