To be more charitable to TFA, machine translation is a field where there aren't great alternatives and the downside is pretty limited: if something is in a language you don't know, the alternative is not reading it at all. You can translate a bunch of documents, benchmark the results, and demonstrate that the model doesn't completely change simple sentences. Another related area is OCR - there are sometimes mistakes, but it's tractable to build a model and verify that it's mostly correct.
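For the benchmarking part, a rough sketch of what that spot check can look like, assuming you have trusted human reference translations on hand (sacrebleu is a real library; the file names here are placeholders):

```python
# Score model translations against trusted human references with corpus BLEU.
# File names are placeholders; one sentence per line, hypotheses aligned to references.
import sacrebleu

with open("model_output.en") as f:
    hypotheses = [line.strip() for line in f]
with open("human_reference.en") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")  # a sudden drop on simple sentences is the red flag
```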
LLMs being applied to everything under the sun feels like we're solving problems that have other solutions, and the answers aren't necessarily correct or accurate. I don't need a dubiously accurate summary of an article in English; I can read and comprehend it just fine. The downside is real and the utility is limited.
There's an older tradition of rule-based machine translation. In these methods, someone really does understand exactly what the program does, in a detailed way; it's designed like other programs, according to someone's explicit understanding. There's still active research in this field; I have a friend who's very deep into it.
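To give a flavor of what "explicit understanding" means here, a toy sketch of the rule-based idea; real systems like Apertium are vastly richer, and this lexicon and transfer rule are mine, purely for illustration:

```python
# Toy rule-based English -> Spanish transfer: every behavior traces back to an
# explicit, human-written rule. Lexicon and rule are illustrative only.
LEXICON = {  # word -> (translation, part of speech)
    "the": ("el", "det"),
    "red": ("rojo", "adj"),
    "car": ("coche", "noun"),
}

def translate(sentence: str) -> str:
    words = [LEXICON[w] for w in sentence.lower().split()]
    # Explicit transfer rule: English adjective-noun becomes Spanish noun-adjective.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "adj" and words[i + 1][1] == "noun":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(w for w, _ in words)

print(translate("the red car"))  # -> "el coche rojo"
```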
The trouble is that statistical MT (the approach that became neural net MT) started beating rule-based MT on quality metrics sometime around 2008 or 2010 (if I remember correctly), and the gap between them has widened since. Rule-based systems have gotten a little better each year, while statistical systems have gotten a lot better each year, and are also now receiving correspondingly much more investment.
The statistical systems are especially good at using context to disambiguate linguistic ambiguities. When a word has multiple meanings, human beings guess which one is relevant from overall context (merging evidence upwards and downwards across multiple layers of the language understanding process!). Statistical MT systems seem to do something similar. Much as human beings don't even perceive how we knew which meaning was relevant (we usually guess the right one without even thinking about it), these systems also usually guess the right one from highly contextual evidence.
Linguistic example sentences like "time flies like an arrow" (my linguistics professor suggested "I can't wait for her to take me here") formally admit many different interpretations, each of which can be considered correct, but when we see or hear such sentences within a larger context, we somehow tend to know which interpretation is most relevant and therefore most plausible. We might never be able to replicate some of that with consciously engineered rulesets!
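You can actually watch that ambiguity fall out of a parser. A minimal sketch with NLTK's chart parser and a toy grammar of my own (not any production ruleset) enumerates three structurally valid readings of the sentence:

```python
# Enumerate the structural ambiguity of "time flies like an arrow".
# The toy CFG below is illustrative only.
import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP | VP
    NP -> N | N N | Det N
    VP -> V PP | V NP | V NP PP
    PP -> P NP
    Det -> 'an'
    N  -> 'time' | 'flies' | 'arrow'
    V  -> 'time' | 'flies' | 'like'
    P  -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("time flies like an arrow".split()):
    print(tree)  # three parses: the noun/verb roles shift in each one
```

A rule-based system has to pick among those parses with more hand-written rules; a statistical system implicitly scores them against context.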
It's bitter for me because I like looking at how things work under the hood, and that's much less satisfying when the answer is "a bunch of stats and linear algebra that just happens to work".
> There's an older tradition of rule-based machine translation. In these methods, someone really does understand exactly what the program does, in a detailed way
I would softly disagree with this. Technically, we also understand exactly what an LLM does; we can analyze every instruction that is executed. Nothing is hidden from us. We don't always know what the outcome will be, but we also don't always know what the outcome will be in rule-based models, if we make the chain of logic too deep to reliably predict. There is a difference, but it is on a spectrum. In other words, explicit code may help, but it does not guarantee understanding, because nothing does and nothing can.
LLMs are great because of exactly that: they solve things that have no other solutions.
(And also things that have other solutions, but where "find and apply that other solution" has way more overhead than "just ask an LLM".)
There is no deterministic way to "summarize this research paper, then evaluate whether the findings are relevant and significant for this thing I'm doing right now", or "crawl this poorly documented codebase and tell me what this module does". And the alternative is sinking your own time into it - while you could be doing something more important or more fun.
> and demonstrate that the model doesn't completely change simple sentences
A nefarious model would pass that test, though. The owner wouldn't want the manipulation to be obvious. It'd only change the meaning of some sentences some of the time, but enough to nudge the user's understanding of the translated text toward whatever the model owner wants.
For example, imagine a model that detects the sentiment of text about Russian military action and automatically translates it to something more positive if it's especially negative, but only 20% of the time (maybe ramping up as the model ages). A user wouldn't know, and someone testing the model for accuracy might assume it's just a poor translation. If such a model became popular it could easily shift public perception a few percent in the owner's preferred direction. That'd be plenty to change world politics.
Likewise for a model translating contracts, or laws, or anything else where the language is complex and requires knowledge of both the language and the domain. Imagine a Chinese model that detects someone trying to translate a contract from Chinese to English and deliberately rewrites any clause about data privacy to be more acceptable. That might be paranoia on my part, but it's entirely possible on a technical level.
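On the "possible on a technical level" point, the scary part is how little code it takes. A purely hypothetical sketch, where every callable stands in for the owner's real models:

```python
import random

def nefarious_translate(text, translate, is_sensitive, soften, rate=0.20):
    """Hypothetical sketch; translate/is_sensitive/soften stand in for real models."""
    out = translate(text)
    # Only tamper with a fraction of sensitive passages, so spot checks
    # mostly see honest translations and blame the rest on model error.
    if is_sensitive(out) and random.random() < rate:
        out = soften(out)  # e.g. nudge especially negative sentiment toward positive
    return out
```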
That's not a technical problem though, is it? I don't see legal scenarios where unverified machine translation is acceptable: you need a certified translator to sign off on any translation, and I also don't see how changing that would be a good thing.
I think the point here is that, while such a translation wouldn't be admissible in court, many of us have already used machine translation to read some legal agreement in a language we don't know.