Juniors write worse code than Codex. Their superiors also can't check everything they do. There has to be some level of trust, despite the risk of dumb shit, or they couldn't hire juniors at all.
> Does such a thing exist here? Just "done".
Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info compared to reading the output yourself, but it won't be zero info.
LLM: "Because the embeddings in your prompt are close to some embeddings in my training data. Here's some seemingly explanatory text with that is just similar embeddings to other 'why?' questions."
You: "What could be improved?"
LLM: "Here's some different stuff based on other training data with embeddings close to the original embeddings, but different.
---
It's near-zero useful information. Example information might be "it builds" (a baseline necessity, so useless info), "it passes some tests" (fairly baseline, more useful, but actually useless if you don't know what the tests are doing), or "it's different" (duh).
I would start by asking what kind of improvement, why, how, etc.
Or I could just start changing things to line up more closely with whatever textbook or "Clean Code" clone I had last read, and hope it still passes the tests and that those tests are as thorough as possible.
The latter would eventually get me fired, is stupid, and is basically what the LLMs do.
That's not a black box though. Someone is still reading the code.
> At some point you have to let go and trust people's judgement
Where are the people in this case?
> People who do that (I'm not one of them btw) must rely on higher-level reports.
Does such a thing exist here? Just "done".