Hacker News

Because code isn't free.

I can see it in my team. We've all been using Claude a lot for the last 6 months. It's hard to measure the impact, but I can tell our systems are as buggy as ever. AI isn't a silver bullet.




And after 12 months, most probably no one on your team will understand what's causing half of those bugs.

When devs outsource their thinking to AI, they lose the mental map, and without it, control over the entire system.


I think about this a lot, and do everything I can to avoid having Claude write production code while keeping the expected tempo up. To date, this has mostly ended up having me use it to write project plans, generate walkthroughs, and write unit and integration tests. The terrifying scenario for me is getting paged and then not being able to actually reason about what is happening.

Anything bigger in context? Unfortunately not, in my experience. Maybe I have bad luck…

But I don’t get how they code at Anthropic when they say that almost all their new code is written by LLM.

Do they have some internal, much smarter model that they keep secret and don’t sell to customers? :)


>> when they say that almost all their new code is written by LLM.

Keeping in mind that they are trying hard to sell their coding assistant, what else can they say?

Goal is simple: just lie your way forward to the next VC funding round.


I find that writing good tests is my ticket to understanding the problem in depth, be careful about outsourcing that part. Plus from what I have seen LLM generated tests are often low quality.

I find this such a weird stance to take. Every system I work on and bug I fix has broad sets of code that I didn't write in it. Often I didn't write any of the code I am debugging. You have to be able to build a mental map as you go even without ai.

Yeah. Everyone sort of assumes that not having personally written the code means they can’t debug it.

When is the last time you had an on call blow up that was actually your code?

Not that I’m some savant of code writing — but for me, pretty much never. It’s always something I’ve never touched that blows up on my Saturday night when I’m on call. Turns out it doesn’t really change much if it’s Sam who wrote it … or Claude.


The problem is you lose abilities if you stop writing code completely.

There is a difference between a reader and an author.


"hey coworker, I know your team wrote this, can you help?" Except there is no coworker, just Claude

Do you know what on call means?

It means Sam is 7 beers deep on Saturday night since you’re the one on call. He’s not responding to your slack messages.

Claude actually is there though, so that’s kind of nice.


Sam might be 7 beers deep, or maybe he's available. In my org, oncall is just who gets the 2am phone call. They can try to contact anyone else if needed.

Claude is there as long as you're paying, and I hope he doesn't hallucinate an answer.


> In my org, oncall is just who gets the 2am phone call. They can *try* to contact anyone else if needed.

Emphasis mine.

> Claude is there as long as you're paying

If you’re at a company that doesn’t pay for AI in the year 2026, you should find a new company.

> and I hope he doesn't hallucinate an answer.

Unlike human coworkers with a 100% success rate, naturally.


"Yeah our team wrote it but everyone who built that part of it has moved to different teams or companies since."

Yeah it happens, and it's not ideal, and now instead of a risk, it's a guarantee.

Yeah but now you get an LLM to help you understand the code base 100x faster.

Remember, they're not just good for writing code. They're amazing at reading code and explaining to you how the architecture works, the main design decisions, how the files fit together, etc.


Because it's remarkably easier to write bugs in a code base you know nothing about, and we usually try to prevent bugs entirely, not debug them after they are found. The whole premise of what you're saying is dependent on knowing bugs exist before they hit Prod. I inherit people's legacy apps. That almost never happens.

In sufficiently complicated systems, the 10xer who knows nothing about the edge cases of state could do a lot more damage than an okay developer who knows all the gotchas. That's why someone departing a project is such a huge blow.


Usually all code has an owner though. If I encounter a bug the first thing I often do is look at git blame and see who wrote the code then ask them for help.

You are missing the point.

Reading code is different when you’re also a writer of it than when you’re purely a reader.

It’s like only reading/listening to foreign language without ever writing/speaking it.


When you work on a pre-existing codebase, you don't understand the code yet, but presumably somebody understood parts of it while building it. When you use AI to generate code, you guarantee that no one has ever understood the code being summoned. Don't ignore this difference.

I agree, but you don't have to outsource your thinking to AI in order to benefit from AI.

Use AI as a sanity check on your thinking. Use it to search for bugs. Use it to fill in the holes in your knowledge. Use it to automate grunt work, free your mind and increase your focus.

There are so many ways that AI can be beneficial while staying in full control.

I went through an experimental period of using Claude for everything. It's fun but ultimately the code it generates is garbage. I'm back to hand writing 90% of code (not including autocomplete).

You can still find effective ways to use this technology while keeping in mind its limitations.


The better the code is, the less detailed a mental map is required. It's a bad sign if you need too much deep knowledge of multiple subsystems and their implementation details to fix one bug without breaking everything. Conversely, if drive-by contributors can quickly figure out a bug they're facing and write a fix by only examining the place it happens with minimal global context, you've succeeded at keeping your code loosely-coupled with clear naming and minimal surprises.

100% agree. I’ve seen it with my own sessions with code agents. You gain speed in the beginning but lose all context on the implementation which forces you to use agents more.

It’s easy to see the immediate speed boost, it’s much harder to see how much worse maintaining this code will be over time.

What happens when everyone in a meeting about implementing a feature has to say “I don’t know we need to consult CC”. That has a negative impact on planning and coordination.


Only if they are supremely lazy. It’s possible to use these tools in a diligent way, where you maintain understanding and control of the system but outsource the implementation of tasks to the LLM.

An engineer should be code reviewing every line written by an LLM, in the same way that every line is normally code reviewed when written by a human.

Maybe this changes the original argument from software being “free”, but we could just change that to mean “super cheap”.


There's a pretty big difference between the understanding that comes with reviewing code versus writing it, for most people I think.

Definitely true for me. What’s particularly problematic is code I need to review but can’t effectively test due to environmental challenges.

That's a tough situation. How do you handle the testing with human code?

Just as poorly.

> An engineer should be code reviewing every line written by an LLM,

I disagree.

Instead, a human should be reviewing the LLM generated unit tests to ensure that they test for the right thing. Beyond that, YOLO.

If your architecture makes testing hard, build a better one. If your tests aren't good enough, make the AI write better ones.


The venn diagram for "bad things an LLM could decide are a good idea" and "things you'll think to check that it tests for" has very little overlap. The first circle includes, roughly, every possible action. And the second is tiny.

Just read the code.


There’s no way you or the AI wrote tests to cover everything you care about.

If you did, the tests would be at least as complicated as the code (almost certainly much more so), so looking at the tests isn’t meaningfully easier than looking at the code.

If you didn’t, any functionality you didn’t test is subject to change every time the AI does any work at all.

As long as AIs are either non-deterministic or chaotic (suffer from prompt instability), the code is the spec. Non-determinism is probably solvable, but prompt instability is a much harder problem.


> As long as AIs are either non-deterministic or chaotic

You just hit the nail on the head.

LLMs are stochastic. We want deterministic code. The way you do that is by bolting on deterministic linting, unit tests, AST pattern checks, etc. You can transform it into a deterministic system by validating and constraining output.

One day we will look back on the days before we validated output the same way we now look at ancient code that didn't validate input.
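As a rough sketch of what that bolting-on could look like (the banned-call list and rules here are my own illustration, not a specific tool from this thread), a small deterministic AST gate in Python:

```python
import ast

# Illustrative rules only: reject generated code that calls eval/exec
# or uses a bare `except`, before it ever reaches human review.
BANNED_CALLS = {"eval", "exec"}

def gate(generated_source: str) -> list[str]:
    """Return a list of violations; an empty list means the gate passes."""
    try:
        tree = ast.parse(generated_source)
    except SyntaxError as e:
        return [f"syntax error: {e}"]
    violations = []
    for node in ast.walk(tree):
        # Direct calls to banned builtins, e.g. eval(...)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in BANNED_CALLS):
            violations.append(f"banned call {node.func.id!r} at line {node.lineno}")
        # A bare `except:` has no exception type attached
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            violations.append(f"bare except at line {node.lineno}")
    return violations

print(gate("x = eval(input())"))  # one violation flagged
print(gate("x = 1 + 1"))          # []
```

The same check runs identically on every commit, which is the point: the model's output may be stochastic, but the acceptance criteria aren't.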


None of those things make it deterministic though. And they certainly don’t make it non-chaotic.

You can have all the validation, linters, and unit tests you want and a one word change to your prompt will produce a program that is 90%+ different.

You could theoretically test every single possible thing that an outside observer could observe, and the code being different wouldn’t matter, but then your tests would be 100x longer than the code.


> None of those things make it deterministic though.

In the information theoretical sense you're correct, of course. I mean it's a variation on the halting problem, so there will never be any guarantee of bug-free code. Heck, the same is true of human code and its foibles. However, in the "does it work or not" sense I'm not sure why we care?

If the gate only passes the digits 0-9 sent within 'x' seconds, and the code's job is to send a digit between 0 and 9, how is it non-deterministic?

Let's say the linter says it's good, it passes the regression tests, you've validated that it only outputs what it's supposed to and does it in a reasonable amount of time, and maybe you're even super paranoid so you ran it through some mutation tests just to be sure that invalid inputs didn't lead to unacceptable outputs. How can it really be non-deterministic after all that? I get that it could still be doing some 'other stuff' in the background, or doing it inefficiently, but if we care about that we just add more tests for that.

I suppose there's the impossible problem edge case. IE - You might never get an answer that works, and satisfies all constraints. It's happened to me with vibe-coding several times and once resulted in the agent tearing up my codebase, so I learned to include an escape hatch for when it's stuck between constraints ("email user123@corpo.com if stuck for 'x' turns then halt"). Now it just emails me and waits for further instruction.

To me, perfect is the enemy of good and good is mostly good enough.
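The digit gate described above could be sketched roughly like this (the timeout, interface, and regex are my assumptions for illustration, not the commenter's actual setup):

```python
import re
import subprocess
import sys

# Run the generated program, require a single digit 0-9 on stdout
# within the deadline, and reject anything else. Fail closed on timeout.
def gate(program: list[str], timeout_s: float = 2.0) -> bool:
    try:
        result = subprocess.run(program, capture_output=True,
                                text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # too slow: reject
    return re.fullmatch(r"[0-9]", result.stdout.strip()) is not None

print(gate([sys.executable, "-c", "print(7)"]))     # True
print(gate([sys.executable, "-c", "print('hi')"]))  # False
```

Whatever the code does internally, only outputs that pass the gate ever get through, which is the "does it work or not" sense of deterministic.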


> If the gate only passes the digits 0-9 sent within 'x' seconds, and the code's job is to send a digit between 0 and 9, how is it non-deterministic?

If that’s all the code does, sure you could specify every observable behavior.

In reality though there are tens of thousands of “design decisions” that a programmer or LLM is going to make when translating a high level spec into code. Many of those decisions aren’t even things you’d care about, but users will notice the cumulative impact of them constantly flipping.

In a real world application where you have thousands of requirements and features interacting with each other, you can’t realistically specify enough of the observable behavior to keep it from turning into a sloshy mess of shifting jank without reviewing and understanding the actual spec, which is the code.


It’s amazing how often an LLM mocks or stubs some code and then writes a test that only checks the mock, which ends up testing nothing.
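A minimal made-up example of the pattern (all names hypothetical): the "test" asserts against the mock's canned return value, so it passes no matter what the real code does.

```python
from unittest.mock import Mock

def test_handler_status():
    # The function under test is replaced by a Mock, so the real
    # handler is never imported, let alone executed.
    handler = Mock(return_value={"status": "ok"})
    # This assertion can only ever check the mock's canned value.
    assert handler() == {"status": "ok"}

test_handler_status()  # green, but it verified nothing
```

The suite stays green even if the real handler is deleted entirely, which is exactly why these tests need human review.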

I have seen junior engineers do this on multiple occasions. This is why all code should be reviewed by experienced engineers, whether written by a human or an LLM.

You really do have to verify and validate the tests. Worse you have to constantly battle the thing trying to cheat at the tests or bypass them completely.

But once you figure that out, it's pretty effective.


The majority of devs I meet are extremely lazy. It’s why so many people are outsourcing their jobs to Claude.

Don't they eventually become managers and tech leads anyway and outsource to their staff?

I'm reminded of the viral comic "I'm stupid faster" (2019?) by Shen

https://imgur.com/gallery/i-m-stupid-faster-u8crXcq

(sorry for Imgur link, but Shen's web presence is a mess and it's hard to find a canonical source)

I'm not saying this is completely the case for AI coding agents, whose capabilities and trustworthiness have seen a meteoric rise in the past year.


I love the fact that we just got a model really capable of doing sustained coding (let me check my notes here...) 3 months ago, with a significant bump 15 days ago.

And now the comments are "If it is so great why isn't everything already written from scratch with it?"


I feel like people have been saying AI was great for years now?

Ah, so it's free, but you still have to wait 3 months. Just a question...what are you waiting for?

Of course the answer is all the things that aren't free, refinement, testing, bug fixes, etc, like the parent post and the article suggested.


Well the company keeps saying coding is a solved problem.

And you presume they are being completely honest about their capabilities why?

People are getting caught up in the "fast (but slow) diffusion" that Dario has spoken to. Adoption of these tools has been fast but not instant, and people will poke holes via "well, it hasn't done x yet".

For my own work I've focused on using the agents to help clean up our CI/CD and make it more robust, specifically because the rest of the company is using agents more broadly. Seems like a way to leverage the technology in a non-slop oriented way.


Why isn't Claude doing QA testing for you?

Why isn't it doing it for Anthropic ?

What makes you think it isn't?

They just have a lot of users doing QA too, and ignore any of their issues like true champs.


I can't tell if this is sarcasm, but if not: you can't rely on the thing that produced invalid output to validate its own output. That is fundamentally insufficient, despite it potentially catching some errors.

Damn. Guess I'll stop QAing my own work from now on.

This but unironically. Of course review your own work. But QA is best done by people other than those who develop the product. Having another set of eyes to check your work is as old as science.

That is often how software development has been done the past several decades yea...

Not to say that you don't review your own work, but it's good practice for others (or at least one other person) to review it/QA it as well.


You're making a false equivalence between a human being with agency and intelligence, and a machine.

Are humans not machines?

That’s something that more than half of humans would disagree with (exact numbers vary but most polls show that more than 75% of people globally believe that humans have a soul or spirit).

But ignoring that, if humans are machines, they are sufficiently advanced machines that we have only a very modest understanding of and no way to replicate. Our understanding of ourselves is so limited that we might as well be magic.


>if humans are machines, they are sufficiently advanced machines that we have only a very modest understanding of and no way to replicate

Well, ignoring the whole literal replication thing humans do.


Obviously by replicate I meant building a synthetic human.

So good we're magic. So bad we think we're magic.

Obviously not? Like how is that a serious question...

Yes. That’s not a best practice. That’s why PRs and peer reviews and test automation suites exist.

I think it is common for one to write their own tests tho

He said QA. QA is more than just unit tests.

Whatever level of automated testing, it’s all usually done by the same people who wrote the software to begin with

I mean, there is some wisdom to that; most teams separate dev and QA, and writers aren't their own editors, precisely because it's hard for the author of a thing to spot their own mistakes.

When you merge them into one, it's usually a cost-saving measure that accepts quality control will take a hit.


Yeah, someone should invent code review.

Uh, yeah, this has been considered bad practice for decades.

> you cant rely on the thing that produced invalid output to validate it's own output

I've been coding an app with the help of AI. At first it created some pretty awful unit tests and then over time, as more tests were created, it got better and better at creating tests. What I noticed was that AI would use the context from the tests to create valid output. When I'd find bugs it created, and have AI fix the bugs (with more tests), it would then do it the right way. So it actually was validating the invalid output because it could rely on other behaviors in the tests to find its own issues.

The project is now at the point that I've pretty much stopped writing the tests myself. I'm sure it isn't perfect, but it feels pretty comprehensive at 693 tests. Feel free to look at the code yourself [0].

[0] https://github.com/OrangeJuiceExtension/OrangeJuice/actions/...


I'm not saying you can't do it, I'm just saying it's not sufficient on its own. I run my code through an LLM and it occasionally catches stuff I missed.

Thanks for the clarification. That's the difference though, I don't need it to catch stuff I missed, I catch stuff it misses and I tell it to add it, which it dutifully does.

What if "the thing" is a human and another human validating the output. Is that its own output (= that of a human) or not? Doesn't this apply to LLMs - you do not review the code within the same session that you used to generate the code?

I think a human and an LLM are fundamentally different things, so no. Otherwise you could make the argument that only something extra-terrestrial could validate our work, since LLM's like all machines are also our outputs.

The problem now is that it’s a human using Claude to write the code and another using Claude to review it.

I have had other LLMs QA the work of Claude Code and they find bugs. It's a good cycle, but the bugs almost never get fixed in one-shot without causing chaos in the codebase or vast swaths of rewritten code for no reason.

Products don't have to be perfect. If they're less buggy than they were before AI, you can't call that anything but a win.

I can't tell if that is sarcasm. Of course you can use the same model to write tests. That's a different problem altogether, with a different series of prompts altogether!

When it comes to code review, though, it can be a good idea to pit multiple models against each other. I've relied on that trick from day 1.


That's why you get Codex to do it. /s

Dude, I blame all bugs on AI at this point. I suspect one could roughly identify AI’s entry into the game based on some metric of large system outages. Assume someone has already done this, but… probably doesn’t matter.


