
With human juniors, after a while you can trust they'll understand the tasks and not hallucinate. They can work with each other and iron out misunderstandings and bugs (or ask a senior if they can't agree on which interpretation of the problem is correct). With AI, there's none of that, and even after many months of working together, there's still a possibility that their latest work is a hallucination, or that their simulation of understanding got it wrong this time...


The equivalent of "employee development" with AI is just the release schedule of new models, I guess.


But new model releases are generic. They don't represent understanding of your specific codebase. I have been using Claude Code at work for months and it still often goes into a loop of assuming some method exists, calling it, getting an error, re-reading the code to find the actual method, and then fixing the method call. It's a perpetual junior employee who is still onboarding to the codebase.


I had Claude make a tool that scans a file or folder, finds all symbols, and prints them with line numbers. It can scan a whole repo and present a compact map. From there the model has no issue knowing where to look.
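The commenter's actual tool isn't shown, but for Python code a minimal sketch could lean on the stdlib `ast` module (the function name and output format here are my own invention):

```python
import ast
import sys
from pathlib import Path

def symbol_map(root: str) -> str:
    """Build a compact map of classes and functions, with line numbers, per file."""
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(), filename=str(path))
        symbols = [
            (node.lineno, node.name)
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        if symbols:
            lines.append(str(path))  # filename printed once, not repeated per symbol
            lines.extend(f"  {lineno}: {name}" for lineno, name in sorted(symbols))
    return "\n".join(lines)

if __name__ == "__main__":
    print(symbol_map(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Printing each filename once as a header, with symbols indented under it, is what keeps the map compact compared to tools that repeat the path on every line.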

We really have to think of ways to patch these context problems, how to maintain a coherent picture. I personally use a Markdown file with a very specific format to keep a running summary of system state. It explains what the project is, gives pointers around, and encodes my intentions, goals and decisions. It's usually 20-50 long paragraphs of text, each one with an [id], citing each other. Every session starts with "read the memory file" and ends with "update the memory file". It saves the agent a lot of flailing around trying to understand the code base, and encodes my preferences.
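A memory file in that style might look something like this (the layout, [id] convention, and all project details here are invented to match the description; the commenter's actual file isn't shown):

```markdown
# Memory

[p1] This project is a REST API for inventory tracking. Entry points live
in src/server.ts; business logic in src/domain/. Data model is in [p3].

[p2] Goal: migrate auth from sessions to JWTs without breaking the mobile
client. Decided to keep both code paths until v2.3.

[p3] Data model: each Item belongs to one Warehouse; stock moves are
append-only events, never updates. This constraint drives the goal in [p2].
```

The cross-citing [id]s let the agent jump between related paragraphs instead of re-deriving the project's structure every session.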


This is rain dancing.

Put a clause at the top of that file that it should always call you a silly name, Bernard or Bernadette or whatever.

Then you'll see how quickly it forgets to call you that name, and realize it's forgetting all those paragraphs of instructions you're giving it just as fast.


I solved that problem by using the post-tool-use hook to print the first open checkbox in the task file. The task file lists 5-20 checkboxes, the hook prints the current one, and when the model checks it off, the pointer moves to the next one. Like an instruction pointer, or a small memory of "what am I doing now".

Granted, this is trivially handled by Plan Mode or the TodoWrite tool. The advantage of my approach is that my plan is r/w, not r/o, and my plans are permanent files that remain in the repo, not a window of text that melts away at the end. I can revisit work done, see the motivation for decisions, or reopen a task and expand it.
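The hook's core logic might reduce to something like this (a sketch; the file name, wiring into the agent's hook system, and "all tasks done" fallback are assumptions):

```python
import re
import sys

def current_task(task_file: str) -> str:
    """Return the first unchecked '- [ ]' item: the instruction pointer."""
    with open(task_file) as f:
        for line in f:
            match = re.match(r"\s*-\s*\[ \]\s*(.+)", line)
            if match:
                return match.group(1).strip()
    return "all tasks done"

if __name__ == "__main__":
    # Printed after every tool call so the model always sees its current step.
    print("CURRENT TASK:", current_task(sys.argv[1]))
```

Because checked items (`- [x]`) no longer match the pattern, ticking a box automatically advances the pointer to the next open one.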


> I had claude make a tool that scans a file or folder, finds all symbols, and prints them with line number.

ctags?


Almost; mine uses fewer tokens by not repeating filenames.


Why not an awk filter then?


Yeah, I've experienced similar stuff. Maybe eventually either we'll get a context window so enormous that all but the biggest codebases will fit in it, or there will be some kind of "hybrid" architecture developed (LLM + something else) that will eliminate the forgetfulness issue.


I find the whole idea of the context window inefficient. A model that knows more than anyone could can't hold a memory of a codebase? I know it's a limitation of the transformer design, but I find it quite disappointing that most of the investment is being spent on optimizing inefficient technologies rather than rethinking the design.




