Tenstorrent and the State of AI Hardware Startups (irrationalanalysis.substack.com)
224 points by zdw on Dec 15, 2024 | hide | past | favorite | 124 comments


I have a business renting high performance compute. I want to democratize compute to make it more easily available on a short-term basis to anyone who wants access. I talked to TT about a year ago, as well as one of their customers. I was really impressed with everyone I talked to and I'd love to work with them.

What I realized though is that as much as I'd like to buy and host it and make it available, I'm not sure the economics or interest are there yet. The focus today is so dedicated to Nvidia that "fringe" gear is still just that. People who really want to play with it will just buy it themselves. They've probably been doing that with hardware for a long time.

So, it is a bit of a catch-22 for me right now. Hopefully interest grows in these sorts of offerings and demand will pick up. The world needs alternatives to just Nvidia dominance of all of AI hardware and software.


> Hopefully interest grows in these sorts of offerings and demand will pick up.

Well, looking at their (as far as I can see, highest-end) accelerator, the n300s, we get:

- 24GB of memory

- 576GB/s of memory bandwidth

- $1400

As a hobbyist this is still not compelling enough to get excited and port my software - same amount of memory as a 4090/3090, half the bandwidth as a 4090/3090, slightly cheaper (than a 4090), more expensive (than a 3090), much worse software support. Why would I buy it over NVidia? This might be more compelling to bigger fish customers who would buy thousands of these (so then the lower price makes a difference), but you really need small fish people to use it too if you want to achieve good, widespread software support.
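To make the comparison concrete, here is a quick back-of-the-envelope calculation. The n300s figures come from the list above; the Nvidia specs and prices are rough assumptions (approximate MSRP for the 4090, approximate used-market price for the 3090), not authoritative numbers:

```python
# Rough value comparison of the n300s against consumer Nvidia cards.
# Nvidia bandwidth and price figures below are approximate assumptions.
cards = {
    "n300s":    {"mem_gb": 24, "bw_gbs": 576,  "price": 1400},
    "RTX 4090": {"mem_gb": 24, "bw_gbs": 1008, "price": 1600},  # ~MSRP
    "RTX 3090": {"mem_gb": 24, "bw_gbs": 936,  "price": 1000},  # ~used price
}

for name, c in cards.items():
    # Bandwidth per dollar is the interesting ratio for memory-bound
    # inference workloads, since all three cards have the same 24 GB.
    print(f"{name}: {c['bw_gbs'] / c['price']:.2f} GB/s per dollar")
```

On these assumed numbers, the n300s delivers roughly 0.41 GB/s per dollar versus roughly 0.6 to 0.9 for the Nvidia cards, which is the hobbyist's point: same memory, worse bandwidth per dollar, worse software.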

However if they'd at least double the amount of memory at the same price, now we'd be talking...


Yeah, clearly the low-hanging fruit is simply to offer more memory.

In a recent thread, just about everybody was talking about how Intel should have released its "Battlemage" line of cards with 48GB+ and how they'd actually have a very compelling offering on their hands had they done so. https://news.ycombinator.com/item?id=42308590

Missed opportunities.

That said, I know some of the Tenstorrent guys and they're extremely smart and capable. They'll figure it out, and the company is probably going to 10x.


> Yeah, clearly the low-hanging fruit is simply to offer more memory.

Yep. And just to be clear, this isn't necessarily a strategy to directly make money, as the market for people running these things locally is probably not very big (as a lot of people in the Intel thread said). But it's a strategy to beat the CUDA monopoly and get everyone and their dog to support your hardware with their software. I know I would be porting my software to their hardware if it was actually compelling for small fish, and I know plenty of other people who would too.

AMD also somewhat falls into this trap. They're slightly cheaper than NVidia, and their hardware is roughly good enough, but because their software stack and their hardware support sucks (just look at their ROCm compatibility list, it's a complete joke with, like, 3 models of GPUs, vs NVidia where I can use any GPU from the last 8 years) no one bothers. Being good enough is not good enough, you need to offer some sort of a unique selling point for people to tolerate the extra software issues they'll have with your hardware.


If someone built a mediocre GPU (~4070 level) with 48GB or 96GB, then the community would build the software stack for you.

Granted, you would not own that software and it would be ported to other cards in the future, but if you are trying to topple the king (nvidia) it would be a powerful strat.


Look at what we (ZML) are doing: https://github.com/zml/zml

Any model, any hardware, zero compromise. TT support is on the roadmap and we talk to them.


Do you have any plan to allow conversion from Pytorch to your format?


Can you sell or re-sell colo space to a handful of customers who might put TT or other “weird” hardware there? Doesn’t scale, but it hedges your own business. Requires the right customer though, like somebody who might buy Nervana / Xeon Phi but then buy NVidia from you when it blows up.


Kind of. It is easier for us to be the capex/opex (meaning buy and run equipment) and then rent it out. I have the backing to make large investments without too much drama. We can do that with any hardware as long as we have longer (1-2 year) contracts in place. We have the space/power available in a top tier data center (Switch).

https://hotaisle.xyz/cluster/


>Given that ARM is aggressively raising license prices and royalty rates and has (metaphorically) nuked Qualcomm, RISC-V CPU IP clearly has a bright future.

It is interesting that the world (or maybe the RISC-V world) seems to think any company can simply break the contracts and agreements others signed, while pretending they are still righteous. At the same time, they think this won't happen to their IP.

Well I think we will all know soon.


It is informative to note the contrast: 22 companies (Microsoft, Google, Samsung, etc.) were all given enough advance notice of the Nuvia acquisition to be able to come up with a quote endorsing it. [1]

And the treatment of the 'partner' upon whose IP the deal depends:

> Neither Qualcomm nor Nuvia provided prior notice of this transaction to Arm [2]

[1] https://www.qualcomm.com/news/releases/2021/01/qualcomm-acqu...

[2] From Arm's initial court submission.


The architecture license agreements between Qualcomm, Nuvia and Arm have remained secret.

Both Qualcomm and Arm claim that they follow the requirements of the existing ALAs, while the other party has breached them.

Without access to the original texts, we cannot verify who tells the truth.

Even when a judge gives a decision in this conflict, we will not be able to know who is right if the ALAs remain secret, because in recent years there have been plenty of absurd judicial decisions, where judges have been obviously wrong.

According to Qualcomm, they had no obligation to notify Arm about their business intentions. While I do not like Qualcomm for many reasons, for now there is no ground to believe more the claims of Arm than the claims of Qualcomm, so any of the two could be right about this.

Perhaps Arm is right and Qualcomm must pay the royalties specified in the Nuvia ALA instead of those specified in the Qualcomm ALA, which is the main subject of this conflict. But if this is true, then Nuvia was conned by Arm into signing a really disadvantageous ALA, which is somewhat surprising: among their business plan variants must have been the option of being bought by some big company, in which case their target should have been to retain ownership of the IP they developed themselves.

There is no doubt that even if Qualcomm pays only the low royalties specified in the Qualcomm ALA, Arm will obtain revenue orders of magnitude greater than anything it could have obtained from Nuvia, had Nuvia not been bought by Qualcomm.

The reason why Arm does not like this increased revenue from the Nuvia work is that Qualcomm has decided to replace all the cores licensed from Arm in all of their products with cores developed by the Nuvia team.

Thus the loss of the royalties for the Arm-designed cores will be even greater than the increased revenue from the ALA royalties.

So Arm attempts to use or abuse whatever was written in their ALAs in order to prevent competition in the cores implementing the Arm architecture.

Even if Arm had been cunning enough to include paragraphs in its ALAs that justify its current requests, there is no doubt that the real villain in this conflict is Arm, which is fighting to force the use of CPU cores that are weaker and more expensive than what is possible.


> Nuvia has been conned by Arm

> So Arm attempts to use or abuse whatever was written in their ALAs

> Even if Arm had been cunning enough to have included paragraphs in their ALAs

Clauses requiring consent for transfer of IP rights (or on change of control) are standard everywhere and Nuvia would have been fully aware of this at the time of signing the ALA.


I think you got it the wrong way around. Nuvia allegedly got a really good ALA because they targeted servers and were small, but then Qualcomm bought Nuvia and wants to produce smartphone chips, based on Nuvia IP, under the cheaper Nuvia ALA. Arm argues they have to use the more expensive Qualcomm ALA.


No, the parent comment is correct (on the relative fees at least). Qualcomm wants to sell Nuvia-derived cores under its own ALA at a lower fee than Nuvia would have had to pay (which makes sense as server cores cost more than smartphone cores).


>where judges have been obviously wrong.

True. But if Qualcomm thinks they are not playing any tricks and are playing by the rules clearly stated in the contract, you can bet they will appeal and fight on even if they lose.

A lot of juicy details will come out during the trial, just like in the Apple vs. Qualcomm case (where Apple was clearly in the wrong). We will all know soon and can make up our own judgement.


I read that part differently. I didn’t see it as a statement about the legal merits of ARM’s choices so much as the pragmatic business outcomes. It is possible for an action to be 100% legally correct and a bad business move (see: creative IP companies that sue their biggest fans).

Even granting that ARM is completely within their rights, this is a pretty big reminder of the dangers of building on sole-source licensed IP when there's an open alternative. I have little doubt that ARM drove some number of startups into choosing RISC-V.

Which may or may not matter in the long run. But it is rolling the dice and by definition a short term maximization strategy.


My comment wasn't really about legal or business merits but rather the bad-faith nature of these actions, which would apply whether using open- or closed-source IP.

However, the ‘dangers of building on sole-source’ licensed IP is true in general but a bit overdone in this case. Qualcomm had a large number of options using Arm IP including developing their own cores in house. However they chose to do something that was - on the face of it - in breach of license agreements that Nuvia had signed.


> including developing their own cores in house.

I believe ARM just revoked Qualcomm’s architecture license, so they no longer have that option if they want to run aarch64.

The architecture license predated the Nuvia acquisition, and was the license under which Qualcomm developed their most recent laptop core.


Because the Qualcomm and Nuvia ALAs have remained secret, we cannot know whether Arm really has the right to revoke them at will.

It is hard to believe that anyone would accept to sign an ALA that can easily be revoked, because whoever signs an ALA intends to invest a lot of money developing products based on it and there is no doubt that all the invested money can be lost if the ALA is revoked unilaterally.

It remains to be seen what will happen at the trial.


It is interesting that the world seems to think any company can simply break the contracts and agreements others signed, while pretending they are still righteous.

I love this comment because it could apply to both sides of the case. However the dispute is a little more complex than most commentary implies.


There seems to be a growing awareness in the RISC-V world that they have made the mistake of being generals refighting their previous war. Things have moved on without them noticing, and unfortunately they painted themselves into a corner of inappropriate technical decisions, which is manifesting in comments like those quoted.


Since you haven’t said anything specific, it’s hard to interact with the comment without guessing at what you mean. But I’ll give it a go:

> generals refighting their previous war

Here I think you are talking about the risc vs cisc debate. It is true that in modern OOO cores this is basically meaningless. I think the only truly relevant piece is fixed encoding widths, which make wider decoders feasible on a lower power budget.

On the other hand I think risc-v owes a lot of its popularity to how easy it is to understand and implement a bare bones core. Marketing will be a huge challenge for any newcomer, but risc-v is doing very well on that score.

The place I think risc-v has made really good innovations in ISA design is in the vector extension. It seems to me that it allows for code to be written for larger vectors and the machine can apply whatever vector width it has available. I believe this should allow new cores to improve the performance of old code in a way that more explicit designs like AVX have struggled with.
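The vector-length-agnostic idea can be sketched with a toy Python model of RVV-style strip-mining. This is an illustration, not real RVV intrinsics: the hypothetical `vlmax` parameter stands in for whatever maximum vector length the hardware grants via `vsetvli`, and the point is that the same source code produces the same result on any vector width:

```python
def vla_saxpy(a, x, y, vlmax):
    """Toy model of an RVV-style strip-mining loop (hypothetical names).

    Each trip around the loop asks the 'hardware' for up to vlmax
    elements, the way vsetvli grants vl <= VLMAX, so the same code
    runs unchanged on machines with different vector widths."""
    out = list(y)
    i, n = 0, len(x)
    while i < n:
        vl = min(vlmax, n - i)       # vsetvli: hardware picks vl <= VLMAX
        for j in range(i, i + vl):   # one vector instruction's worth of work
            out[j] = a * x[j] + out[j]
        i += vl
    return out

# Result is identical whether the 'machine' is 2 or 64 elements wide:
print(vla_saxpy(2, [1, 2, 3, 4, 5], [1, 1, 1, 1, 1], vlmax=2))
print(vla_saxpy(2, [1, 2, 3, 4, 5], [1, 1, 1, 1, 1], vlmax=64))
```

This is the contrast with fixed-width SIMD like AVX, where moving to a wider vector unit typically means recompiling or rewriting the hot loop.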


AIUI the jury is still out as to whether wider decoders are feasible with RISC-V's 16-bit and 32-bit split. It's specifically designed to ensure that insn lengths and positions can be decoded as easily as possible, without introducing overly inflexible notions such as VLIW-like "bundles".


Modern x86 cores can decode 9 variable-length instructions per clock. It's not easy, but it can be done.

RISC-V C instructions, on the other hand, are trivial. All compressed instructions are 1:1 translatable to uncompressed instructions. People have measured the cost of adding C extension decoding as a few hundred GATES.


The only 9-per-cycle decoder I could find was Skymont (E-core), which is really 3x 3-wide clusters. Lion Cove (P-core) claims 8-wide (unclear if sequential or multiple clusters), and Zen 5 is 2x 4-wide. In contrast, the M1 had a true 8-wide decoder and the M4 has a 10-wide decoder.

My understanding is that on x86 the complexity increases superlinearly with decode width because of variable-length instructions, while on fixed-width architectures (or even ones with a couple of sizes, like Arm Thumb or RISC-V compressed) it's basically linear complexity.

This chips and cheese article seems to back this up. https://open.substack.com/pub/chipsandcheese/p/zen-5s-2-ahea...


The main thing that makes decoding Intel instructions hard is that you don't know where a valid instruction starts. RISC-V C instructions don't cause this problem, because you can always read at a 4-byte boundary; you just might get two instructions in 4 bytes instead of one. There is certainly complexity added to a decoder by this, but it's trivial compared to what x86 has to deal with.
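The length rule behind this is simple enough to sketch. Per the RISC-V ISA spec, the two low bits of the first 16-bit parcel distinguish compressed from 32-bit instructions; a minimal Python sketch (ignoring the reserved longer encodings) looks like:

```python
def rv_insn_length(parcel: int) -> int:
    """Return the byte length of a RISC-V instruction from its first
    16-bit parcel. Per the ISA spec: low bits != 0b11 means a 16-bit
    compressed (C) instruction; 0b11 means 32 bits. (Reserved longer
    encodings, bits [4:2] == 0b111, are not handled in this sketch.)"""
    if parcel & 0b11 != 0b11:
        return 2   # compressed instruction
    return 4       # standard 32-bit instruction

# A compressed instruction such as 0x1141 has low bits 0b01 -> 2 bytes,
# while the low parcel of a 32-bit addi (0x0513) has low bits 0b11 -> 4.
print(rv_insn_length(0x1141))  # 2
print(rv_insn_length(0x0513))  # 4
```

Contrast this with x86, where determining instruction length requires examining prefixes, the opcode, ModRM, SIB, and displacement/immediate fields, which is why parallel x86 decoders need speculative length predecode.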


What do you mean by "inappropriate technical decisions"?


That is the wrong question.

The right question is how things have moved on; then you can reevaluate the technical choices and understand why they are inappropriate.


The right question is 'what are you even talking about?'.


A serious answer as to why I didn't answer the question directly?

The RISC-V community is dominated by a culture that is fighting the semiconductor war of 20 years ago. Consequently their core ideas are those that were in play then, and they remain steadfastly attached to them. In this case the sea of baby cores is actually an ideal application of RISC-V.

You cannot point out the core problem for a multitude of reasons, one of which is that if you did it would just turn to "prove it", and honestly another is that even though many of us would like an alternative ISA to succeed we do kind of enjoy watching people like SiFive spin round in circles. A strongly related problem is the likes of SiFive are promoting premature ossification before the thing has evolved to practical utility.


I still don't really know what you are talking about. The only thing I am getting here is that standard ratification happened too fast? What standard extensions exactly were ratified too early?

So please, actually say what you mean and explain what mistakes were made, how things would be different if not for those mistakes.

> SiFive

Semiconductors are an old industry and getting started in it was always hard. That not every company that does something with RISC-V is 100% successful all the time isn't really surprising or relevant. And SiFive by itself doesn't have the power to push things through.


> What standard extensions exactly were ratified to early?

This is part of why RISC-V has a problem: your question, as with the others, is wrong.

Why do we assume the whole core philosophy is actually correct? Why do we think that we can solve mistakes or omissions with extensions?

The underlying reality is you can't. As the debates over some extensions are showing, not all ISAs are equal (a myth promoted by Intel at length), and once you accept that, a lot of the rest falls apart.

This is especially true if there is a core insight which your main competition are not telling you regarding the rest.

SiFive does not need to push things through; they damage things by promoting inertia. They are so effective at it that it would be tempting to say it is a conspiracy.


I still don't understand what you are talking about.

> The underlying reality is you can’t.

But the actual real reality is you can ... because it was done and it is working pretty well.

> As the debates over some extensions are showing not all ISAs are equal

I don't know what that means. And even if I knew, 'equal' is irrelevant. The B extension and the V extension don't have to be 'equal' in any sense of the word I can think of.

> This is especially true if there is a core insight which your main competition are not telling you regarding the rest.

Again, what are you even talking about? You are never actually concrete. Do you believe there is a magical 'core' insight that ARM and Intel have? And if so, why do you think none of the countless people involved in RISC-V who have worked at all these companies for the last 30 years know about it?

> SiFive does not need to push things through, they damage things by promoting inertia. They are so effective at it it would be tempting to say it is a conspiracy.

SiFive isn't in control. RISC-V International is gigantic with lots of members that have input. What you call 'inertia' I would call 'lots of hard work by lots of people to get the best outcome' and most people are pretty happy with the results and the progress.

The only possible thing I can think of that you might be referencing is that Qualcomm attempted to take over, something that was opposed not by SiFive, but by literally everybody else. If you think they were right, you can just say so.

And if it turns out Qualcomm is so uniquely brilliant and insightful, they can publish their own extensions, and because they're so superior, people will adopt them. The same goes for anybody else.


> I still don't understand what you are talking about.

I see that. Sorry for wasting bandwidth hoping otherwise.


I don't understand either.

Seriously dude, is it so difficult to explain what you actually mean instead of talking around in circles?


Are you lot really this oblivious? Seriously?


Hey at least you were able to perform your dada in public.


And you get to insult people trying to help you. Well done, it won't happen again.


You know when multiple people are trying to tell you something, you could consider listening.


I agree that the RISC-V ecosystem has some issues that I hope will be sorted out. But not all RISC-V cores are "baby cores". I imagine that XiangShan [1], albeit student work, will spur work towards bigger, more performant OOO cores.

[1] https://github.com/OpenXiangShan/XiangShan


> Tenstorrent has an implicit view that the future of AI is mixed-workload, not pure linear algebra spam. Yes, MATMUL go BRRRRRR is valuable, but CPU workloads will be needed in the future. That is the hope. So far, this has not played out.

At least not in the training end of machine learning, which is the big money sink.

What's striking about this whole business is that it's still mostly hammering on a decades-old idea with more and more hardware and data. That worked great for a few years, but now it seems to be reaching its limits. What's next?


It's not really "AI hardware", it's just HPC on a slightly smaller scale than the traditional sort. And it's still going to be useful for plenty of workloads regardless of the current AI frenzy.


Like what workloads? Genuinely curious what you'd run on it.


> I genuinely believe Groq is a fraud. There is no way their private inference cloud has positive gross margins.

> Llama 3.1 405B can currently replace junior engineers

I'd like more exposition on these claims.


Not Llama, but with Sonnet and o1 I wrote a bespoke Android app for my company in about 8 hours of work. Once I polish it a bit (make a prettier UI), I'm pretty sure I could sell it to other companies doing our kind of work.

I am not a programmer, and I know C and Python at about a 1 day crash course level (not much at all).

However, with Sonnet I was able to be handheld all the way from downloading Android Studio to a functional app written in Kotlin that is now being used by employees on the floor.

People can keep telling themselves that LLMs are useless, or maybe just helpful for quickly spewing boilerplate code, but I would heed the warning that this tech is only going to improve and is already very seriously helping people forgo SWEs. Sears thought the internet was a cute party trick, and that print catalogs were obviously there to stay.


This is meaningless without talking about the capabilities of the app. I've seen examples of this before where non-programmers come up with something using an LLM that could just be a webpage with camera access and some JavaScript.


Today, I wrote a full YouTube subtitle downloader in Dart. 52 minutes from starting to google anything about it, to full implementation and tests, custom formatting any of the 5 obscure formats it could be in to my exact whims. Full coverage of any validation errors via mock network responses.

I then wrote a web AudioWorklet for playing PCM in 3 minutes, which conformed to the same interface as my Mac/iOS/Android versions, ex. setting sample rate, feedback callback, etc. I have no idea what an AudioWorklet is.

Two days ago, I stubbed out my implementation of OpenAI's web socket based realtime API, 1400 LOC over 2 days, mostly by hand while grokking and testing the API. In 32 minutes, I had a brand spanking new batch of code, clean, event-based architecture, 86% test coverage. 1.8 KLOC with tests.

In all of these cases, most of what I needed to do was drop in code files and say "nope, wrong" a couple of times to Sonnet, and say "why are you violating my service contract and only providing an example solution" to o1.

Not Llama 3.1 405B specifically (I haven't gone to the trouble of running it), but things turned some sort of significant corner over the last 3 months, between o1 and Sonnet 3.5. Mistakes are rare. It's believable that 405B is on that scale; IIRC it went punch for punch with the original 3.5 Sonnet.

But I find it hard to believe a Google L3, and a third of L4s (read: new hires, or survived 3 years), are that productive and sending code out for review at 1/5th of that volume, much less on demand.

So insane-sounding? Yes.

Out there? Probably, I work for myself now. I don't have to have a complex negotiation with my boss on what I can use and how. And I only saw this starting ~2 weeks ago, with full o1 release.

Wrong? Shill? Dilettante?

No.

I'm still digesting it myself. But it's real.


Most software is not one-off little utilities/scripts, small greenfield projects, etc. That's where LLMs excel: when you don't have much context and they can regurgitate solutions.

It's less to do with junior/senior/etc.. and more to do with the types of problems you are tackling.


This is a 30 KLOC, 6-platform Flutter app that's, in this user story, doing VoIP audio and running 3 audio models on-device, including in your browser. A near-replica of the Google Assistant audio pipeline, except all on-device.

It's a real system, not kindergarten "look at the React Claude Artifacts did, the button makes a POST request!"

The 1500 loc websocket / session management code it refactored and tested touches on nearly every part of the system (i.e. persisting messages, placing search requests, placing network requests to run a chat flow)

Also, it's worth saying this bit a bit louder: the "just throwing files in" I mention is key.

With that, the distinction is that the quality curve you observed runs in reverse: with o1's thinking, and whatever Sonnet's magic is, there's a higher payoff from working in a larger codebase.

For example, here, it knew exactly what to do for the web because it already saw the patterns iOS/Android/macOS shared.

The bend I saw in the curve came from being ultra lazy one night and seeing what would happen if it just had all the darn files.


> This is a 30KLOC 6 platform flutter app […] It's a real system, not kindergarten "look at the React Claude Artifacts did, the button makes a POST request!"

This is powerful and significant but I think we need to ground ourselves on what a skilled programmer means when he talks about solving problems.

That is, honestly ask: What is the level of skill a programmer requires to build what you've described? Mostly, imo, the difficulties in building it are in platform and API details not in any fundamental engineering problem that has to be solved. What makes web and Android programming so annoying is all the abstractions and frameworks and cruft that you end up having to navigate. Once you've navigated it, you haven't really solved anything, you've just dealt with obstacles other programmers have put in your way. The solutions are mostly boilerplate-like and the code I write is glue.

I think the definition of "junior engineer" or "simple app" will be defined by what LLMs can produce and so, in a way, unfortunately, the goal posts and skill ceiling will keep shifting.

On the other hand, say we watch a presentation by the Lead Programmer at Naughty Dog, "Parallelizing the Naughty Dog Engine Using Fibers"[^0] and ask the same questions: what level of skill is required to solve the problems he's describing (solutions worth millions of dollars because his product has to sell that much to have good ROI):

"I have a million LOC game engine for which I need to make a scheduler with no memory management for multithreaded job synchronization for the PS4."

A lot of these guys, if you've talked to them, are often frustrated that LLMs simply can't help them make headway with, or debug, these hard problems where novel hardware-constrained solutions are needed.

---

[^0]: https://www.youtube.com/watch?v=HIVBhKj7gQU


It's been pretty hard, but if you reduce it to "Were you using a framework, or writing one that needs to push the absolute limits of performance?"...

...I guess the first?...

...But not really?

I'm not writing GPU kernels or operating system task schedulers, but I am going to some pretty significant lengths to be running ex. local LLM, embedding model, Whisper, model for voice activity detection, model for speaker counting, syncing state with 3 web sockets. Simultaneously. In this case, Android and iOS are no valhalla of vapid-stackoverflow-copy-pasta-with-no-hardware-constraints, as you might imagine.

And the novelty is, 6 years ago, I would have targeted iOS and prayed. Now I'm on every platform at top-tier speeds. All that boring tedious scribe-like stuff that 90% of us spend 80% of our time on, is gone.

I'm not sure there's very many people at all who get to solve novel hardware-constrained problems these days, I'm quite pleased to brush shoulders with someone who brushes shoulders with them :)

Thus, smacks more of no-true-scotsman than something I can chew on. Productivity gains are productivity gains, and these are no small productivity gain in a highly demanding situation.


> Thus, smacks more of no-true-scotsman than something I can chew on.

I wasn't making a judgement about you or your work, after all I don't know you. I was commenting within the context of an app that you described for which an LLM was useful, relative to the hard problems we'll need help with if we want to advance technology (that is, make computers do more powerful things and do them faster). I have no idea if you're a true Scotsman or not.

Regardless: over the coming years we'll find out who the true Scotsmen were, as they'll be hired to do the stuff LLMs can't.


The challenging projects I've worked on are challenging not because slamming out code to meet requirements is hard (or takes long).

It's challenging because working to get a stable set of requirements requires a lot of communication with end users, stakeholders, etc. Then, predicting what they actually mean when implementing said requirements. Then, demoing the software and translating their non-technical understanding and comments back into new requirements (rinse and repeat).

If a tool can help program some of those requirements faster, as long as it meets security and functional standards, and is maintainable, it's not a big deal whether a junior dev is working with Stack Exchange or Claude, IMO. But I do want that dev to understand the code being committed, because otherwise security bugs and future maintenance headaches creep in.


I've definitely noticed the opposite on larger codebases. It's able to do magical things on smaller ones but really starts to fall apart as I scale up.


Most software is simple HTML.


This isn't where the leverage is though


No, but AI will replace a lot of web developers.


No true Botsman would replace a junior eng!


I think most software outside of the few Silicon Valleys of the world is in fact a bunch of dirty hacks put together.

I fully believe recursive application of current SOTA LLMs plus some deployment framework can replace most software engineers who work in the cornfields.


I don't understand what you guys are doing. For me, Sonnet is great when I'm starting with a framework or project, but as soon as I start doing complicated things it's just wrong all the time. Subtly wrong, which is much worse, because it looks correct but is wrong.


yt-dlp has a subtitle option. To quote the documentation:

    --write-sub            Write subtitle file
    --write-auto-sub       Write automatically generated subtitle file (YouTube only)
    --all-subs             Download all the available subtitles of the video
    --list-subs            List all available subtitles for the video
    --sub-format FORMAT    Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"
    --sub-lang LANGS       Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags


This is a key point and one of the reasons why I think LLMs will fall short of expectation. Take the saying "Code is a liability," and the fact that with LLMs, you are able to create so much more code than you normally would:

The logical conclusion is that projects will balloon with code pushing LLMs to their limit, and this massive amount is going to contain more bugs and be more costly to maintain.

Anecdotally, supposedly most devs are using some form of AI for writing code, and the software I use isn't magically getting better (I'm not seeing an increased rate of features or less buggy software).


My biggest challenges in building a new application and maintaining an existing one are a lack of good unit tests and functional tests, questionable code coverage, a lack of documentation, and excessively byzantine build and test environments.

Cranking out yet more code, though, is not difficult (and junior programmers are cheap). LLMs do truly produce code like a (very bad) junior programmer: when trying to make unit tests, it takes the easiest path and makes something that passes but won't catch serious regressions. Sometimes I've simply reprompted it with "Please write that code in a more proper, Pythonic way". When it comes to financial calculations around dates, date intervals, rounding, and so on, it often gets things just ever-so-slightly wrong, which makes it basically useless for financial or payroll type of applications.
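The rounding failure mode described above is easy to illustrate. This is a generic Python sketch of the pitfall, not the commenter's code: naive float arithmetic (which LLM-generated financial code tends to reach for) gives subtly different cents than an explicit decimal rounding policy.

```python
from decimal import Decimal, ROUND_HALF_UP

# Pitfall 1: 2.675 has no exact binary representation, so rounding
# the float to 2 places gives 2.67, not the 2.68 a human expects.
print(round(2.675, 2))   # 2.67

# Pitfall 2: Python's round() uses banker's rounding (half to even).
print(round(0.5))        # 0, not 1

# Financial code should use Decimal with an explicit rounding mode:
price = Decimal("2.675")
print(price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68
```

Errors of this shape are "ever-so-slightly wrong" in exactly the sense described: the code looks correct and passes casual testing, then drifts by a cent on particular inputs.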

It also doesn't help much with the main bulk of my (paid) work these days, which is migrating apps from some old platform like vintage C#-x86, or some vendor thing like a big pile of Google Apps Script, Jotform, Zapier, and so on, into a more maintainable and testable framework that doesn't depend on subscription cloud services. So far I can't find a way to make LLMs productive at all at that - perhaps that's a good thing, since clients still see fit to pay decently for this work.


I don't understand - why does the existence of a CLI tool mean we're risking a grey goo situation if an LLM helps produce Dart code for my production Flutter app?

My guess is you're thinking I'm writing duplicative code for the hell of it, instead of just using the CLI tool - no. I can't run arbitrary binaries, at all, on at least 4 of the 6 platforms.

Beyond that, that's why we test.


Apologies if it looked like I was singling out your comment. It was more that those comments brought the idea to mind that sheer code generation without skilled thought directing it may lead to unintended negative outcomes.


Something I noticed is that as the threshold for doing something like "write software to do X" decreases, the tendency for people to go search for an existing product and download it tends to zero.

There is a point where in some sense it is less effort to just write the thing yourself. This is the argument against micro-libraries as seen in NPM as well, but my point is that the threshold of complexity for "write it yourself" instead of reaching for a premade thing changes over time.

As languages, compilers, tab-complete, refactoring, and AI assistance get better and better, eventually we'll reach a point where the human race as a whole will be spitting out code at an unimaginable rate.


I agree with you that they are improving. Not being a programmer, I can't tell if the code itself has improved, but as a user who uses ChatGPT or Google Gemini to build scripts and TradingView indicators, I am seeing some big improvements. Many times, wording the prompt in better detail and restricting it from going off on tangents results in working code.


Show the code


That’s why I’m annoyed: they never show the code.


I learned over 20 years ago to _never_ post your code online for programmers to critique. Never. Unless you are an absolute pro (at which point you wouldn't even be asking for review), never do it.


You gotta lower your expectations. :) I didn't see the comment till now, OP made it at 3:30 AM my time.

Here you go - https://pastebin.com/8zdMDEnG


I think they meant the code that the LLM generated.


Right: that is it.


This has clearly been heavily edited by humans.


I'm happy to provide whatever you ask for. With utmost deference, I'm sure you didn't mean anything by it and were just rushed, but just in case...I'd just ask that you'd engage with charity[^3] and clarity[^2] :)

I'd also like to point out[^1] --- meaning, I gave it my original code, in toto. So of course you'll see ex. comments. Not sure what else contributed to your analysis, that's where some clarity could help me, help you.

[^1](https://news.ycombinator.com/item?id=42421900) "I stubbed out my implementation of OpenAI's web socket based realtime API, 1400 LOC over 2 days, mostly by hand while grokking and testing the API. In 32 minutes, I had a brand spanking new batch of code, clean, event-based architecture, 86% test coverage. 1.8 KLOC with tests."

[^2](https://news.ycombinator.com/newsguidelines.html) "Be kind...Converse curiously; don't cross-examine."

[^3](https://news.ycombinator.com/newsguidelines.html) "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."


Can you briefly describe your work flow? Are you exchanging information with Sonnet in your IDE?


> groq

i went to a groq event and one of their engineers told me they were running 7 racks!! of compute per (70b?) model. that was last year so my memory could be fuzzy.

iirc, groq used to be making resnet-50? chips? the only way such an impressive setup makes any kind of sense (my guess) would be that they bought a bunch of resnet chips way back when and now they are trying to square-peg-round-hole that sunk cost as part of a fake-it-till-you-make-it phase. they certainly have enough funding to scrap it all and do better... the question is if they will (and why they haven't been able to yet)


Yes, Groq requires hundreds or thousands of chips to load an LLM because they didn't predict that LLMs would get as big as they are. The second generation chip can't come soon enough for them.
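The chip-count claim is easy to sanity-check with a back-of-envelope sketch (the figures below are approximate, publicly reported numbers, not official specs):

```python
# Rough estimate: why an SRAM-only accelerator needs hundreds of
# chips just to hold one large LLM's weights.
params = 70e9            # Llama-70B-class model
bytes_per_weight = 2     # fp16
sram_per_chip = 230e6    # first-gen Groq LPU: ~230 MB on-chip SRAM (reported)

weight_bytes = params * bytes_per_weight      # 140 GB of weights
chips_needed = weight_bytes / sram_per_chip   # ~609
print(f"~{chips_needed:.0f} chips just to hold the weights")
```

And that ignores the KV cache and activations, so the real deployment numbers only go up from there.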


also, he believes cerebras is shit but also cerebras runs llama the most efficiently and at top speed. biased ^10


So I interacted with people at cerebras at a tradeshow and it seems like you have to have extremely advanced cooling to keep that thing working. IIRC the user agreement says "you can't turn it off or else the warranty is void". With the way their chip is designed, I would be strongly worried that the giant chip has warping issues, for example, when certain cores are dark and the thermal generation is uneven (or, if it gets shut down by accident while in the middle of inferencing an LLM). There may even be chip-to-chip variation depending on which cores got dq'd based on their on-the-spot testing.

Already through the grapevine I'm hearing that H100s and B100s have to be replaced more often... than you'd want? I suspect people are mum about it otherwise they might lose sweetheart discounts from nvidia. I can't imagine that cerebras, even with the extreme engineering of their cooling system, have truly solved cooling in a way that isn't a pain in the ass (otherwise they wouldn't have the clause?), and if I were building a datacenter I would be very worried about having to do annoying and capital-intensive replacements.


I have nowhere near the knowledge required to say yes or no to your argument. My point is that the guy who wrote the article is shilling a pre-IPO company while FUDing the competitors, which makes it really surprising that it got so many upvotes.


maybe, but it shouldn't be surprising. cerebras's designs were born ~2014, pre-transformers, and the megachips were initially targeted at hpc workloads. it was definitely a "solution looking for a problem" back then and is now drifting into square-peg-in-round-hole territory (see sibling comment about groq). I'm surprised they have gotten their raw perf as high as they have by now.


Ilya was very much in awe, totally contrary to what you are saying, btw, if you read the Elon/OpenAI emails. Also, they do run llama the fastest.


in the real world, fastest isn't everything; power and capital cost per token are.


Kudos to the Tenstorrent team for being so open to having a discussion. I wish more companies would be like that, as constructive criticism is very useful, especially for a startup.


One concern I've heard is that LLMs can be a great accelerator for senior engineers who know what they want. However, for junior engineers they could hold back learning, as juniors just use whatever the model gives without really understanding what was done or why. It seems plausible to me. At least using Stack Overflow, people get explanations of why you do x to get y; it's a bit copy-paste, sure, but the pace brought learning. Now, if you type a comment and get code, a junior engineer may not even read what's generated, since "they don't need to, it usually works"... Any ideas how we can ensure junior engineers do effectively learn and understand what is going on, while still getting the LLM benefit?


I’ve been coding for a long time and just became very proficient in Dart this year, just by using an LLM and asking it to explain anything I don’t know. I haven’t been on Stack Overflow in 2 years, and it’s not because I’m taking shortcuts by letting the LLM write my code; it’s because it is the best teacher / pair programmer that will just sit with you with endless patience. Especially using @docs with Cursor and @directory context.


I am in a similar situation where I needed to jump in on a Python Django codebase for a side gig I'm helping a friend out on. Since I have no interest in using Django in the future, I decided to Claude my way through the project - I did use Django like 10 years ago, am fairly competent with Python, and used Rails recently - so I thought, how bad can this be? By the time I had something cobbled together with Claude, I decided to get a friend who's competent with Django to review my code, and boy did I feel like an amateur. Not only did the code not use good practices/patterns (e.g. not even using viewsets from DRF), I couldn't even follow the conversation because I knew nothing about these concepts. So I spent a day reading docs and looking at a standard example of an idiomatic setup, and it made my project a lot better. I've had this experience at multiple points in the project - not reading the API docs and letting Claude walk me through integration left me stuck when Claude failed, and the same went for design decisions.

So I would say Claude is useful for simple execution when you know what you are expecting; relying on it to learn sounds like a short-term gain for problems down the line. At the point where LLMs can reliably look up sources and reason through something to explain it, there will be no coding left, but I feel we aren't close to that with current tech.


> I’ve been coding for a long time...

Having been coding for a long time yourself, I don't think you fall into the same category. Dart's paradigms are not too different from other standard languages, so a lot of these skills are probably transferable.

I've seen beginners, on the other hand, using LLMs who couldn't even write a proper for-loop without AI assistance. They lack the fundamental ability to "run code in their head." This type of person, I feel, would be utterly limited by the capabilities of the AI model and would fail in lockstep with it.
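For what it's worth, "run code in their head" is a concrete skill; a toy (purely illustrative) trace of the kind that trips up this group:

```python
# What does this print? Answering without running it is the skill in question.
total = 0
for i in range(1, 5):   # i takes 1, 2, 3, 4 - range() excludes the stop value
    total += i
print(total)            # 10, not 15: assuming 5 is included is the classic slip
```

Someone who can't predict that output has no way to notice when an LLM's loop is subtly off by one.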


This is kind of the classic “kids these days” argument: that because we understand something from what we consider the foundational level up to what we consider the goal level, anyone who comes along later and starts at higher levels will be limited / less capable / etc.

It is true, but also irrelevant. Just like most programmers today do not need to understand CPU architecture or even assembly language, programmers in the future will not need to understand for loops the way we do.

They will get good at writing LLM-optimized specifications that produce the desired results. And that will be fine, even if we old-timers bemoan that they don’t really program the way we do.

Yes, the abstractions required will be inefficient. And we will always be able to say that when those kids solve our kinds of problems, our methods are better. Just like assembly programmers can look at simple python programs and be astounded at the complexity “required” to solve simple problems.


I agree with this take, actually. I can imagine how programming in the future could consist mostly of interactions with LLMs. Such interactions would probably be constrained enough to get the success rate of LLMs sufficiently high, maybe involving specialized DSLs made for LLMs.

I do think the future may be more varied. Just like today where I look at kernel/systems/DB engineering and it seems almost arcane to me, I feel like there will be another stratum created, working on things where LLMs don't suffice.

A lot of this will also depend on how far LLMs get. I would think that there would have to be more ChatGPT-like breakthroughs before this new type of developer can come.


I feel like you underestimate how much effort goes into making CPUs reliable, and how low-level and well-specified the problem of building a VM/compiler is compared to getting from a natural-language specification to an executable program. Solving that reliably is basically AGI - I doubt there will be many humans in the loop if we reach that point.


I get CPUs; I worked at Intel and cut my teeth on x86 assembly.

But the fact that some people need to understand CPU architecture does not mean all people need to.

The vast, vast majority of programmers do not need to understand CPUs or compilers today. That’s fine. It is also fine that many new programmers won’t even think in the form of functions and return values and types the way we do.

I’m not saying traditional hard science tech is useless. I am saying it is not mandatory for everyone.


Yeah, but what I'm saying is that the level of engineering power that goes into making such a relatively simple abstraction is huge, and we have decades of experience with it.

I think if AI ever gets to the point where it's so reliable for natural language -> code - we're into the AGI era and I don't see the role of programmers at all - bridging that layer successfully requires some very careful analysis and context awareness.

Unless you think we're headed off in a direction where LLMs are gluing together idiot-proof boxes that are super inefficient but get the job done. I can sort of see that happening, but in my experience reasoning through/debugging natural-language specs is harder than going through equivalent code - I don't think we're getting much value here while adding a huge layer of inefficiency.


Probably by not letting them check in code they don't understand. This creates a lot of overhead, but it turns out you simply can't skip that time investment. Somewhere along the junior->senior developer (in terms of skill, not time) is a big time cost of just sitting down and learning. You can pretend to skip it with LLMs, but you're just delaying having to understand it.

So the benefit of LLMs is negligible in the long term, for beginner/junior programmers, as they essentially collect knowledge debt. Don't let your juniors use LLMs, or if you do, make sure they can explain every little detail about the code they have written. You don't have to be annoying about it - ask socratically.


I agree. However, in many cases the cat is out of the bag in terms of juniors using LLMs; it's almost ubiquitous these days.


Despite your observation that SO has explanations, SO can be and has also been used as a zero-learning crutch.

Similarly, LLMs can be used as educational tools.

In the end, learning and self-improvement needs some non-trivial motivation from the individual.

The answer to your question is to show them it’s valuable to learn. If they find they’re completing their tasks adequately from AI assistance, then give them harder tasks. Meanwhile, model how your own learning effort is paying off.

Note how if they are left with AI-trivial tasks and the benefit of learning remains abstract, we shouldn’t expect anything to change.


I agree, using LLMs the right way and SO the wrong way is definitely possible, but it's easier to use LLMs the wrong way and just blindly without understanding. A diligent person would use it as a learning tool for sure.


I almost forgot about that ARM-Qualcomm dispute, the fireworks are only a few days away.


It would be pretty awesome if AMD sold aggressively into the data center market, or if NVIDIA sold aggressively into the supercompute market.

All discussion on this is so much sports team fandom until Jensen and Lisa stop pillow fighting on the biggest pile of money since that 50 Cent cover and the highest margins since the Dutch East India Company.

I just spent a few days getting FlashAttention and TransformerEngine built with bleeding-edge drivers and all the wheels to zero-typing install on three generations of NVIDIA and shit (PS: your build advertises working without ninja but fucks up CMake INSTALL paths): this is the unassailable awesomeness that no one can compete with? ptxas spinning a core for longer than it takes to build Linux and sometimes crashing in the process?

No, this is central committee grift sanctioned at the highest levels to the applause of HN folks long NVDA.


And if you really think about who is getting insulted here? It's the NVIDIA and AMD driver and toolchain hackers. I read diffs from those folks: they are polymath ballers. The average HN comment including this one has more bugs in it than a patch from an NVIDIA/AMD driver author.

But no, they're supposedly just too incompetent to run PyTorch or whatever. Fuck that, the hackers kick ass.

This shit is broken because shareholders get more rent when it's broken.


Having spent a fair amount of time over the past few years as someone working on top of both Nvidia and AMD toolchains (and having spent decades working up and down the stack from mcus, mobile device, to webscale stacks), I have an alternative take.

I'm smart and experienced enough, and the people working on the toolchains are capable and even smarter, but these systems are just extremely hard/complex. The hardware is buggy, the firmware is buggy, the drivers are buggy, every layer above it (compilers, kernels, multiple layers of frameworks) also all buggy. Add on top of that everyone is implementing as fast as they can, every part is changing constantly, and also a good chunk of the code being used is being written by researchers and grad students. Oh, and the hardware changes dramatically every couple years. Sure all tech is swiss cheese, but this is a particularly unstable/treacherous version of that atm.

BTW, shareholders obviously have no idea whether what's being sold works well or not, but there's no financial incentive (in terms of selling stuff/profiting) if your product doesn't even work/work well enough to compete. And if it were so easy, you would see any number of hardware startups take over from the incumbents. Sure there are network effects, but if I could run PyTorch (or really, just train and inference my models easily) faster and cheaper, me, and I assume lots of other people would switch in a heartbeat. The fact that no one (besides Google arguably if you count having to switch all your implementations over for TPUs) has been able to do it, whether they be startups or multi-billion dollar companies (including Amazon, Meta, and Microsoft, all who have direct financial incentive to and have been trying), points to the problem being elsewhere.


I’m a bit out of my area on the details: but I have CUDA kernels in PyTorch and I’ve moved around much, much bigger codebases. I write kernels and build others.

I don’t doubt that it’s a lot of work to maintain a clean userland and present a good interface with clean link ABIs. The people who pull it off and make it look easy are called names like Linus.

But the “everything is buggy” argument is an argument about institutions, not software artifacts. We know how to throw test and fuzz on a component and bang it into shape. Now this is a question of both resources and intention.

But NVIDIA’s market cap is like 4 trillion bucks or something.

I want perfect or a firing squad at that number.


Capitalism sounds dope. I hope I live to see it.


Why are you replying to yourself?


Because unlike an edit, it's timestamped.


But your reply doesn't even seem to make sense as an edit?


> Llama 3.1 405B can currently replace junior engineers

lol


Can an LLM join a standup call? Can an LLM create a merge request?

At the moment it looks like an experienced engineer can pressure an LLM into hallucinating junior-level code.


The argument is that, instead of hiring a junior engineer, a senior engineer can simply produce enough output to match what the junior would have produced and then some.

Of course, that means you won't be able to train them up, at least for now. That being said, even if they "only" reach the level of your average software developer, they're already going to have pretty catastrophic effects on the industry.

As for automated fixes, there are agents that _can_ do that, like Devin (https://devin.ai/), but it's still early days and bug-prone. Check back in a year or so.


Not training new workers and relying on senior engineers with tools is short-sighted and foolish.

LLMs seem to be accelerating the trend


On one hand, I somewhat agree; on the other hand, I think LLMs and similar tooling will allow juniors to punch far beyond their weight and learn and do things that they would have never have dreamed of before. As mentioned in another comment, they're the teacher that never gets tired and can answer any question (with the necessary qualifications about correctness, learning the answer but not the reasoning, etc)

It remains to be seen if juniors can obtain the necessary institutional / "real work" experience from that, but given the number of self-taught programmers I know, I wouldn't rule it out.


I think many people using llms are faking it and have no interest in “making it”.

It’s not about learning for most.

Just because a small subset of intelligent and motivated people use tools to become better programmers, there is a larger number of people that will use the tools to “cheat”.


Tools are foolish? Like, should we remove all of the other tools that make senior engineers more productive, in favor of hiring more people to do those same tasks? That seems questionable.


Tools are great, but there is a way to learn the fundamentals and progress through skills and technology.

Learn to do something manually and then learn the technology.

Do you want engineers who are useless if their calculator breaks or do you want someone who can fall back on pen and paper and get the work done?


Well what if their pen breaks? Perhaps a good fluid dynamics engineer needs to be able to create ink from common plants?

I get the argument, it’s just silly. Calculators don’t “break”. I would rather have an engineer who uses highly reliable tools than one who is so obsessed with the lowest levels of the stack that they aren’t as strong at the top.

I’m willing to live with a useless day in the insanely unlikely event that all readily available calculators stop working.


There's an incentive problem: the benefit from training new workers is distributed across all companies, whereas the cost of training them is allocated to the single company that does so.


Most broken systems have bad incentives.

Companies don’t want to train people ($) because employees with skills and experience are more valuable to other companies because retention is also expensive.

We are not training AND retaining talent.


> The argument is that, instead of hiring a junior engineer, a senior engineer can simply produce enough output to match what the junior would have produced and then some.

...and that's just as asinine of a claim as the original one


Why? I can say that, in my personal experience, AI has allowed me to work more efficiently as a senior engineer: I can describe the behaviour I want, scan over the generated code and make any necessary fixes much faster than either writing the code myself or having a junior do it.


Plain grift, or are they high on their own supply?


Both? Both is good.


beware of the "expert" rating other companies



