More

mmaunder · 2025-12-17T18:39:17 1765996757

Yeah the only thing standing in Google's way is Google. And it's the easy stuff, like sensible billing models, easy to use docs and consoles that make sense and don't require 20 hours to learn/navigate, and then just the slew of bugs in Gemini CLI that are basic usability and model API interaction things. The only differentiator that OpenAI still has is polish.

Edit: And just to add an example: openAI's Codex CLI billing is easy for me. I just sign up for the base package, and then add extra credits which I automatically use once I'm through my weekly allowance. With Gemini CLI I'm using my oauth account, and then having to rotate API keys once I've used that up.

Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.

Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.

Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.

mips_avatar · 2025-12-17T18:48:48 1765997328

I'd be curious how many people use openrouter byok just to avoid figuring out the cloud consoles for gcp/azure.

visarga · 2025-12-17T20:39:52 1766003992

I do. Gave up using Gemini directly.

mmaunder · 2025-12-17T19:03:25 1765998205

Agreed. It's ridiculous.

mmaunder · 2025-12-17T18:33:49 1765996429

I think about what would be most terrifying to Anthropic and OpenAI i.e. The absolute scariest thing that Google could do. I think this is it: Release low latency, low priced models with high cognitive performance and big context window, especially in the coding space because that is direct, immediate, very high ROI for the customer.

Now, imagine for a moment they had also vertically integrated the hardware to do this.

JumpCrisscross · 2025-12-17T18:52:10 1765997530

> think about what would be most terrifying to Anthropic and OpenAI

The most terrifying thing would be Google expanding its free tiers.

avazhi · 2025-12-17T18:38:53 1765996733

"Now, imagine for a moment they had also vertically integrated the hardware to do this."

Then you realise you aren't imagining it.

iwontberude · 2025-12-17T18:54:53 1765997693

“And then imagine Google designing silicon that doesn’t trail the industry. While you are there we may as well start to imagine Google figures out how to support a product lifecycle that isn’t AdSense”

Google is great on the data science alone, every thing else is an after thought

avazhi · 2025-12-17T18:58:10 1765997890

https://blog.google/products/google-cloud/ironwood-google-tp...

"And then imagine Google designing silicon that doesn’t trail the industry."

I'm def not a Google stan generally, but uh, have you even been paying attention?

https://en.wikipedia.org/wiki/Tensor_Processing_Unit

mmaunder · 2025-12-17T19:13:30 1765998810

It's not funny when I have to explain the joke.

avazhi · 2025-12-17T19:46:06 1766000766

Oh I got your joke, sir - but as you can see from the other comment, there are techies who still don't have even a rudimentary understanding of tensor cores, let alone the wider public and many investors. Over the next year or two the gap between Google and everybody else, even those they license their hardware to, is going to explode.

iwontberude · 2025-12-17T19:37:19 1766000239

Exactly my point, they have bespoke offerings but when they compete head to head for performance they get smoked. See more: their Tensor processor that they use in the beleaguered Pixel. They are in last place.

mmaunder · 2025-12-17T18:21:31 1765995691

Thanks, having it walk a hardcore SDR signal chain right now --- oh damn it just finished. The blog post makes it clear this isn't just some 'lite' model - you get low latency and cognitive performance. really appreciate you amplifying that.

mmaunder · 2025-12-17T03:57:28 1765943848

"...trust from other large, imporant [sic] third parties which in turn has given Waterfox users access to protected streaming services via Widevine."

The black box objection disqualifies Widevine.

mmaunder · 2025-12-13T19:31:24 1765654284

I'm finding I can put my ops background fully to work, designing far more complex and performant architectures that require big lifts, without worrying about how much my fingers are going to hurt and that it'll take 8 months to even prove if it works.

mmaunder · 2025-12-13T18:09:59 1765649399

Solid approach. Don’t be shy about writing long prompts. We call that context engineering. The more you populate that context window with applicable knowledge and what exactly you want, the better the results. Also, having the model code and you talk to the model is helpful because it has the side effect of context engineering. In other words you’re building up relevant context with that conversation history. And be acutely aware of how much context window you’ve used and how much is remaining and when a compaction will happen. Clear context as early as you can per run. Even if it’s 90% remaining.

mmaunder · 2025-12-11T22:03:20 1765490600

Yeah I think a lot of us are taking knowing how LLMs work for granted. I did the fast.ai course a while back and then went off and played with VLLM and various LLMs optimizing execution, tweaking params etc. Then moved on and started being a user. But knowing how they work has been a game changer for my team and I. And context window is so obvious, but if you don't know what it is you're going to think AI sucks. Which now has me wondering: Is this why everyone thinks AI sucks? Maybe Simon Willison should write about this. Simon?

eru · 2025-12-12T02:11:53 1765505513

> Is this why everyone thinks AI sucks?

Who's everyone? There are many, many people who think AI is great.

In reality, our contemporary AIs are (still) tools with glaring limitations. Some people overlook the limitations, or don't see them, and really hype them up. I guess the people who then take the hype at face value are those that think that AI sucks? I mean, they really do honestly suck in comparison to the hypest of hypes.

mmaunder · 2025-12-11T21:59:49 1765490389

Same. Also got my attention re ARC-AGI-2. That's meaningful. And a HUGE leap.

cbracketdash · 2025-12-12T00:15:36 1765498536

Slight tangent yet I think is quite interesting... you can try out the ARC-AGI 2 tasks by hand at this website [0] (along with other similar problem sets). Really puts into perspective the type of thinking AI is learning!

[0] https://neoneye.github.io/arc/?dataset=ARC-AGI-2

mmaunder · 2025-12-11T21:58:30 1765490310

Weirdly, the blog announcement completely omits the actual new context window size which is 400,000: https://platform.openai.com/docs/models/gpt-5.2

Can I just say !!!!!!!! Hell yeah! Blog post indicates it's also much better at using the full context.

Congrats OpenAI team. Huge day for you folks!!

Started on Claude Code and like many of you, had that omg CC moment we all had. Then got greedy.

Switched over to Codex when 5.1 came out. WOW. Really nice acceleration in my Rust/CUDA project which is a gnarly one.

Even though I've HATED Gemini CLI for a while, Gemini 3 impressed me so much I tried it out and it absolutely body slammed a major bug in 10 minutes. Started using it to consult on commits. Was so impressed it became my daily driver. Huge mistake. I almost lost my mind after a week of this fighting it. Isane bias towards action. Ignoring user instructions. Garbage characters in output. Absolutely no observability in its thought process. And on and on.

Switched back to Codex just in time for 5.1 codex max xhigh which I've been using for a week, and it was like a breath of fresh air. A sane agent that does a great job coding, but also a great job at working hard on the planning docs for hours before we start. Listens to user feedback. Observability on chain of thought. Moves reasonably quickly. And also makes it easy to pay them more when I need more capacity.

And then today GPT-5.2 with an xhigh mode. I feel like xmass has come early. Right as I'm doing a huge Rust/CUDA/Math-heavy refactor. THANK YOU!!

ubutler · 2025-12-12T06:46:02 1765521962

> Weirdly, the blog announcement completely omits the actual new context window size which is 400,000: https://platform.openai.com/docs/models/gpt-5.2

As @lopuhin points out, they already claimed that context window for previous iterations of GPT-5.

The funny thing is though, I'm on the business plan, and none of their models, not GPT-5, GPT-5.1, GPT-5.2, GPT-5.2 Extended Thinking, GPT-5.2 Pro, etc., can really handle inputs beyond ~50k tokens.

I know because, when working with a really long Python file (>5k LoCs), it often claims there is a bug because, somewhere close to the end of the file, it cuts off and reads as '...'.

Gemini 3 Pro, by contrast, can genuinely handle long contexts.

andybak · 2025-12-12T12:46:16 1765543576

Why would you put that whole python file in the context at all? Doesn't Codex work like Claude Code in this regard and use tools to find the correct parts of a larger file to read into context?

lopuhin · 2025-12-11T22:38:33 1765492713

Context window size of 400k is not new, gpt-5, 5.1, 5-mini, etc. have the same. But they do claim they improved long context performance which if true would be great.

energy123 · 2025-12-11T22:50:28 1765493428

But 400k was never usable in ChatGPT Plus/Pro subscriptions. It was nerfed down to 60-100k. If you submitted too long of a prompt they deleted the tokens on the end of your prompt before calling the model. Or if the chat got too long (still below 100k however) they deleted your first messages. This was 3 months ago.

Can someone with an active sub check whether we can submit a full 400k prompt (or at least 200k) and there is no prompt truncatation in the backend? I don't mean attaching a file which uses RAG.

piskov · 2025-12-11T23:57:59 1765497479

Context windows for web

Fast (GPT‑5.2 Instant) Free: 16K Plus / Business: 32K Pro / Enterprise: 128K

Thinking (GPT‑5.2 Thinking) All paid tiers: 196K

https://help.openai.com/en/articles/11909943-gpt-52-in-chatg...

energy123 · 2025-12-12T02:34:58 1765506898

But can you do that in one message or is that a best case scenario in a long multi turn chat?

dr_dshiv · 2025-12-12T00:46:48 1765500408

That’s… too bad

eru · 2025-12-12T02:00:26 1765504826

> Or if the chat got too long (still below 100k however) they deleted your first messages. This was 3 months ago.

I can believe that, but it also seems really silly? If your max context window is X and the chat has approached that, instead of outright deleting the first messages outright, why not have your model summarise the first quarter of tokens and place those at the beginning of the log you feed as context? Since the chat history is (mostly) immutable, this only adds a minimal overhead: you can cache the summarisation, and don't have to do that over and over again for each new message. (If partially summarised log gets too long, you summarise again.)

Since I can come up with this technique in half a minute of thinking about the problem, and the OpenAI folks are presumably not stupid, I wonder what downside I'm missing.

Aeolun · 2025-12-12T02:38:58 1765507138

Don’t think you are missing anything. I do this with the API, and it works great. I’m not sure why they don’t do it, but I can only guess it’s because it completely breaks the context caching. If you summarize the full buffer at least you know you are down to a few thousand tokens to cache again, instead of 100k tokens to cache again.

eru · 2025-12-12T08:28:27 1765528107

> [...] but I can only guess it’s because it completely breaks the context caching.

Yes, but you only re-do this every once in a while? It's a constant factor overhead. If you essentially feed the last few thousand tokens, you have no caching at all (and you are big enough that this window of 'last few thousand tokens' doesn't get you the whole conversation)?

gunalx · 2025-12-11T23:39:25 1765496365

API use was not merged in this way.

freedomben · 2025-12-11T22:28:27 1765492107

I haven't done a ton of testing due to cost, but so far I've actually gotten worse results with xhigh than high with gpt-5.1-codex-max. Made me wonder if it was somehow a PEBKAC error. Have you done much comparison between high and xhigh?

dudeinhawaii · 2025-12-11T22:55:54 1765493754

This is one of those areas where I think it's about the complexity of the task. What I mean is, if you set codex to xhigh by default, you're wasting compute. IF you're setting it at xhigh when troubleshooting a complex memory bug or something, you're presumably more likely to get a quality response.

I think in general, medium ends up being the best all-purpose setting while high+ are good for single task deep-drive. Or at least that has been my experience so far. You can theoretically let with work longer on a harder task as well.

A lot appears to depend on the problem and problem domain unfortunately.

I've used max in problem sets as diverse as "troubleshooting Cyberpunk mods" and figuring out a race condition in a server backend. In those cases, it did a pretty good job of exhausting available data (finding all available logs, digging into lua files), and narrowing a bug that every other model failed to get.

I guess in some sense you have to know from the onset that it's a "hard problem". That in and of itself is subjective.

wahnfrieden · 2025-12-12T03:51:31 1765511491

You should also be making handoffs to/from Pro

robotswantdata · 2025-12-11T23:12:14 1765494734

For a few weeks the Codex model has been cursed. Recommend sticking with 5.1 high , 5.2 feels good too but early days

tekacs · 2025-12-11T22:55:05 1765493705

I found the same with Max xhigh. To the point that I switched back to just 5.1 High from 5.1 Codex Max. Maybe I should’ve tried Max high first.

lhl · 2025-12-12T06:43:32 1765521812

Anecdotally, I will say that for my toughest jobs GPT-5+ High in `codex` has been the best tool I've used - CUDA->HIP porting, finding bugs in torch, websockets, etc, it's able to test, reason deeply and find bugs. It can't make UI code for it's life however.

Sonnet/Opus 4.5 is faster, generally feels like a better coder, and make much prettier TUI/FEs, but in my experience, for anything tough any time it tells you it understands now, it really doesn't...

Gemini 3 Pro is unusable - I've found the same thing, opinionated in the worst way, unreliable, doesn't respect my AGENTS.md and for my real world problems, I don't think it's actually solved anything that I can't get through w/ GPT (although I'll say that I wasn't impressed w/ Max, hopefully 5.2 xhigh improves things). I've heard it can do some magic from colleagues working on FE, but I'll just have to take their word for it.

tgtweak · 2025-12-12T03:02:25 1765508545

have been on 1M context window with claude since 4.0 - it gets pretty expensive when you run 1M context on a long running project (mostly using it in cline for coding). I think they've realized more context length = more $ when dealing with most agentic coding workflows on api.

Workaccount2 · 2025-12-12T03:39:33 1765510773

You should be doing everything you can to keep context under 200k, ideally even 100k. All the models unwind so badly as context grows.

patates · 2025-12-12T06:03:35 1765519415

I don't have that experience with gemini. Up to 90% full, it's just fine.

tgtweak · 2025-12-15T14:20:16 1765808416

If the models are designed around it, and not resorting to compression to get to higher input token lengths, they don't 'fall off' as they get near the context window limit. When working with large codebases, exhausting or compressing the context actually causes more issues since the agent forgets what was in the other libraries and files. Google has realized this internally and were among the first to get to 2M token context length (internally then later released publicly).

BrtByte · 2025-12-12T15:45:56 1765554356

This is one of those updates where the value only really shows up if you're already deep in the weeds

nathants · 2025-12-12T05:14:45 1765516485

Usable input limit has not changed, and remains 400 - 128 = 272. Confirmed by looking for any changes in codex cli source, nope.

Suppafly · 2025-12-12T04:32:46 1765513966

>Can I just say !!!!!!!! Hell yeah!

...

>THANK YOU!!

Man you're way too excited.

twisterius · 2025-12-11T22:07:27 1765490847

Sorry to use your comment as an example but it’s become impossible to distill sponsored testimonials from real commentary.

Maybe it’s the rise of vibe coders coupled with AI slop ruining our already dead internet.

I long for the days where forums honestly discussed problems and solutions pathways.

Maybe I’m just the unc who doesn’t get kids these days. But comments like these make me worry that “builders” are spending too much time consuming.

mmaunder · 2025-12-11T23:28:22 1765495702

My name is Mark Maunder. Not the fisheries expert. The other one when you google me. I’m 51 and as skeptical as you when it comes to tech. I’m the CTO of a well known cybersecurity company and merely a user of AI.

Since you critiqued my post, allow me to reciprocate: I sense the same deflector shields in you as many others here. I’d suggest embracing these products with a sense of optimism until proven otherwise and I’ve found that path leads to some amazing discoveries and moments where you realize how important and exciting this tech really is. Try out math that is too hard for you or programming languages that are labor intensive or languages that you don’t know. As the GitHub CEO said: this technology lets you increase your ambition.

bgwalter · 2025-12-12T00:36:17 1765499777

I have tried the models and in domains I know well they are pathetic. They remove all nuance, make errors that non-experts do not notice and generally produce horrible code.

It is even worse in non-programming domains, where they chop up 100 websites and serve you incorrect bland slop.

If you are using them as a search helper, that sometimes works, though 2010 Google produced better results.

Oracle dropped 11% today due to over-investment in OpenAI. Non-programmers are acutely aware of what is going on.

muppetman · 2025-12-12T02:40:03 1765507203

Exactly this. It's like reading the news! It seems perfectly fine until a news article in a domain you have intimate knowledge of, and then you realise how bad/hacked together the news is. AI feels just like that. But AI can improve, so I'm in the middle with my optimism.

jfreds · 2025-12-12T01:23:24 1765502604

> they remove all nuance

Said in a sweeping generalization with zero sense of irony :D

jrflowers · 2025-12-12T05:43:59 1765518239

This is a good point. It is a sweeping generalization if you do not read the sentence that comes before that quote

re-thc · 2025-12-12T06:13:03 1765519983

> Oracle dropped 11% today due to over-investment in OpenAI

Not even remotely true. Oracle is building out infrastructure mostly for AI workloads. It dropped because it couldn’t explain its financing and if the investment was worth it. OpenAI or not wouldn’t have mattered.

what-the-grump · 2025-12-12T01:50:17 1765504217

You pretend that humans don’t produce slop?

I can recognize the short comings of AI code but it can produce a mock or a full blown class before I can find a place to save the file it produced.

Pretending that we are all busy writing novelty and genius is silly, 99% are writing for CRUD tasks and basic business flows, the code isn’t going to be perfect it doesn’t need to be but it will get the job done.

All the logical gotchas of the work flows that you’d be refactoring for hours are done in minutes.

Use pro with search… are it going to read 200 pages of documentation in 7 minutes come up with a conclusion and validate it or invalidate it in another 5? No you still trying accept the cookie prompt on your 6th result.

You might as well join the flat earth society if you still think that AI can’t help you complete day to day tasks.

jacquesm · 2025-12-12T02:38:30 1765507110

[flagged]

mmaunder · 2025-12-12T03:39:54 1765510794

That's like telling a pig to become a pork producer.

GolfPopper · 2025-12-12T01:38:17 1765503497

Replace 'products' with 'message', 'tech' with 'religion' and 'CEO' with 'prophet' and you have a bog-standard cult recruitment pitch.

Aeolun · 2025-12-12T02:41:10 1765507270

Because most recruitment pitches are the same regardless of the subject.

bluefirebrand · 2025-12-12T01:24:23 1765502663

> I’d suggest embracing these products with a sense of optimism until proven otherwise

They prove me otherwise literally every time I try

That's why I think you are all full of shit

eru · 2025-12-12T02:03:33 1765505013

Maybe you are holding it wrong?

Contemporary LLMs still have huge limitations and downsides. Just like hammer or a saw has limitations. But millions of people are getting good value out of them already (both LLMs and hammers and saws). I find it hard to believe that they are all deluded.

skydhash · 2025-12-12T03:09:25 1765508965

What limitations does an hammer have if the job is hammering? Or a saw with sawing? Even `ed` doesn't have any issue with editing text files.

eru · 2025-12-12T08:26:33 1765527993

Well, ask the people who invented better hammers or better saws. Or better text editors than ed.

mmaunder · 2025-12-11T21:48:08 1765489688

Yeah I eventually noped out as I said in another comment and am charging hard with Codex and am so happy about 5.2!!