
But 400k was never usable in ChatGPT Plus/Pro subscriptions. It was nerfed down to 60-100k. If you submitted too long a prompt, they deleted the tokens at the end of it before calling the model. Or if the chat got too long (still below 100k, however), they deleted your first messages. This was 3 months ago.

Can someone with an active sub check whether we can submit a full 400k prompt (or at least 200k) and confirm there is no prompt truncation in the backend? I don't mean attaching a file, which uses RAG.





Context windows for web

Fast (GPT‑5.2 Instant): Free: 16K; Plus / Business: 32K; Pro / Enterprise: 128K

Thinking (GPT‑5.2 Thinking): all paid tiers: 196K

https://help.openai.com/en/articles/11909943-gpt-52-in-chatg...


But can you do that in one message, or is that a best-case scenario in a long multi-turn chat?

That’s… too bad

> Or if the chat got too long (still below 100k however) they deleted your first messages. This was 3 months ago.

I can believe that, but it also seems really silly? If your max context window is X and the chat has approached that, instead of outright deleting the first messages, why not have your model summarise the first quarter of the tokens and place that summary at the beginning of the log you feed as context? Since the chat history is (mostly) immutable, this only adds a minimal overhead: you can cache the summarisation and don't have to redo it for each new message. (If the partially summarised log gets too long, you summarise again.) Rough sketch below.

Since I can come up with this technique in half a minute of thinking about the problem, and the OpenAI folks are presumably not stupid, I wonder what downside I'm missing.
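
Roughly what I mean, as an untested sketch against the OpenAI chat completions API (the model name, token heuristic and prompt wording are made up; the point is the shape of the loop):

    # Untested sketch: keep one running log per chat; when it outgrows the
    # budget, fold its oldest quarter (including any previous summary) into a
    # single summary message. The summary stays in the log, so nothing is
    # re-summarised on ordinary turns.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-5.2"             # placeholder model name
    MAX_CONTEXT_TOKENS = 100_000  # assumed budget

    def rough_tokens(messages):
        # Crude ~4-characters-per-token heuristic; good enough for a budget check.
        return sum(len(m["content"]) for m in messages) // 4

    def summarise(messages):
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user",
                       "content": "Summarise this conversation so it can replace "
                                  "the original messages:\n\n" + transcript}],
        )
        return {"role": "system",
                "content": "Summary of earlier conversation:\n"
                           + resp.choices[0].message.content}

    def chat_turn(log, user_message):
        # log is the running per-chat context (earlier summaries + recent messages).
        log.append({"role": "user", "content": user_message})
        while len(log) > 1 and rough_tokens(log) > MAX_CONTEXT_TOKENS:
            cut = max(1, len(log) // 4)
            log[:cut] = [summarise(log[:cut])]  # fold oldest quarter into one message
        resp = client.chat.completions.create(model=MODEL, messages=log)
        reply = resp.choices[0].message.content
        log.append({"role": "assistant", "content": reply})
        return reply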


Don’t think you are missing anything. I do this with the API, and it works great. I’m not sure why they don’t do it; my only guess is that it completely breaks the context caching. If you summarize the full buffer, at least you’re down to a few thousand tokens to cache again, instead of re-caching 100k tokens.
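
Something like this, again as an untested sketch (model name, threshold and prompt wording are made up):

    # Full-buffer variant: once the running log crosses a threshold, collapse
    # the whole thing into one summary message, so the prefix that has to be
    # re-cached afterwards is only a few thousand tokens instead of ~100k.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-5.2"  # placeholder model name

    def compact(log, threshold_chars=300_000):  # ~75k tokens at ~4 chars/token
        if sum(len(m["content"]) for m in log) <= threshold_chars:
            return log
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in log)
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user",
                       "content": "Summarise this conversation so it can replace "
                                  "the original messages:\n\n" + transcript}],
        )
        # Everything becomes one short message; caching restarts from here.
        return [{"role": "system",
                 "content": "Summary of earlier conversation:\n"
                            + resp.choices[0].message.content}]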

> [...] but I can only guess it’s because it completely breaks the context caching.

Yes, but you only redo this every once in a while? It's a constant-factor overhead. And if you instead just feed the last few thousand tokens (a sliding window), you have no caching at all, since the start of the prompt changes on every turn (and the conversation is big enough that this window of 'last few thousand tokens' doesn't cover the whole conversation anyway)?
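
Back-of-envelope with made-up numbers, counting only uncached input-token processing and ignoring cached-token discounts:

    # Assume compaction triggers at 100k tokens and the summary comes out at 5k.
    WINDOW = 100_000   # assumed context budget
    SUMMARY = 5_000    # assumed size of the compacted log
    fresh = WINDOW - SUMMARY   # new tokens accumulated between compactions

    # Without compaction (infinite window, perfect prefix caching):
    # each fresh token is processed uncached exactly once per cycle.
    baseline = fresh

    # With compaction: the same fresh tokens, plus one summarisation pass over
    # the full buffer, plus processing the new 5k prefix uncached once.
    with_compaction = fresh + WINDOW + SUMMARY

    print(with_compaction / baseline)   # ~2.1, i.e. a bounded constant factor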


API use was not nerfed in this way.



