> Wes McKinney Sold For those who don't know. Wes McKinney is the creator of Pan...

snthpy · 2025-10-02T05:07:41 1759381661

Andy Pavlo also deserves a shout out; he's an authority on databases and lives a "data oriented lifestyle". See his two "What goes around comes around ..." papers for great overviews of the past and future 50 years in databases respectively (as well as all the amazing CMU Seminar Series). I'm excited to see him involved in this.

My apologies to the Chinese coauthors who I'm not familiar with.

Bookmarked for thorough reading!

whage · 2025-10-02T08:11:25 1759392685

I've been looking for an article that I think I found 10+ years ago on the web about a very similar topic, something like "the evolution of database systems". It was a very well written article that I just skimmed through and planned to read properly but could never find it again. I remember it had hand-drawn-style diagrams of database architectures with blue backgrounds, sometimes squiggly lines and maybe even some bees on them (could be just my mind mixing it with something). I'd be eternally grateful if someone here could find it.

snthpy · 2025-10-02T08:53:15 1759395195

Not sure if this is what you had in mind but here are the links to the two papers that I was referencing.

[1] covers the first 40 years of databases. [2] fills in the gap of the last 20 years and gives their thoughts on the future.

My apologies, the first one was actually Stonebraker & Hellerstein and didn't involve Pavlo. They're both excellent papers though for anyone working with data.

Stonebraker, for those who don't know, is the creator of Postgres, a database you might have heard of.

[1] Stonebraker & Hellerstein, "What Goes Around Comes Around", 2005, https://people.csail.mit.edu/tdanford/6830papers/stonebraker...

[2] Stonebraker & Pavlo, "What Goes Around Comes Around... And Around...", 2024, https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec20...

noshitsherlock · 2025-10-02T22:36:30 1759444590

Deepseek -> ?? https://martin.kleppmann.com/2015/03/04/turning-the-database... and https://speakerdeck.com/ept/transactions-myths-surprises-and...

snthpy · 2025-10-02T04:53:33 1759380813

I'm a big fan of Wes' work, and Pandas was incredibly influential. Technically, however, it wasn't his best work. In terms of selling points, I think the Arrow data format is technically much better and more influential on the data ecosystem as a whole; see DataFusion, etc.

That said, now let me see what F3 is actually about (and yes, your comment is what actually made me want to click through to the link) ...

wodenokoto · 2025-10-02T04:52:13 1759380733

His work on Parquet probably stands out as a better call to authority.

geodel · 2025-10-02T04:52:33 1759380753

He's also the creator of Apache Arrow, a core component of modern data analytics.

nialse · 2025-10-02T04:33:49 1759379629

Mixing data and code is a classic security mistake. Having one somewhat known individual involved doesn’t magically make it less of a mistake.

snthpy · 2025-10-02T05:11:46 1759381906

I was also concerned about the wasted overhead. However, I guess the embedded decoder is just there for compatibility (since space is cheap), and for common encodings you'll be able to skip reading it with range requests and use your trusted codec to decode the data. Smart move imho.
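
To make the "skip it and use your trusted codec" idea concrete, here's a rough sketch. The file layout, the footer fields, and the names `write_container`, `read_container`, and `TRUSTED_CODECS` are all made up for illustration; this is not the actual F3 format, just the general pattern of reading a footer, checking the encoding name, and fetching only the data range.

```python
import io
import json
import struct
import zlib

# Hypothetical layout (an assumption, NOT the real F3 format):
# [embedded decoder][encoded data][JSON footer][4-byte LE footer length]

# Codecs this reader trusts and ships itself.
TRUSTED_CODECS = {"zlib": zlib.decompress}


def write_container(payload: bytes, decoder_blob: bytes) -> bytes:
    """Build a toy file that carries both the data and a decoder blob."""
    data = zlib.compress(payload)
    footer = json.dumps({
        "encoding": "zlib",
        "decoder": [0, len(decoder_blob)],        # [offset, length]
        "data": [len(decoder_blob), len(data)],   # [offset, length]
    }).encode()
    return decoder_blob + data + footer + struct.pack("<I", len(footer))


def read_range(f, offset: int, length: int) -> bytes:
    """Stand-in for an HTTP range request: fetch only the bytes asked for."""
    f.seek(offset)
    return f.read(length)


def read_container(f) -> bytes:
    """Decode the data with a trusted local codec, never touching the blob."""
    footer_len_pos = f.seek(-4, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", f.read(4))
    footer = json.loads(read_range(f, footer_len_pos - footer_len, footer_len))

    codec = TRUSTED_CODECS.get(footer["encoding"])
    if codec is None:
        # Only here would a reader have to consider the embedded decoder,
        # and it should run that in a sandbox rather than directly.
        raise ValueError(f"no trusted codec for {footer['encoding']!r}")

    offset, length = footer["data"]
    return codec(read_range(f, offset, length))


blob = write_container(b"hello columnar world", b"\x00" * 1024)
print(read_container(io.BytesIO(blob)))
```

The decoder bytes are never read or executed on this path; the reader only pays for the footer and the data range.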

nialse · 2025-10-02T07:03:48 1759388628

I’m not concerned about the overhead; there are always more and larger pieces of iron. It's still not a good idea to mix executable code with data.

chme · 2025-10-02T10:04:28 1759399468

It really depends on the order of priorities. If the overall goal is to allow digital archaeologists to make sense of a file they found, it would be prudent to give them some instructions on how to decode it.

I just hope that people will not just execute that code in an unconfined environment.

nialse · 2025-10-02T15:03:50 1759417430

Hope is not an adequate security best practice. ;)