
> Wes McKinney

Sold

For those who don't know, Wes McKinney is the creator of Pandas, the go-to tabular analysis library for Python. That gives his format widespread buy-in from the outset, as well as a couple of decades of Caring About The Problem, which makes his insights unusually valuable.



Andy Pavlo also deserves a shout out; he's an authority on databases and lives a "data oriented lifestyle". See his two "What goes around comes around ..." papers for great overviews of the past and future 50 years in databases respectively (as well as all the amazing CMU Seminar Series). I'm excited to see him involved in this.

My apologies to the Chinese coauthors who I'm not familiar with.

Bookmarked for thorough reading!


I've been looking for an article that I think I found 10+ years ago on the web about a very similar topic, something like "the evolution of database systems". It was a very well written article that I just skimmed through and planned to read properly but could never find it again. I remember it had hand-drawn-style diagrams of database architectures with blue backgrounds, sometimes squiggly lines and maybe even some bees on them (could be just my mind mixing it with something). I'd be eternally grateful if someone here could find it.


Not sure if this is what you had in mind but here are the links to the two papers that I was referencing.

[1] covers the first 40 years of databases. [2] fills in the gap of the last 20 years and gives their thoughts on the future.

My apologies, the first one was actually Stonebraker & Hellerstein and didn't involve Pavlo. They're both excellent papers though for anyone working with data.

Stonebraker, for those who don't know, is the creator of Postgres, a database you might have heard of.

1: Stonebraker & Hellerstein, "What Goes Around Comes Around", 2005, https://people.csail.mit.edu/tdanford/6830papers/stonebraker...

2: Stonebraker & Pavlo, "What Goes Around Comes Around... And Around...", 2024, https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec20...



I'm a big fan of Wes' work and Pandas was incredibly influential. However, technically it wasn't his best work. In terms of selling points, I think the Arrow data format is technically much better and more influential on the data ecosystem as a whole (see DataFusion, etc.).

That said, now let me see what F3 is actually about (and yes, your comment is what actually made me want to click through to the link) ...


His work on Parquet probably stands out as a better appeal to authority.


Also the creator of Apache Arrow, a core component of modern data analytics.


Mixing data and code is a classic security mistake. Having one somewhat known individual involved doesn’t magically make it less of a mistake.


I was also concerned about the wasted overhead. However, I guess it's just there for compatibility (since space is cheap), and for common encodings you'll be able to skip reading it with range requests and use your trusted codec to decode the data. Smart move imho.
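Roughly what I have in mind, as a minimal Python sketch. Everything here is hypothetical and made up for illustration: the `meta` layout, the `TRUSTED_DECODERS` registry, and `read_range` (a stand-in for an HTTP range request). It is not the format's actual API, just the general idea of fetching only the data bytes and never touching the embedded decoder when you already have a codec you trust.

```python
import zlib
from typing import Callable, Dict

# Hypothetical registry of decoders the reader already trusts, keyed by encoding name.
TRUSTED_DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "plain": lambda b: b,
    "zlib": zlib.decompress,
}


def read_range(blob: bytes, offset: int, length: int) -> bytes:
    """Stand-in for an HTTP range request: return only the bytes asked for."""
    return blob[offset:offset + length]


def decode_column(blob: bytes, meta: dict) -> bytes:
    """Decode one chunk with a trusted local codec, skipping the embedded decoder."""
    decoder = TRUSTED_DECODERS.get(meta["encoding"])
    if decoder is None:
        # Unknown encoding: the embedded decoder would be needed, and it should
        # only ever run inside a sandbox. This sketch simply refuses.
        raise ValueError(f"no trusted decoder for {meta['encoding']!r}")
    data = read_range(blob, meta["data_offset"], meta["data_length"])
    return decoder(data)


# Toy file: pretend decoder bytes followed by the compressed payload.
embedded = b"\x00pretend-embedded-decoder"
payload = zlib.compress(b"hello columns")
blob = embedded + payload
meta = {"encoding": "zlib", "data_offset": len(embedded), "data_length": len(payload)}
result = decode_column(blob, meta)
```

The embedded decoder bytes are never read for the `zlib` chunk, so the only cost is the storage they occupy.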


I’m not concerned about the overhead; there are always more and larger pieces of iron. It's still not a good idea to mix executable code with data.


It really depends on the order of priorities. If the overall goal is to allow digital archaeologists to make sense of some file they found, it would be prudent to give them some instructions on how it is decoded.

I just hope that people will not just execute that code in an unconfined environment.


Hope is not an adequate security best practice. ;)



