I find this topic difficult to reason about because I'm not intimately familiar with DBs at scale.
That being said: my understanding is we're always going to have something that needs to maintain its own state that's global, and you're naming that problem as well.
For example, let's say we partition users based on the first letter of their email addresses.
This works great for most user-specific queries (e.g., fetching a user profile).
But what happens when someone registers a new account?
At that point, we must ensure the email is globally unique.
A purely partitioned approach won't help here—we'll need some kind of global database or service maintaining a single source of truth for email uniqueness checks.
(then it gets complicated, because of the simple level at which I can understand and communicate about it. Why not just partition based on the first letter of an email? Well, yes, then we just have to deal with emails changing. Maybe a better example is session tokens, because they don't come with an email. But we could require that, or do some bespoke thing...there's security concerns there but they seem overrated...but to your point, you end up adding a ton of complexity just so you can fit a square peg in a round hole)
Do you mind elaborating why a db partitioned like that is not enough for your registration example? If the partitioning is based on the email address, then you know where the new user's email has to be if exists, you don't need to query all partitions.
For example, following your partitioning logic, if the user registers as john.smith@example.com, we'd need to query only partition j.
You're right, the email address example isn't clearcut -- its not an issue at all at registration. From there, you could never allow an email change. Or you could just add a layer for coordination, ex. we can imagine some global index that's only used for email changes and then somehow coordinates the partition change
My broad understanding is that you can always "patch" or "work around" any single objection to partitioning or sharding—like using extra coordination services, adding more layers, or creating special-case code.
But each of these patches adds complexity, reduces flexibility, and constrains your ability to cleanly refactor or adapt later. Sure, partitioning email addresses might neatly solve registration checks initially, but then email changes require extra complexity (such as maintaining global indices and coordinating between partitions).
In other words, the real issue isn't that partitioning fails in a single obvious way—it usually doesn’t—but rather that global state always emerges somewhere, inevitably. You can try to bury this inevitability with clever workarounds and layers, but eventually you find yourself buried under a mountain of complexity.
At some point, the question becomes: are we building complexity to solve genuine problems, or just to preserve the appearance that we're fully partitioned?
(My visceral objection to it is, coming from client-side dev virtually my entire career: if you don't need global state, why do you have the server at all? Just give use a .sqlite for my account, and store it for me on S3 for retrieval at will. And if you do need global state...odds are you or a nearby experienced engineer has Seen Some Shit, i.e. the horror that arises in a codebase worked on over years, doubling down on an seemingly small, innocuous, initial decision. and knows it'll never just be one neat design decision or patch)
Check the other partition for the user name. Create the new user with the same pointer (uuid, etc) to the user’s sqlite file, delete the old user in the other partition. Simple user name changed. Not really that complex to be honest. (After thinking this through I’m probably going to suggest us changing to sqlite at work…)
> if you don't need global state, why do you have the server at all?
2 reasons I can think of right off of the top of my head are:
- validation (preventing bad actors, or just bad input)
FWIW, I‘ve seen consensus here on HN in another thread on SQLite-on-server, that there must indeed be a central DB for metadata (user profiles, billing etc), and all the rest is then partitioned.
That being said: my understanding is we're always going to have something that needs to maintain its own state that's global, and you're naming that problem as well.
For example, let's say we partition users based on the first letter of their email addresses.
This works great for most user-specific queries (e.g., fetching a user profile).
But what happens when someone registers a new account?
At that point, we must ensure the email is globally unique.
A purely partitioned approach won't help here—we'll need some kind of global database or service maintaining a single source of truth for email uniqueness checks.
(then it gets complicated, because of the simple level at which I can understand and communicate about it. Why not just partition based on the first letter of an email? Well, yes, then we just have to deal with emails changing. Maybe a better example is session tokens, because they don't come with an email. But we could require that, or do some bespoke thing...there's security concerns there but they seem overrated...but to your point, you end up adding a ton of complexity just so you can fit a square peg in a round hole)