Hacker News

Yes, but they both target horizontal scale-out, and OLTP is a target for both.

And yes, Cockroach isn’t Postgres, but it has SQL, versus something like Mongo or Cassandra.



Citus claims OLTP, but for a large cluster two-phase commit is not a very viable approach.


I don't think you can get around using 2PC in a distributed OLTP database; e.g. Spanner also uses 2PC for distributed transactions across shards. Fortunately, the overhead is not really that high, because the prepare and commit messages are sent to all nodes in parallel. It only adds one extra network round trip. It does lower per-session throughput a bit, but you can always get better throughput by creating more sessions.
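To make the "one extra round trip" point concrete, here is a minimal toy sketch of the coordinator side of 2PC; the `Shard` class and its methods are invented for illustration, not any real Citus or Spanner API:

```python
# Toy sketch of a two-phase-commit coordinator (all names are hypothetical).
# Prepare and commit messages go to every participating shard in parallel,
# so a distributed transaction costs one extra round trip, not one per shard.
from concurrent.futures import ThreadPoolExecutor

class Shard:
    """Stand-in for a worker node holding one shard."""
    def __init__(self, name):
        self.name = name
        self.prepared = False
        self.committed = False

    def prepare(self):
        self.prepared = True   # durably log the transaction and vote yes
        return True

    def commit(self):
        self.committed = True  # apply the prepared transaction
        return True

def two_phase_commit(shards):
    with ThreadPoolExecutor() as pool:
        # Round trip 1: PREPARE sent to all shards in parallel.
        votes = list(pool.map(lambda s: s.prepare(), shards))
        if not all(votes):
            return False       # any "no" vote aborts the transaction
        # Round trip 2: COMMIT sent to all shards in parallel.
        list(pool.map(lambda s: s.commit(), shards))
        return True

workers = [Shard(f"worker{i}") for i in range(3)]
assert two_phase_commit(workers)
assert all(s.committed for s in workers)
```

In Postgres terms, the prepare step corresponds to `PREPARE TRANSACTION` and the second step to `COMMIT PREPARED` on each worker.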


> Spanner runs consensus for each key range and does not need all nodes to be available to make progress. Also, my understanding is that since it has a leader for each key range, writes scale better.

Spanner uses Paxos (consensus) for replication within a key range (shard), but two-phase commit across shards: https://ai.google/research/pubs/pub39966

Citus relies on PostgreSQL's streaming replication, which gives higher throughput than Paxos, but Paxos has better availability characteristics. On the other hand, Paxos with leader leases as used by Spanner is similar to streaming replication both in terms of performance characteristics and short downtime during failover.


It's not similar in infrastructure requirements, though. Spanner does not have half the nodes in standby mode for HA; they are actually doing work.


Spanner runs consensus for each key range and does not need all nodes to be available to make progress. Also, my understanding is that since it has a leader for each key range, writes scale better.


It's not pure 2PC, it's 2PC on a subset of shards (layered on top of Raft). If it's true that most workloads are primary-key-based or touch a small number of shards, it's fine.
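A rough sketch of why primary-key workloads rarely pay the 2PC cost: the keys a transaction touches are hashed to shards, and 2PC is only needed when the set spans more than one shard. All names here are invented for illustration:

```python
# Hypothetical hash routing: 2PC is only required for multi-shard transactions.
import hashlib

NUM_SHARDS = 8

def shard_for(key):
    # Deterministic hash of the key to a shard id.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def shards_touched(keys):
    return {shard_for(k) for k in keys}

def needs_2pc(keys):
    # A transaction confined to one shard can commit locally, no 2PC.
    return len(shards_touched(keys)) > 1

# A single primary-key lookup always lands on exactly one shard.
assert not needs_2pc(["user:42"])
```

Under this model, a workload dominated by single-key transactions executes almost entirely without cross-shard coordination.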


Could you point to info about them using Raft? "In Citus, we looked into the 2PC algorithm built into PostgreSQL. We also developed an experimental extension called pg_paxos. We decided to use 2PC for two reasons. First, 2PC has been used in production across thousands of Postgres deployments. Second, most Citus Cloud and Enterprise deployments use streaming replication behind the covers. When a node becomes unavailable, the node’s secondary usually gets promoted within seconds. This way, Citus can have all participating nodes be available most of the time."


Oops sorry, thought you meant Cockroach!

The part about 2PC still holds.


what do you mean?


You need all participating worker nodes to be available for 2PC to succeed. So the solution is to have a standby for each node, which is not very viable for a large cluster.


Yes, the fact that Postgres is not natively distributed means that HA/replication will be very inefficient to implement.

Citus is best used when transactions don't cross shard boundaries, in which case they execute on a single node and give you the low latency to match.



