Thanks for the advice. FWIW, though, TimescaleDB supports multi-dimensional partitioning, so a specific "hot" time interval is typically split across many chunks, and thus across multiple server instances. We are also working on native chunk replication, which will allow serving copies of the same chunk from different server instances.
Beyond these mitigations for the hot-partition problem, it's usually better to serve the same data to many requests from a warm cache than to have many random reads that thrash the cache.
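For illustration, here's a minimal sketch of what multi-dimensional partitioning looks like in practice, assuming a hypothetical `conditions` hypertable partitioned on time plus a space dimension (`device_id`), so rows from the same time interval land in multiple chunks:

```sql
-- Hypothetical example: hypertable partitioned on time, then further
-- partitioned on a space dimension so a "hot" time interval is spread
-- across several chunks (and, in a multi-node setup, across servers).
CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION
);

SELECT create_hypertable('conditions', 'time',
                         chunk_time_interval => INTERVAL '1 day');
SELECT add_dimension('conditions', 'device_id', number_partitions => 4);
```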
Hey Erik, thanks for the post. In this vision, would this cluster of servers be reserved exclusively for timeseries data, or do you imagine it containing other ordinary tables as well?
We're currently using Postgres for some IoT, B2B applications, and the timeseries tables are half a dozen orders of magnitude larger than the other tables in our application. Certain database operations, like updates, take a very long time because of this. I've wondered if, by splitting the timeseries tables onto their own server, I could handle updates independently, with the main app gracefully handling the timeseries DB being offline for some period of time.
It's about more than just downtime, though. If the timeseries DB were overloaded through poor querying or other issues, the customer impact of the slowdown would be limited.
We commonly see hypertables (time-series tables) deployed alongside relational tables, often because there is a relation between them: the relational metadata provides information about the user, sensor, server, or security instrument that is referenced by id/name in the hypertable.
So joins between these time-series and relational tables are common, and together they serve the applications one builds on top of the data.
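As a rough sketch of what such a join looks like, assume the hypothetical `conditions` hypertable from above plus a `devices` metadata table (both names are illustrative only):

```sql
-- Hypothetical relational metadata table describing each device.
CREATE TABLE devices (
  device_id TEXT PRIMARY KEY,
  location  TEXT
);

-- Join time-series rows against the metadata, aggregating per location.
SELECT d.location,
       time_bucket('15 minutes', c.time) AS bucket,
       avg(c.temperature)                AS avg_temp
FROM conditions c
JOIN devices d ON d.device_id = c.device_id
WHERE c.time > now() - INTERVAL '1 day'
GROUP BY d.location, bucket
ORDER BY bucket;
```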
Now, TimescaleDB can be installed on a PG server that is also handling tables that have nothing to do with its workload, in which case one does get performance interference between the two workloads. We generally wouldn't recommend this for most production deployments, but the decision here is always a tradeoff between resource isolation and cost.