Qserv data placement and replication strategies.
The goal is to provide a precise list of requirements for data placement. These requirements are critical to designing an efficient, well-fitted solution.
Here’s an initial proposal:
Maximize high availability
SQL user queries must continue to run in real time if fewer than 2% of worker nodes crash.
Minimize overall cost
Costs include infrastructure, software development, system administration, and maintenance.
Minimize data reconstruction time
A crashed node must be cloned (data and application) in a minimal amount of time.
Prevent data loss
At least one instance of each chunk must be available on disk at all times, even if 5% of storage is permanently lost.
The archive should also be managed by the system in order to reduce cost.
Maximize performance
Bottlenecks must be identified and quantified for each proposed architecture. Chunk placement combinations can then be optimized for a given query set.
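
The availability and data-loss requirements above can be checked against a candidate placement with a few lines of Python. This is only a minimal sketch, not Qserv code: the `placement` mapping (chunk id to the set of worker nodes holding a replica), the node names, and both function names are assumptions made for illustration.

```python
def chunks_lost(placement, failed_nodes):
    """Return the ids of chunks whose replicas all sit on failed nodes.

    placement: hypothetical dict mapping chunk id -> iterable of node names.
    failed_nodes: iterable of node names assumed to be down or lost.
    """
    failed = set(failed_nodes)
    return {chunk for chunk, nodes in placement.items() if set(nodes) <= failed}


def tolerated_failures(placement):
    """Largest number of simultaneous node failures that can never lose a chunk.

    A chunk is lost only when every node holding it fails, so the bound is
    the smallest number of distinct nodes per chunk, minus one.
    """
    fewest_replicas = min((len(set(nodes)) for nodes in placement.values()), default=1)
    return fewest_replicas - 1


# Illustrative placement: three chunks, two replicas each, three workers.
placement = {
    1: {"w1", "w2"},
    2: {"w2", "w3"},
    3: {"w1", "w3"},
}

print(chunks_lost(placement, {"w2"}))        # no chunk is lost
print(chunks_lost(placement, {"w1", "w2"}))  # chunk 1 is lost
print(tolerated_failures(placement))         # 1
```

With such a check, the 2% and 5% thresholds translate into a minimum number of replicas per chunk for a given cluster size, which can then be weighed against the cost requirement.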