Two questions:
-
Will our users typically run analyses across large subsets of data that involve Object, Source, and ForcedSource at the same time? I ask because we have tentatively planned to run the Object+Source shared scan independently from the Object+ForcedSource shared scan, since that lets us run each scan at its optimal speed, but perhaps that unnecessarily limits data analysis. If we ran these two scans independently, users would have to save the results from one scan and then join them with the other, effectively waiting for two scans instead of one.
-
How often will users want to join data across data releases, and how easy do we need to make it? We have been tentatively considering serving each DR from a separate cluster, through a separate Qserv, but if users will frequently try to cross-match across different data releases, perhaps we should reconsider.
Do keep in mind that synchronizing everything (Object, Source, ForcedSource, ObjectExtra, etc. from multiple data releases) would result in each scan running slower, would require more resources, and would complicate the Qserv implementation.
Thanks,
Jacek