Resources for Setting up IDACs

To build a knowledge base for In-Kind proposal teams and other groups developing plans to set up Independent Data Access Centers (IDACs), I’ve put together a collection of links to resources that might be helpful. Most of them are linked within the Guidelines for Rubin IDACs by O’Mullane et al. There is also a Frequently Asked Questions (FAQ) section.

This is a community-editable document: if you discover a resource that you think might be helpful to others, please consider adding it to the Shared Knowledge Base Document.

Hi @knutago

Part of the Canadian in-kind proposal is a sort of iDAC-lite, with the main purpose of hosting the public release products and the alerts history. Would you mind confirming the following data volume estimates for @jjkavelaars and me?

  • Public multi-band sky template: 0.6 PB per release
  • Public source catalogs (DIA, and transient): 15 PB after 10 years
  • Alert stream ~2.2 PB (got this from https://dmtn-102.lsst.io/)

Also, can you confirm that the alert stream data volume will be mostly from the pixel data component of the alerts?

Finally, are there any references we should use in our proposals for the first two items?

Thanks very much!

Hi @fraserw and @jjkavelaars,

  • So for the image coadds, Table 2 of RTN-003 has the volume at 0.3 PB per release. I can reproduce this as roughly 18,000 sq. deg * (3,600 arcsec/deg)**2 * (1 pixel/0.2 arcsec)**2 * 10 bytes/pixel * 6 bands ~ 350 TB. There are, however, multiple image coadds that you may be interested in (see p. 29 of LSE-163), each of which would contribute ~0.3 PB.
  • For the public source catalogs, do you mean cumulative volume of all DRs? For which specific tables from Table 1 of RTN-003? If cumulative, the total volume of all DRs of the Source table would be ~30 PB, 4 PB of which are in Year 10.
  • The full alert stream DB volume is indeed ~2.2 PB, for 82 kB per alert. 20% of the volume of each alert is from the 30x30 pixel postage stamp of the detection and template. I think there is discussion of creating Lite alerts; @MelissaGraham, can you address this?
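The back-of-the-envelope numbers in the bullets above can be reproduced with a short script. This is only a sketch of the arithmetic quoted in this thread: the function names are made up for illustration, 1 PB is taken to be 10^15 bytes, and the inputs (18,000 sq. deg, 0.2 arcsec pixels, 10 bytes/pixel, 6 bands, 2.2 PB, 82 kB per alert, 20% in postage stamps) are simply the values stated in the post, not independently checked against RTN-003 or DMTN-102.

```python
# Rough volume estimates using the figures quoted in the thread.
# All function names are hypothetical; decimal units (1 PB = 1e15 bytes) are assumed.

ARCSEC_PER_DEG = 3600


def coadd_volume_bytes(area_sq_deg=18_000, pixel_scale_arcsec=0.2,
                       bytes_per_pixel=10, n_bands=6):
    """Bytes needed for one set of coadds covering the given area in all bands."""
    pixels_per_sq_deg = (ARCSEC_PER_DEG / pixel_scale_arcsec) ** 2
    return area_sq_deg * pixels_per_sq_deg * bytes_per_pixel * n_bands


def alert_count(total_bytes=2.2e15, bytes_per_alert=82e3):
    """Number of alerts implied by the total alert DB volume and the per-alert size."""
    return total_bytes / bytes_per_alert


def stamp_volume_bytes(total_bytes=2.2e15, stamp_fraction=0.20):
    """Portion of the alert DB volume attributed to postage stamps."""
    return total_bytes * stamp_fraction


if __name__ == "__main__":
    print(f"coadds per release: {coadd_volume_bytes() / 1e12:.0f} TB")
    print(f"implied alerts:     {alert_count():.2e}")
    print(f"postage stamps:     {stamp_volume_bytes() / 1e15:.2f} PB")
```

Changing `n_bands` or `bytes_per_pixel` shows how quickly the coadd estimate moves if a different coadd type from LSE-163 is hosted.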

For the first two items, RTN-003 and LSE-163 are good references, I think.

Thanks for the reply.

With regard to the tables, I am trying to figure out where the 30 PB value comes from. In Table 1, the total catalog volume is ~7 PB for the final release. In the last paragraph of page three, 15 PB is quoted, which I interpreted to be the cumulative volume. Both are different from the 30 PB value.

Can you elaborate?

Thanks

Just doing the series sum, I can see that the total cumulative source + DIA tables could be over 30 PB. So I guess the 15 PB quoted in the paragraph is a typo?
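For anyone wanting to repeat that kind of series sum, here is a minimal sketch. It does not use the actual per-release values from Table 1 of RTN-003 (those should be substituted if available); instead it assumes, purely for illustration, that each release's catalog volume grows linearly up to the final release, that there are 10 releases, and that the final release is the ~7 PB quoted above. The function names are hypothetical.

```python
# Cumulative catalog volume over all data releases.
# ASSUMPTIONS (not from RTN-003): linear growth per release, 10 releases.

def linear_release_volumes_pb(final_release_pb, n_releases=10):
    """Per-release volumes in PB, assuming linear growth up to the final release."""
    return [final_release_pb * k / n_releases for k in range(1, n_releases + 1)]


def cumulative_volume_pb(per_release_pb):
    """Total volume in PB if every release is kept."""
    return sum(per_release_pb)


if __name__ == "__main__":
    volumes = linear_release_volumes_pb(7.0)  # ~7 PB final catalog, as quoted in the thread
    print(f"cumulative volume: {cumulative_volume_pb(volumes):.1f} PB")
```

Under these assumptions the total comes out somewhat above 30 PB. Passing the real per-release numbers to `cumulative_volume_pb` would give the figure that actually matters for planning.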

I’ve submitted a GitHub issue about some potential typos in the catalog sizes in RTN-003. Hopefully we can get this resolved asap!

Thanks @MelissaGraham for submitting the issue, and @fraserw for digging into this! I wonder if some of the inconsistency comes from catalog volume vs. database storage, which from Figure 1 includes indices, maintenance space, a replica for fault tolerance, and the full previous DR. Indeed, the wedge for the previous DR is ~30 PB by Year 10.

So perhaps the volume of 15 PB is the ~7 PB from Table 1 in RTN-003 plus indices? Figure 1 implies that storing the full catalog plus the previous DR in year 10 would require 70+ PB of DB storage for a performant DB, even if the catalog itself plus indices might only be 15 PB.

Re. the third bullet point, it’s true that there is an ongoing discussion of creating Lite alerts, so things may change in the future, but 2.2 PB remains a reasonable estimate for the alert database volume. (I checked this with Rubin Alert Production staff.)

Hi Knut. My confusion continues. If what you suggest is true, that the full DR10 DB is 30 PB, then what does the ~7 PB in Table 1 represent?

I understand the confusion, and think it’s best to get clarification from DM through Melissa’s request. But my interpretation of Figure 1 in RTN-003 and the 7/15 PB numbers is that 7 PB is the volume of the tabular data in bytes; 15 PB is the volume of the tabular data plus indices; and 30 PB is the total DB storage including replication for fault tolerance. The previous DR should be a bit less than 30 PB (tables+indices+replication), but it looks to be about that number in the Figure.


