The long-term shared Gen3 data repositories at NCSA described by RFC-741 and DMTN-167 are now up and ready for users (at least I think so, and it’s time to test that assertion).
They are not yet complete, but there is enough there already that at least some users should be able to switch their daily work to these data repositories. I also have plenty of documentation work to do before we can declare RFC-741 done, but I figure it’s better to open these up now than to block usage on documentation.
All of these repos have u/<user> subdirectories (corresponding to u/<user> collections) that users can write to, with other directories read-only for everyone except me and (in one case) @madamow. I can open up other directories to others as we go; I certainly expect to for e.g. calibrations. And hopefully I’ll eventually have time to put some database-side guardrails on database-only entities; for now, just continue to be careful to delete or modify only collections that start with u/<user>.
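Until those guardrails exist, a small shell helper can make the "only touch u/<user>" rule harder to forget. This is only a sketch: the function name is_own_collection, the user jdoe, and the collection u/jdoe/test-run are all made up for illustration, and nothing here talks to the butler; it just checks the collection name.

```shell
# Succeeds only when the collection is u/<user> itself or something below it.
# Usage: is_own_collection <user> <collection>
# (Hypothetical helper; it inspects the name only and never contacts a repo.)
is_own_collection() {
    case "$2" in
        "u/$1" | "u/$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: check a name before handing it to anything that deletes or modifies collections.
if is_own_collection jdoe u/jdoe/test-run; then
    echo "ok to modify"
else
    echo "not yours; leave it alone"
fi
```

Matching on "u/<user>/" rather than a bare "u/<user>" prefix keeps a user named jdoe from accidentally matching collections that belong to jdoe2.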
The data repositories and their current status and contents are outlined below, but please feel free to poke around; I highly recommend:
$ butler query-collections /repo/main --chains=tree | less
/repo/main
This data repository will hold essentially all data from all real instruments (the only known exception is what’s in /repo/ccso).
Right now it has essentially all of the HSC data at NCSA - PDR2 and a few special programs (a small amount of calibration data failed to ingest for reasons that have been diagnosed but not fixed). There are two suites of master calibrations (HSC/calib/gen2/20180117 and HSC/calib/gen2/20200115), with the former marked as the default (it was in Gen2) and usable via just HSC/calib (aside: should the later calibration suite actually be the default?).
There are also special collections for the heavily used RC2 subset; using HSC/RC2/defaults as your input collection should cover everything you need for regular DRP processing, and if you’re processing the whole thing, it should make passing a visit constraint on the command line unnecessary.
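As a rough sketch of what that looks like in practice, the helper below prints (it does not run) a pipetask command that reads from HSC/RC2/defaults and writes to a u/<user> output collection, with no data query because the whole RC2 subset is being processed. The function name rc2_pipetask_cmd, the user jdoe, the run name rc2-test, and the pipeline file my_pipeline.yaml are placeholders I made up; substitute your own, and expect to need more options for a real run.

```shell
# Print (do not run) a pipetask invocation that reads from HSC/RC2/defaults.
# Usage: rc2_pipetask_cmd <user> <run-name> <pipeline-yaml>
# (Hypothetical helper; all three arguments are placeholders supplied by the caller.)
rc2_pipetask_cmd() {
    printf 'pipetask run -b /repo/main -i HSC/RC2/defaults -o u/%s/%s -p %s\n' "$1" "$2" "$3"
}

# Show the command for a made-up user, run name, and pipeline file.
rc2_pipetask_cmd jdoe rc2-test my_pipeline.yaml
```

Printing the command first makes it easy to confirm that the output collection really is under your own u/<user> prefix before anything is written.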
The converted w_2021_02 and w_2021_06 Gen2 RC2 runs are present as well - I’ll do w_2021_10 shortly, now that (I think) it’s done.
Eventually /repo/main will include data from LSST hardware and DECam as well; the former will be ingested after the filesystem reorganization scheduled for Thursday morning, and I’ll get started on ingesting DECam data later this week as well.
/repo/dc2
This includes all of the DESC DC2 DR6 WFD raws and processing that will be used in DP0.1 (it’s a clone of the IDF repo - or rather the converse). That includes the original calibs used for the DESC processing. The dataset processed approximately monthly by DM is a subset of this, and there are two TAGGED collections that contain the raws for these important subsets:
- 2.2i/raw/DP0: raws for DP0 (currently everything, but this will stay the same even if we add more raws)
- 2.2i/raw/test-med-1 (DM monthly processing subset)
The full DESC-run DR6 WFD processing can be found in the 2.2i/runs/DP0.1 collection.
/repo/teststand
This contains raw data from the simulated NCSA teststands.
/repo/ccso
This is where alternate versions of LSST raws (written by the CCS with controller=‘O’) will land. At least I think that’s what it is; I’m confident that the people who care about this repository already know more about its contents than I do. So far it just contains instrument registrations.