One problem seems to be that there are quantities which are constant across all rows in an afw table, stored e.g. in the FITS header of its file representation, which `ingestCatalogTask` does not load into database columns. So, even if there is a separate metadata table for CCDs or exposures, you may not be able to join between, say, sources and their originating CCDs, because the information required for the join may not be in the source table columns. Another problem seems to be that you don’t know how to create an exposure or CCD metadata database table.
Is that a good summary?
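To illustrate the first problem with a toy example (table and column names invented, `sqlite3` used only for brevity): the join is only possible if each source row actually carries a foreign key naming its originating CCD/exposure.

```python
import sqlite3

# Toy schema: ccdExposureId is the foreign key that would have to be
# materialized as an actual Source column for this join to work.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (sourceId INTEGER, ccdExposureId INTEGER, psfFlux REAL);
    CREATE TABLE Ccd_Metadata (ccdExposureId INTEGER, filterName TEXT, expTime REAL);
""")
# Join sources to the metadata of their originating CCDs; without a
# ccdExposureId column in Source there is nothing to match on.
rows = conn.execute("""
    SELECT s.sourceId, c.filterName, c.expTime
    FROM Source AS s JOIN Ccd_Metadata AS c USING (ccdExposureId)
""").fetchall()
```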
For the first issue, I was assuming that the pipelines would place at least foreign keys (e.g. columns that identify the originating exposure/CCD of a source) into actual columns, even if the value is constant across all source measurements of a particular granularity. If the pipelines don’t already do that, I guess you would like daf_ingest to deal with afw table metadata available via `getMetadata()`? I think that’s a reasonable request, but I’m not totally sure how to deal with the corresponding `daf::base::PropertyList`. Would it be enough to make the user responsible for specifying a list of desired metadata key names? The task might then complain if any of them don’t exist (alternatively: set the corresponding database value to NULL), and might error out if a key maps to an array of values or a complex type (i.e., an `lsst::daf::base::Persistable`).
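To make that concrete, here’s a minimal sketch of what I have in mind, not the daf_ingest API. The function name is invented, and the `PropertyList` accessors used (`exists`, `getArray`) reflect my reading of `daf::base::PropertySet`, so treat them as assumptions:

```python
def extractMetadataColumns(metadata, keys, missingIsNull=True):
    """Map each user-requested metadata key to a scalar column value."""
    columns = {}
    for key in keys:
        if not metadata.exists(key):
            if missingIsNull:
                columns[key] = None  # becomes SQL NULL
                continue
            raise KeyError("metadata key %r not found" % (key,))
        values = metadata.getArray(key)
        if len(values) != 1:
            # a key mapping to an array of values has no single-column home
            raise TypeError("metadata key %r is not scalar" % (key,))
        if not isinstance(values[0], (bool, int, float, str)):
            # complex types (e.g. a Persistable) cannot be ingested
            raise TypeError("metadata key %r has unsupported type" % (key,))
        columns[key] = values[0]
    return columns
```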
For the second concern, ingestProcessed should still work (the ingestImages task from pipe_tasks appears to be for something else). I haven’t heard complaints of it not working, though that might just be because nobody is using it. My medium-term plan is to move it from datarel to daf_ingest. That script currently extracts and derives information from on-disk FITS image headers. I was also planning to change it to consume metadata from in-memory `Exposure` objects instead, so that it can be called both after a pipeline run and as part of pipeline execution (avoiding disk I/O for headers).
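Roughly, the change would look something like the sketch below (illustrative only, not working code): build a metadata-table row from an in-memory `Exposure` rather than a FITS header on disk. The accessors (`getMetadata`, `getWidth`, `getHeight`, `getFilter`) are my reading of `lsst.afw.image.Exposure`, and the column names are invented:

```python
def exposureMetadataRow(exposure, keys):
    """Build a dict of column values for an exposure metadata table."""
    # reuse the same key-list extraction as for catalog metadata
    row = extractMetadataColumns(exposure.getMetadata(), keys)
    row["width"] = exposure.getWidth()    # image dimensions, no header I/O
    row["height"] = exposure.getHeight()
    row["filterName"] = exposure.getFilter().getName()
    return row
```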