Following is a cut at how we may try to bring the obs packages under control. This is not meant to make the obs packages “right”, just to make them more homogeneous and easier to implement from scratch. I’m hoping for lively conversation to allow us to crystallize on a new design we can implement in a focused hack week later in the cycle.
What do obs packages currently contain?
- Calibration information (not including calibration images)
  - Linearity
  - Defects
  - Electronics (gain, read noise, overscan region, serial numbers, etc.)
  - Camera geometry
  - Crosstalk
- Instrument specific data manipulation tools
  - E.g. Native defect format --> DefectList
- Instrument specific task configuration overrides
- Instrument specific task subclasses
  - IngestTask
  - IngestCalibTask
  - IsrTask
- CameraMapper subclass, specifically map, std, bypass methods.
- Dataset definitions
Issues
- The std_*, map_* and bypass_* functions in the CameraMapper are documented in the CameraMapper class, but not in the subclasses. This leads to cargo culting of possibly incorrect usage.
- Calibration primacy and reproducibility are not obvious. It is not always clear what should be used for calibrations or where the calibration data came from originally. There’s also the question of how to keep code and calibrations up to date with each other.
- Conflation of calibration information with code configuration is a problem because they change on different time scales and because one is a function of the data acquisition and the other is closer to a runtime decision.
- The Mapper is in limbo in the sense that it doesn’t belong concretely in either the DAX team or SciPi team sphere of responsibility.
- Ad hoc treatment: e.g. each obs package is using a different mechanism to transform calibration information from native format to the format needed by the stack.
- The bi-temporal problem: There is currently no way to specify an arbitrary combination of calibration products and the code that applies them. “Reduce data as if it was 1995” and “re-reduce data taken in 1995 with the latest and greatest” are the two extremes.
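To make the bi-temporal problem concrete, here is a minimal sketch of a lookup with two time axes: the date the data were taken and the date "as of" which we want to see the calibration products. Everything in it (the `CalibRecord` fields, `find_calib`, the toy records) is hypothetical and not part of any existing obs package. It also covers only the calibration products; the version of the code that applies them would need to be tracked separately.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CalibRecord:
    """One version of a calibration product (all field names are hypothetical)."""
    name: str          # e.g. "defects"
    valid_start: date  # first observation date the product applies to
    valid_end: date    # last observation date the product applies to
    created: date      # date this version of the product was produced
    payload: str       # stand-in for the actual calibration data


def find_calib(records: List[CalibRecord], name: str,
               obs_date: date, as_of: date) -> Optional[CalibRecord]:
    """Return the newest version known at `as_of` that is valid for `obs_date`."""
    candidates = [
        r for r in records
        if r.name == name
        and r.valid_start <= obs_date <= r.valid_end
        and r.created <= as_of
    ]
    return max(candidates, key=lambda r: r.created, default=None)


# Two versions of the same product, both valid for data taken in 1995.
records = [
    CalibRecord("defects", date(1995, 1, 1), date(1995, 12, 31),
                date(1995, 2, 1), "defects-v1"),
    CalibRecord("defects", date(1995, 1, 1), date(1995, 12, 31),
                date(2017, 6, 1), "defects-v2"),
]

obs = date(1995, 7, 4)
# Reduce as if it were 1995: only products that existed back then are visible.
then = find_calib(records, "defects", obs, as_of=date(1995, 12, 31))
# Re-reduce 1995 data with the latest products.
now = find_calib(records, "defects", obs, as_of=date(2017, 12, 31))
print(then.payload, now.payload)
```

The two calls differ only in `as_of`, which is the axis the current obs packages have no way to express.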
Proposal
- Split current obs packages into two git repositories each
  - Calibration repository: This will be a git(-lfs) repository containing all calibration data. The repository will also contain code and tests to allow generation of the calibration repository at scons time.
  - Configuration repository: This will be a git repository of largely configuration information: e.g. dataset definitions, config overrides, Mapper subclasses. TBD is where the raw data ingest task overrides live. They could find a home in either repository.
- Provide defined mechanisms for manipulating and ingesting calibration data.
- Document clearly the non-calibration information. We should provide a cookbook for how to generate an obs package. This means clearly documenting which pieces are commonly (or necessarily) overridden.
Calibration part
- All calibration-like data in native format goes into a git repository specifically for holding these data.
- The calibration repository is built at scons time from the data in native format to solve the primacy issue.
- Discoverability is handled by valid date ranges in the calibration repository.
- The calibration repository will be append only: i.e. all versions of the calibration products will exist in the repo.
- The bi-temporal problem is naturally addressed by this design. At any time, a calibration repository of the entire history of the calibration products can be generated from the native formats. Git tags will need to be used to keep track of changes in how the calibrations are applied by e.g. ip_isr.
- obs_base will provide an ABC Task that will have the methods necessary for building the calibration repository. This may require coming up with a way to map calibs to valid ranges.
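    class BuildCalibRepoTask(object):
        def run(self):
            # Each product is first converted from its native format ...
            self.make_defects.run()
            # ... and then ingested into the calibration repository.
            self.ingest_defects.run()
            self.make_linearity.run()
            self.ingest_linearity.run()
            ...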
Note: We could add the image-like calibration data via multiple parents.
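One way the ABC idea could look in plain Python is sketched below. This is an assumption-laden illustration, not the proposed implementation: it uses the standard-library `abc` module rather than the stack's Task machinery, it replaces the per-product subtasks above with two abstract methods, and the names `products`, `make`, `ingest` and `ToyCamBuildCalibRepoTask` are invented for the example.

```python
from abc import ABC, abstractmethod


class BuildCalibRepoTask(ABC):
    """Hypothetical base class; each obs package supplies the abstract steps."""

    def run(self):
        # Convert each product from native format, then ingest it.
        done = []
        for product in self.products():
            converted = self.make(product)
            self.ingest(product, converted)
            done.append(product)
        return done

    def products(self):
        # Default product list; a camera could override this.
        return ["defects", "linearity"]

    @abstractmethod
    def make(self, product):
        """Convert one product from native format to the stack's format."""

    @abstractmethod
    def ingest(self, product, converted):
        """Store the converted product in the calibration repository."""


class ToyCamBuildCalibRepoTask(BuildCalibRepoTask):
    """Toy camera whose 'repository' is just a dictionary."""

    def __init__(self):
        self.repo = {}

    def make(self, product):
        return f"{product} in stack format"

    def ingest(self, product, converted):
        self.repo[product] = converted


task = ToyCamBuildCalibRepoTask()
built = task.run()
print(built)
```

Because `make` and `ingest` are abstract, an obs package that forgets to implement one of them fails when the task is instantiated rather than partway through a build.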
Non-calibration part
This is mostly documentation.
- Document what the “magic” methods do and how to use them.
- Move as many dataset definitions as possible to obs_base and purge those not needed.
- Document the process of subclassing the ingest tasks.
- Identify common config overrides. Document required config overrides.
- Document required VisitInfo attributes. This will involve a bit of policy making, i.e. deciding what to do when a needed piece of VisitInfo is missing for a particular algorithm. This policy should be enforced in code where possible.
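As an illustration of enforcing such a policy in code, the sketch below checks up front that every attribute an algorithm declares as required is actually set. It is only a sketch under stated assumptions: `SimpleNamespace` stands in for a real VisitInfo, the attribute names are made up, and treating absent, `None` and NaN values as "missing" is one possible policy, not a settled one.

```python
import math
from types import SimpleNamespace


class MissingVisitInfoError(RuntimeError):
    """Raised when an algorithm's required VisitInfo fields are not set."""


def require_visit_info(visit_info, required):
    """Fail early, listing every required attribute that is unset.

    Assumption: "unset" means the attribute is absent, None, or NaN.
    """
    missing = []
    for name in required:
        value = getattr(visit_info, name, None)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(name)
    if missing:
        raise MissingVisitInfoError(
            "missing VisitInfo attributes: " + ", ".join(missing)
        )


# SimpleNamespace stands in for a real VisitInfo; attribute names are hypothetical.
complete = SimpleNamespace(exposure_time=30.0, airmass=1.2)
partial = SimpleNamespace(exposure_time=30.0, airmass=float("nan"))

require_visit_info(complete, ["exposure_time", "airmass"])  # passes silently
try:
    require_visit_info(partial, ["exposure_time", "airmass"])
except MissingVisitInfoError as err:
    message = str(err)
    print(message)
```

Raising a single error that names all missing attributes keeps the failure close to the algorithm that needs them, instead of surfacing later as a NaN in an output catalog.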
Links
https://jira.lsstcorp.org/browse/RFC-341
https://jira.lsstcorp.org/browse/DM-4624
I’m sure there are more…