The New Butler design, as laid out in a design outline and architecture notes, is a daunting thing to attack, particularly for someone new to the Stack and even astronomy pipelines. It’s finally getting through to me that it’s likely more effective for @n8pease to begin by adding much-desired features to the Old Butler, even if clunky because of the Old Butler’s underpinnings, before starting the transformation to the new one. This topic is an attempt to get input on what people think should be those first “feet-wetting” features.
Ideas for things that should be relatively easy to implement:
- Dataset type aliases.
- Custom Mapper subclass for Firefly cache access (could be related to a single-file-repository Mapper that might be useful for processFile). DM-4167
- Provenance recording.
- Repository versioning and selection (for calibrations, cameraGeom, and other bitemporal datasets; probably also useful for reference catalogs like astrometry_net_data). DM-4168
- Support for database-query dataset types. DM-4169
I think these features, which Jim and Robert requested nearly two years ago, are highly desirable and even motivated the New Butler in the first place, but they are off the table as too complex for a starter project:
- Config-in-repository and Task-defined output dataset types (and PAF replacement)
  - DM-4170 Butler: move configuration (.paf) file into repository
  - DM-4171 Butler: change configuration from .paf to something else
  - DM-4173 Butler: add support for write-once-compare-same outputs
  - DM-4180 Butler: provide API so that a task can define the output dataset type
- Registry-free glob-based repositories.
- ORM-ish aggregates