On RFC-775, @natelust and I proposed - without a lot of detail - having the new
drp_pipe package separate Pipeline source files (in
ingredients directories) from their expanded forms (in a
pipelines directory), which would be both more readable and more suitable for actual execution. Despite being built by the build system, the expanded forms in the
pipelines directory would be committed to git, because that enables a few very nice things:
Users could inspect the pipelines in their most readable form via the GitHub web interface.
Production operators could directly execute (via
ButlerURI and GitHub’s raw-file access URIs) expanded, fully-configured pipelines that are as protected as possible from accidental configuration overrides (especially if we bake a multi-package software version hash into those files).
The task and configuration changelog of each pipeline would naturally appear as the git history of those expanded files.
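To make the "multi-package software version hash" idea a bit more concrete, here is a minimal sketch of one way such a hash could be computed. Nothing like this exists in drp_pipe as far as this post is concerned: the function name `version_hash` and the version strings are made up for illustration, and I'm assuming the hash only needs to summarize a mapping of package names to version identifiers.

```python
import hashlib
from typing import Mapping


def version_hash(package_versions: Mapping[str, str]) -> str:
    """Return a hex digest summarizing a set of package versions.

    Packages are processed in sorted order so the result does not depend
    on the order in which they were recorded.
    """
    digest = hashlib.sha256()
    for name in sorted(package_versions):
        digest.update(f"{name}={package_versions[name]}\n".encode("utf-8"))
    return digest.hexdigest()


# Hypothetical versions; real values would come from the build system.
versions = {"pipe_tasks": "abc123", "meas_base": "def456"}
print(version_hash(versions))
```

An expanded pipeline file carrying this digest could then be rejected at execution time if the digest computed from the currently set-up packages differs.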
Committing build artifacts to version control is always a bit problematic, but given those upsides I accepted the RFC without a detailed plan, because I wanted to experiment a bit in the implementation to see if we could make it work.
I’ve now done that experimentation (on DM-30891), and I think it’s time to give up on the idea, at least in the form proposed on the RFC. Big thanks to @kfindeisen and @mrawls for helpful feedback on the ticket that saved me from going down a few more dead-ends before arriving at that conclusion.
What that means for
drp_pipe is that we’ll remove the
recipes directory and put its pipeline “source” definitions in the
pipelines directory. For at least most development purposes, those (non-expanded) pipelines are what we’ll run - expansion will happen on-the-fly, as it does with our obs_* package pipelines today. That’s also good because it’s basically what
ap_pipe already does - though I do want to keep the
ingredients directory in
drp_pipe as a place to put pipeline source content that should not be directly run on its own, and I think it’s fine if
ap_pipe doesn’t ever need to have
ingredients; this may just reflect the fact that the DRP pipeline has many more tasks.
That said, those motivations for git-committing the expanded pipelines still stand, and in a follow-up discussion, @natelust and I came up with some ideas for achieving them in other ways.
The expanded pipeline files really are more readable, and their relative readability advantage will grow as the pipeline source files are refactored in the future to remove duplication. But if the goal is to let humans read them on the web, we don’t need to use GitHub: we can get them into our Sphinx doc builds instead. This will require some tooling support, but I can imagine all kinds of wonderful interactive navigation that I am not remotely capable of implementing myself:
- variants of the same pipeline (e.g. for different instruments or test datasets) in tabs;
- expanding/collapsing blocks for the configuration associated with each task (maybe even with differences from the task-level defaults highlighted somehow);
- views of the pipeline as a graph (something we can already generate via GraphViz);
- links to the schemas of catalogs produced by these tasks.
There are a fair number of very long-standing tickets requesting pieces of this, and I think we can do a number of them at once if we can start by running pipeline expansion inside the
pipelines.lsst.io build and start extracting things from it into rST (or even directly into HTML).
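As a very rough sketch of the "extract things into rST" step, the function below renders an already-expanded pipeline as one rST section per task. It assumes the expanded pipeline has been loaded into a plain dict with `description` and `tasks` keys, where each task has a `class` and an optional `config` mapping; that layout, the function name `pipeline_to_rst`, and the example values are assumptions for illustration, not the actual tooling.

```python
def pipeline_to_rst(name, pipeline):
    """Render an expanded pipeline (as a plain dict) into rST text."""
    lines = [name, "=" * len(name), ""]
    description = pipeline.get("description")
    if description:
        lines += [description, ""]
    for label, task in pipeline.get("tasks", {}).items():
        # One subsection per task label.
        lines += [label, "-" * len(label), ""]
        lines += [f"Task class: ``{task['class']}``", ""]
        config = task.get("config", {})
        if config:
            # Show configuration overrides as an indented code block.
            lines += [".. code-block:: python", ""]
            lines += [
                f"   config.{key} = {value!r}"
                for key, value in sorted(config.items())
            ]
            lines.append("")
    return "\n".join(lines)


# Illustrative input only; a real build would load the expanded pipeline.
example = {
    "description": "Example pipeline",
    "tasks": {
        "isr": {"class": "lsst.ip.isr.IsrTask", "config": {"doBias": True}},
    },
}
print(pipeline_to_rst("DRP", example))
```

The fancier ideas in the list (tabs, collapsible blocks, graph views) would need Sphinx extensions or custom HTML on top of something like this.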
The big problem with committing expanded pipelines to
drp_pipe is that the content of those pipelines depends on all of
drp_pipe's dependencies. This means a change in one of those upstream packages can easily break the equivalency between the pipeline source we’d planned to put in
recipes and the expanded forms in
pipelines, without any change (or opportunity to commit) to
A separate git repository for expanded pipelines whose commits are only machine generated - say, from the same services that produce daily and weekly builds - would not suffer from this problem, however, and it opens up some new possibilities:
In addition to expanded pipelines, we could also record (for each commit, mapping to a particular release of the Science Pipelines):
- the release tag for the stack and/or the git commit refs of all dependencies (maybe even via git submodules);
- the Jira ticket numbers merged since the last commit to this “changelog” repo (extracted by parsing git commit logs);
- schema files and other init-output datasets produced by those expanded pipelines.
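The per-commit record described above could be sketched roughly as follows. This assumes Jira tickets appear in commit messages in the usual `DM-NNNNN` form; the function names, the manifest keys, the release tag, the commit ref, and the sample log text (apart from DM-30891) are all invented for illustration.

```python
import json
import re

# Assumes ticket keys look like "DM-" followed by digits.
TICKET_RE = re.compile(r"\bDM-\d+\b")


def extract_tickets(log_text):
    """Return the unique ticket keys found in git log text, sorted by number."""
    tickets = set(TICKET_RE.findall(log_text))
    return sorted(tickets, key=lambda ticket: int(ticket.split("-")[1]))


def build_manifest(release_tag, dependency_refs, log_text):
    """Assemble the metadata stored alongside the expanded pipelines."""
    return {
        "release": release_tag,
        "dependencies": dict(sorted(dependency_refs.items())),
        "tickets": extract_tickets(log_text),
    }


# Hypothetical inputs; a real service would read these from git and the build.
log = """\
Merge branch 'tickets/DM-30891'
DM-30891: move pipeline sources into pipelines directory
Merge branch 'tickets/DM-12345'
"""
manifest = build_manifest("w_2021_30", {"drp_pipe": "0123abc"}, log)
print(json.dumps(manifest, indent=2))
```

Writing the manifest as sorted, indented JSON keeps the diffs between changelog commits small and readable.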
That would make this a souped-up version of the already super-useful informal changelog:
- you could
git clone it locally and use whatever tools you like to inspect/explore the history (
- you could use it to directly relate pipeline content and configuration changes to Jira tickets and versions;
- with a bit of extra tooling, you could use it to setup, install, or build a pipelines version associated with a particular changelog commit.
Some pretty simple web pages backed by this repo might even be a better way to display pipeline content than integration with the Sphinx doc build. If we can get this stood up quickly, we may not want to bother with the doc integration from the last section at all (though we would link to these pages from the regular docs, of course).
There may also be fun things one could do with connections to SQuaSH (any metric we upload should probably come from a pipeline that’s exactly represented in this changelog repo) or build-system experimentation (a submodule view of the packages seems like a nice thing to hang a monolithic, eups-free CMake build on…), but now I’m getting ahead of myself.
Anyhow, I’m not quite sure who to talk to about getting something like this up and running, but I bet somebody who reads this will have some ideas. I should also add that my main concern with this idea is that it may partially duplicate a lot of things that already exist that I don’t know much about (e.g. the lsstsw versiondb repo, the new schema browser). Please chime in if you see this as conflicting with existing ways of doing things.