In thinking about the parallelization and data flow for Data Release Production, it’s occurred to me that it could be very difficult to support processing data on laptops using the full pipeline in the near future. That’s because the pipeline is going to need to process large volumes of data at least somewhat simultaneously - for instance, we’ll be computing full focal-plane solutions for the WCS, PSF, and background model. My guess is that processing a full LSST focal plane will take around 50-60 GB of memory, which isn’t a problem on a big cluster (or even a single node of most new clusters), but it is more than we’ll be able to expect user laptops to have for quite a while. Other processing stages, such as joint calibration (if tracts are large), background matching, or multifit (on very deep datasets), could need even more.
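For concreteness, here’s the back-of-envelope behind that 50-60 GB guess. The CCD count and pixel dimensions are the nominal camera numbers; the per-pixel layout (image, mask, and variance planes held in memory at once) is my assumption:

```python
# Rough memory estimate for holding a full LSST focal plane in memory at once.
# Assumptions: 189 science CCDs of 4096x4096 pixels, stored as MaskedImages
# with float32 image, int32 mask, and float32 variance planes.
n_ccds = 189
pixels_per_ccd = 4096 * 4096
bytes_per_pixel = 4 + 4 + 4  # image + mask + variance

total_gib = n_ccds * pixels_per_ccd * bytes_per_pixel / 2**30
print("pixel data alone: %.0f GiB" % total_gib)  # ~35 GiB
```

Add working copies, per-CCD source catalogs, and scratch space for the full-focal-plane WCS/PSF/background fits, and 50-60 GB looks like the right ballpark.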
I think there are a few ways we can deal with this:
1. Don’t even try to support certain kinds of processing on small machines. This could limit our ability to develop on airplanes and to get complete test coverage on personal developer machines, but I think you can make a case that no one should really be trying to run high-level pipeline code on laptops anyway.

2. Provide alternative pipelines that do lower-quality or less-robust processing than the full pipeline (e.g. by running CCDs serially and not doing any full-focal-plane fitting), but that still produce outputs compatible with downstream dependent pipelines. I worry about the added cost of developing two similar-but-not-identical pipelines, even if this is basically what we’d want to do to support SExtractor-style use of the pipeline.

3. Provide a way to run the full pipeline on laptops, but much more slowly (via a lot of swapping to disk). This might require running under very different middleware, and I’m worried about the cost of new middleware development we wouldn’t otherwise have to do.
I think I have a preference for (1), but to avoid its biggest downsides I think we’ll need to work hard to make it easy to exactly reproduce small pieces of the pipeline from intermediate outputs (something we’ve at least vaguely referred to as “freeze-drying” in the past). That would allow these small pieces to at least be run on developer machines from intermediates produced on big machines. I suspect that even then we’d mostly still prefer to do this sort of development on machines that share a filesystem with the big machines (rather than laptops), but those could be (e.g.) Nebula systems that we can spin up immediately rather than the queue-based big machines themselves.
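To make the “freeze-drying” idea a bit more concrete, here’s a minimal sketch of what reproducing one per-CCD piece from persisted intermediates might look like, assuming something like our current butler interface. The repo path, dataset name, and dataId keys are placeholders, not a design:

```python
# Hypothetical sketch only: re-run one per-CCD piece of the pipeline on a
# small machine, using intermediates that a big-machine run already persisted
# (including the full-focal-plane WCS/PSF/background solutions attached to
# the calibrated exposure), so the small run never has to recompute them.
from lsst.daf.persistence import Butler  # current data access layer

def fetch_frozen_inputs(repo, visit, ccd):
    """Load the frozen inputs for one CCD from a repository produced elsewhere."""
    butler = Butler(repo)
    data_id = dict(visit=visit, ccd=ccd)
    exposure = butler.get("calexp", dataId=data_id)  # placeholder dataset name
    # ...hand the exposure (plus the frozen task config) to whatever per-CCD
    # step is being debugged or developed...
    return exposure

if __name__ == "__main__":
    fetch_frozen_inputs("/path/to/shared/repo", visit=123456, ccd=42)
```

The point is just that everything a per-CCD step consumed would have to be persisted and retrievable by dataId, so a small re-run sees inputs identical to the original big run.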