Define visits for a single collection

I am running into unexpected and confusing behavior with the butler define-visits CLI command. It seems that the command attempts to define visits for all exposures in the repository, regardless of what is passed with the --collections argument. For my use case, I’ve ingested all object frames into separate collections based on their night of observation and the target they point to. Here is a sample of my collections:

$ butler query-collections repo | grep "DECam/raw/object/210318" | grep "RUN"
DECam/raw/object/210318                                 RUN
DECam/raw/object/210318/cosmos_1                        RUN
DECam/raw/object/210318/cosmos_2                        RUN
DECam/raw/object/210318/cosmos_3                        RUN

I’d like it if I could define visits on a per-night, per-target basis, so for example by running:

$ butler define-visits repo lsst.obs.decam.DarkEnergyCamera --collections DECam/raw/object/210318/cosmos_3 -j 48

However, I’m finding that running these commands takes a very long time (minutes, even with -j 48). I’ve noticed the following identical outputs when running this command with different collections:

defineVisits INFO: Preprocessing data IDs.
defineVisits INFO: Registering visit_system 0: one-to-one.
defineVisits INFO: Grouping 1321 exposure(s) into visits.
defineVisits INFO: Computing regions and other metadata for 1321 visit(s).

so it looks like the define visits task is gathering all of the data in the repository regardless of the collection names passed. I can also run the define visits command with a non-existing collection and receive the same output before the command fails since the collection does not exist:

$ butler define-visits repo/ lsst.obs.decam.DarkEnergyCamera --collections DECam/raw/object/210308/cosmos_1 -j 48
defineVisits INFO: Preprocessing data IDs.
defineVisits INFO: Registering visit_system 0: one-to-one.
defineVisits INFO: Grouping 1321 exposure(s) into visits.
defineVisits INFO: Computing regions and other metadata for 1321 visit(s).
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 469, in _buildVisitRecordsSingle
    return self._buildVisitRecords(args[0], collections=args[1])
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 363, in _buildVisitRecords
    visitRegion, visitDetectorRegions = self.computeVisitRegions.compute(definition,
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 789, in compute
    detectorBounds = self.computeExposureBounds(visit.exposures[0], collections=collections)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 729, in computeExposureBounds
    camera, versioned = loadCamera(self.butler, exposure.dataId, collections=collections)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/_instrument.py", line 911, in loadCamera
    cameraRef = butler.get("camera", dataId=dataId, collections=collections)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/_butler.py", line 1008, in get
    ref = self._findDatasetRef(datasetRefOrType, dataId, collections=collections, **kwds)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/_butler.py", line 787, in _findDatasetRef
    ref = self.registry.findDataset(datasetType, dataId, collections=collections, timespan=timespan)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/_sqlRegistry.py", line 405, in findDataset
    for collectionRecord in collections.iter(self._managers.collections):
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/wildcards.py", line 469, in iter
    manager.find(name),
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/collections/_base.py", line 416, in find
    raise MissingCollectionError(f"No collection with name '{name}' found.")
lsst.daf.butler.registry._exceptions.MissingCollectionError: No collection with name 'DECam/raw/object/210308/cosmos_1' found.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/bin/butler", line 28, in <module>
    sys.exit(main())
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/cli/butler.py", line 321, in main
    return cli()
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/cli/cmd/commands.py", line 94, in define_visits
    script.defineVisits(*args, **kwargs)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/script/defineVisits.py", line 61, in defineVisits
    task.run(butler.registry.queryDataIds(["exposure"], dataId={"instrument": instr.getName()}),
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 550, in run
    for visitRecords in self.progress.wrap(allRecords, total=len(definitions),
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/core/progress.py", line 246, in wrap
    yield from bar
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
lsst.daf.butler.registry._exceptions.MissingCollectionError: No collection with name 'DECam/raw/object/210308/cosmos_1' found.

which tells me that all of the data ids in the repository are being gathered before the collection is referenced.

I’ve also found that the butler query-dimension-records command similarly ignores the --collections argument. For example:

butler query-dimension-records repo/ visit --collections DECam/raw/object/210318/cosmos_1

will give me visits defined from all of the exposures ingested, not just those in the collection.

I am guessing that define-visits and query-dimension-records both work in a way I just don’t expect, gathering data ids from the repository independently of the collections passed. So perhaps no arguments are truly being “ignored” and I just don’t know how to get the behavior I want.

I’d really like it if I could define visits in a quick and incremental manner, i.e. using single collections, since I will be ingesting and processing more data over time. Is there a way for me to do this?

Thanks for the report, and I’m sorry I didn’t find this message sooner. The problem with define-visits is a bug. It turns out that the collections argument has two meanings: one is to locate dataIds, and the other is to find the corresponding camera geometry. Currently the argument is only used for the latter and is completely absent from the dataId query in the command-line script. This leaves the dataId query constrained only by instrument. We hadn’t noticed because, first, we put all our raws in a single collection per instrument and, second, we haven’t implemented time-varying camera geometry (so it always picks up the default geometry for that instrument).
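
To make the difference concrete, here is a minimal sketch of the two dataId queries. It assumes that Registry.queryDataIds accepts the datasets and collections keywords and that the ingested exposures use the dataset type name "raw". The two helper function names are made up for illustration, and this is not necessarily how the actual fix is written.

```python
def exposure_data_ids_unconstrained(registry, instrument_name):
    """Query in the form that appears in the traceback above."""
    # Only the instrument constrains the query, so every exposure in the
    # repository is returned no matter which collections were requested.
    return registry.queryDataIds(["exposure"], dataId={"instrument": instrument_name})


def exposure_data_ids_in_collections(registry, instrument_name, collections):
    """Hypothetical collection-aware variant of the same query."""
    # Assumption: passing datasets= and collections= restricts the result to
    # exposures that have a "raw" dataset in one of the given collections.
    return registry.queryDataIds(
        ["exposure"],
        dataId={"instrument": instrument_name},
        datasets="raw",
        collections=collections,
    )
```

With a real repository, registry would be butler.registry, and the result of the second function is the kind of restricted dataId set that define-visits should be operating on when --collections is given.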

The issue with query-dimension-records is that the --collections argument is ignored unless the --datasets argument is also given (the command’s help text does explain that).

The define-visits bug should be fixed in the next weekly (w_2021_34).