I am running into unexpected and confusing behavior with the `butler define-visits` CLI command. The command seems to create visits for all exposures in the repository, regardless of what is passed with the `--collections` argument. For my use case, I’ve ingested all object frames into separate collections based on their night of observation and the target they point to. Here is a sample of my collections:
```
$ butler query-collections repo | grep "DECam/raw/object/210318" | grep "RUN"
DECam/raw/object/210318 RUN
DECam/raw/object/210318/cosmos_1 RUN
DECam/raw/object/210318/cosmos_2 RUN
DECam/raw/object/210318/cosmos_3 RUN
```
I’d like to define visits on a per-night, per-target basis, for example by running:
```
$ butler define-visits repo lsst.obs.decam.DarkEnergyCamera --collections DECam/raw/object/210318/cosmos_3 -j 48
```
However, I’m finding that these commands take a very long time to run (minutes, even with `-j 48`). I’ve also noticed that the command prints identical output no matter which collection I pass:
```
defineVisits INFO: Preprocessing data IDs.
defineVisits INFO: Registering visit_system 0: one-to-one.
defineVisits INFO: Grouping 1321 exposure(s) into visits.
defineVisits INFO: Computing regions and other metadata for 1321 visit(s).
```
So it looks like the define-visits task gathers all of the exposures in the repository, regardless of the collection name passed. I can also run the command with a nonexistent collection and get the same output before it fails because the collection does not exist:
```
$ butler define-visits repo/ lsst.obs.decam.DarkEnergyCamera --collections DECam/raw/object/210308/cosmos_1 -j 48
defineVisits INFO: Preprocessing data IDs.
defineVisits INFO: Registering visit_system 0: one-to-one.
defineVisits INFO: Grouping 1321 exposure(s) into visits.
defineVisits INFO: Computing regions and other metadata for 1321 visit(s).
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 469, in _buildVisitRecordsSingle
    return self._buildVisitRecords(args[0], collections=args[1])
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 363, in _buildVisitRecords
    visitRegion, visitDetectorRegions = self.computeVisitRegions.compute(definition,
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 789, in compute
    detectorBounds = self.computeExposureBounds(visit.exposures[0], collections=collections)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 729, in computeExposureBounds
    camera, versioned = loadCamera(self.butler, exposure.dataId, collections=collections)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/_instrument.py", line 911, in loadCamera
    cameraRef = butler.get("camera", dataId=dataId, collections=collections)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/_butler.py", line 1008, in get
    ref = self._findDatasetRef(datasetRefOrType, dataId, collections=collections, **kwds)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/_butler.py", line 787, in _findDatasetRef
    ref = self.registry.findDataset(datasetType, dataId, collections=collections, timespan=timespan)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/_sqlRegistry.py", line 405, in findDataset
    for collectionRecord in collections.iter(self._managers.collections):
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/wildcards.py", line 469, in iter
    manager.find(name),
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/collections/_base.py", line 416, in find
    raise MissingCollectionError(f"No collection with name '{name}' found.")
lsst.daf.butler.registry._exceptions.MissingCollectionError: No collection with name 'DECam/raw/object/210308/cosmos_1' found.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/bin/butler", line 28, in <module>
    sys.exit(main())
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/cli/butler.py", line 321, in main
    return cli()
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/cli/cmd/commands.py", line 94, in define_visits
    script.defineVisits(*args, **kwargs)
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/script/defineVisits.py", line 61, in defineVisits
    task.run(butler.registry.queryDataIds(["exposure"], dataId={"instrument": instr.getName()}),
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/obs_base/21.0.0-50-gfd878b2+50d1e73eb4/python/lsst/obs/base/defineVisits.py", line 550, in run
    for visitRecords in self.progress.wrap(allRecords, total=len(definitions),
  File "/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/core/progress.py", line 246, in wrap
    yield from bar
  File "/astro/users/stevengs/.conda/envs/lsst-scipipe-0.5.0/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
lsst.daf.butler.registry._exceptions.MissingCollectionError: No collection with name 'DECam/raw/object/210308/cosmos_1' found.
```
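This tells me that all of the exposure data IDs in the repository are gathered before the collection is ever referenced: the traceback shows `script/defineVisits.py` calling `butler.registry.queryDataIds(["exposure"], dataId={"instrument": instr.getName()})` with no collection constraint, and the collection is only used later, when `loadCamera` looks up the `camera` dataset.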
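To make sure I’m describing this correctly, here is a toy model of the order of operations I think the traceback shows. It is plain Python, not the LSST API: `ToyRepo`, `toy_define_visits`, and the stand-in `MissingCollectionError` are all hypothetical names. Exposures are selected for the whole instrument first, and the collection name is only consulted afterwards, so a bad name fails late and the "Grouping N exposure(s)" count never changes.

```python
class MissingCollectionError(Exception):
    """Hypothetical stand-in for the butler's MissingCollectionError."""


class ToyRepo:
    """Hypothetical repository: exposures are global, raws live in collections."""

    def __init__(self):
        self.exposures = []    # repository-wide exposure records
        self.collections = {}  # collection name -> set of exposure IDs with raws

    def ingest(self, collection, exposure_ids):
        self.exposures.extend(exposure_ids)
        self.collections.setdefault(collection, set()).update(exposure_ids)

    def load_camera(self, collection):
        # In this model, the only place the collection name is consulted.
        if collection not in self.collections:
            raise MissingCollectionError(f"No collection with name '{collection}' found.")
        return "camera"


def toy_define_visits(repo, collection, log):
    # Step 1: select every exposure for the instrument, ignoring `collection`.
    exposures = list(repo.exposures)
    log.append(f"Grouping {len(exposures)} exposure(s) into visits.")
    # Step 2: the collection is only used when computing regions (camera
    # lookup), so a nonexistent name fails after the full list was built.
    for _ in exposures:
        repo.load_camera(collection)
    return exposures


repo = ToyRepo()
repo.ingest("DECam/raw/object/210318/cosmos_1", [1, 2, 3])
repo.ingest("DECam/raw/object/210318/cosmos_3", [4, 5])
log = []
print(len(toy_define_visits(repo, "DECam/raw/object/210318/cosmos_3", log)))  # 5, not 2
```

In this model, passing `"DECam/raw/object/210308/cosmos_1"` appends the same "Grouping 5 exposure(s)" line to `log` and only then raises, which matches what I see from the real command.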
I’ve also found that the `butler query-dimension-records` command similarly ignores the `--collections` argument. For example, running:
```
$ butler query-dimension-records repo/ visit --collections DECam/raw/object/210318/cosmos_1
```
returns visits defined from all of the ingested exposures, not just those in the collection.
I am guessing that `define-visits` and `query-dimension-records` both work in a way I just don’t expect, gathering data IDs from the repository independently of the collections passed. So perhaps no arguments are truly being “ignored”; I just don’t know how to get the behavior I want.
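Here is a toy model of that guess, again in plain Python with hypothetical names (`ToyRegistry` and its methods are not part of daf_butler). Dimension records such as visits belong to the whole repository, so a collection can only narrow them indirectly, by asking which records have a dataset (here, a raw) in that collection.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    # exposure ID -> visit ID; dimension records are shared by the whole repo
    visits: dict = field(default_factory=dict)
    # collection name -> exposure IDs that have a "raw" dataset in it
    raws: dict = field(default_factory=dict)

    def query_dimension_records_visit(self, collections=None):
        # Visits are not owned by any collection, so this model accepts
        # `collections` but has nothing to filter with it.
        return sorted(set(self.visits.values()))

    def query_visits_with_raws(self, collection):
        # Constraining on datasets in a collection is what narrows the result.
        return sorted(
            {self.visits[e] for e in self.raws.get(collection, ()) if e in self.visits}
        )


reg = ToyRegistry(
    visits={1: 1, 2: 2, 3: 3, 4: 4, 5: 5},  # one-to-one visits in this toy
    raws={"DECam/raw/object/210318/cosmos_1": {1, 2, 3},
          "DECam/raw/object/210318/cosmos_3": {4, 5}},
)
print(reg.query_dimension_records_visit(collections=["DECam/raw/object/210318/cosmos_1"]))
print(reg.query_visits_with_raws("DECam/raw/object/210318/cosmos_1"))
```

If the real registry behaves like this, I’d guess a dataset-constrained query such as `butler.registry.queryDataIds(["exposure"], datasets="raw", collections="DECam/raw/object/210318/cosmos_3")` is the kind of per-collection exposure list I’m after, assuming `queryDataIds` accepts `datasets` and `collections` the way I believe it does.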
I’d really like to define visits quickly and incrementally, i.e. one collection at a time, since I will be ingesting and processing more data over time. Is there a way for me to do this?
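For concreteness, here is a toy sketch of the incremental behavior I’m hoping for. It is plain Python, not obs_base: `define_visits_incrementally` and its inputs are hypothetical, and "visit ID equals exposure ID" is only an assumption of this one-to-one toy model. It validates the collection up front, considers only exposures with raws in that collection, and skips exposures that already have a visit.

```python
def define_visits_incrementally(raws_by_collection, existing_visits, collection):
    """Define one-to-one visits only for new exposures in `collection`.

    raws_by_collection: collection name -> iterable of exposure IDs (toy input)
    existing_visits: exposure ID -> visit ID, updated in place
    Returns the exposure IDs that were newly assigned visits.
    """
    if collection not in raws_by_collection:
        # Fail before doing any work, unlike the behavior in the traceback above.
        raise KeyError(f"No collection with name '{collection}' found.")
    new = sorted(e for e in raws_by_collection[collection] if e not in existing_visits)
    for exposure_id in new:
        # Toy one-to-one visit system: visit ID equals exposure ID.
        existing_visits[exposure_id] = exposure_id
    return new


raws = {"DECam/raw/object/210318/cosmos_1": [1, 2, 3],
        "DECam/raw/object/210318/cosmos_3": [4, 5]}
visits = {}
print(define_visits_incrementally(raws, visits, "DECam/raw/object/210318/cosmos_3"))  # [4, 5]
print(define_visits_incrementally(raws, visits, "DECam/raw/object/210318/cosmos_3"))  # []
```

Running it twice on the same collection does no new work, which is the property I’d want when adding new nights over time.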