When using butler.registry.queryDatasets, I’m getting multiple copies of some dataset refs, e.g., when asking for calexps for a given visit. Here’s some code to reproduce using the DP0.1 data:
! eups list lsst_distrib
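import lsst.daf.butler as dafButler
repo = 's3://butler-us-central1-dp01'
butler = dafButler.Butler(repo)
registry = butler.registry  # the queries below go through the butler's registry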
collection = '2.2i/runs/DP0.1'
visit = 748908
raw_refs = list(registry.queryDatasets(datasetType='raw', visit=visit, collections=collection, findFirst=True))
print("raw dataset refs:", len(raw_refs))
calexp_refs = list(registry.queryDatasets(datasetType='calexp', visit=visit, collections=collection, findFirst=True))
print("calexp dataset refs:", len(calexp_refs))
print("unique dataset refs:", len(set(calexp_refs)))
and the output:
21.0.0-3-gc37e2ab+2186fb90a2 w_2021_25 current setup
raw dataset refs: 189
calexp dataset refs: 404
unique dataset refs: 189
The raw data have the expected 189 refs for a full visit, but there are at least 3 copies of some of the associated calexp references. I see similar behavior for src dataset types.
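To see which refs are repeated and how often, I'd expect something like the following to work. This is only a sketch: it uses made-up (dataset type, visit, detector) tuples as stand-ins for the real DatasetRef objects, and duplicate_counts is a hypothetical helper of my own, not part of the Butler API.

```python
from collections import Counter


def duplicate_counts(refs):
    """Return {ref: count} for every ref that appears more than once."""
    counts = Counter(refs)
    return {ref: n for ref, n in counts.items() if n > 1}


# Stand-in data (assumption): plain tuples, not real DatasetRef objects.
fake_refs = [
    ("calexp", 748908, 0),
    ("calexp", 748908, 1),
    ("calexp", 748908, 1),
    ("calexp", 748908, 2),
    ("calexp", 748908, 2),
    ("calexp", 748908, 2),
]

dupes = duplicate_counts(fake_refs)
for ref, n in sorted(dupes.items()):
    print(ref, "appears", n, "times")
```

With the real query, passing calexp_refs to duplicate_counts should behave the same way, since the set() call above already relies on the refs being hashable.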
Why are there multiple copies? Is there a way to have the query return only one copy?