Recreating the LSST Science Pipelines tutorial (gen 2) using only Generation 3 command-line tasks and pipetask

timj · September 30, 2021, 11:04pm

I probably shouldn’t reply until you’ve read the paper since that might explain things better.

The butler is two things. It’s a file store (the datastore) with code that knows how to convert between Python objects and files on disk. It’s also a “registry” (a database) that records the scientific relationships between all these files. The dataIds, collection names, and dataset types are all registry concepts, and they allow people to find their datasets without having to know where the datastore wrote them or what format they are in.

This has nothing to do with the content of FITS files. Some dataset types might correspond to catalogs of sources or objects, but the butler doesn’t know anything about that. The fact that FITS files have some binary tables in them is not known to the butler. It’s known to the Formatter class that has to read and write that file, but that’s it. There is a clean distinction between the butler managing the flow of data and what is in the data files.

“Pipeline repository” is not a term we use. The butler repository stores the outputs from pipeline execution and allows pipelines to read datasets from it.

fsklich · October 2, 2021, 2:42pm

Okay, while I’m comfortable that I understood most of what you said, I realize that actually finding the thousands of source stars/objects from my pipeline run is entirely up to me. I can find the source FITS files referenced in the SCHEMA table named FILE_DATASTORE_RECORDS, and then I can use TAP to actually read the BinTableHDU and list the RA/Dec and other information about those sources. I should not expect to find this detail in the Butler registry/database/repository schema tables.
Thanks for your help. I’ll experiment with astro_metadata_translator in Jupyter just so I know it’s there.

fsklich · October 2, 2021, 2:53pm

And, per my original question:
The sqlite3/schema side is part of the Butler repository: the SCHEMA side.
and
As the FITS files are referenced from the SCHEMA side, which points to the files that contain the detailed BinTableHDU units, I regard them as also part of the repository … even though their consumption is fully up to the science community.
Have a good Saturday. Wx in Dallas is getting cooler. Hope the Cardinals take care of the Rams tomorrow.

timj · October 2, 2021, 9:38pm

Yes, you can, but that’s not part of the public API. You should be using the butler query-datasets command or the Butler.registry.queryDatasets API to find the datasets (and hence files) that you want.
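
A minimal sketch of that route, assuming a 2021-era lsst.daf.butler in which Butler.registry.queryDatasets accepts collections= and where= keywords and Butler.getURI reports where the datastore put each dataset. The helper name find_dataset_files is hypothetical; the butler argument is an existing Butler for your repository.

```python
def find_dataset_files(butler, dataset_type, collections, where=None):
    """Return (dataId, URI) pairs for every dataset matching a registry query.

    Assumes ``butler`` is an lsst.daf.butler.Butler. The registry answers
    "which datasets exist for these dataIds?"; getURI then asks the datastore
    where each one was written, so no schema tables are read directly.
    """
    refs = butler.registry.queryDatasets(
        dataset_type, collections=collections, where=where
    )
    return [(ref.dataId, butler.getURI(ref)) for ref in refs]
```

For example, find_dataset_files(butler, "src", "u/<user>/single_frame", where="instrument = 'HSC' AND visit = 23718 AND detector = 41") would list the source catalog from the tutorial’s single-frame step, if that collection exists in your repository.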

No. You can use butler.get to read the dataset in as a Python table object (a SourceCatalog or some such, which you can convert to an Astropy Table if you wish), but you can’t use TAP. TAP is the service we put in front of a database server into which we have ingested the source catalogs after processing. That’s not butler functionality.
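
A sketch of the butler.get route, assuming the "src" dataset type written by the tutorial’s single-frame step and the asAstropy() method that lsst.afw.table catalogs provide. The helper name load_sources is hypothetical.

```python
def load_sources(butler, collection, *, visit, detector, instrument="HSC"):
    """Read one source catalog through the butler and return an Astropy Table.

    butler.get hands back an lsst.afw.table.SourceCatalog (the Formatter did
    the FITS reading); asAstropy() converts it to an astropy.table.Table.
    """
    catalog = butler.get(
        "src", visit=visit, detector=detector,
        instrument=instrument, collections=collection,
    )
    return catalog.asAstropy()
```

The returned table carries the catalog’s columns, including coord_ra and coord_dec (which afw stores in radians).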

fsklich · October 3, 2021, 12:58pm

Great, Tim… all good stuff. Sorry I misused the TAP term; I must have been regressing to prior work with Charbourg portals. Yes, with your explanation I can see the more robust Butler picture…
There was a moment or two when I thought I might be able to use the --where predicate to find all the sources, like --datasets 'source' --where "physical_filter = 'HSC-R' AND detector = '0_16' AND instrument = 'HSC'"
Attached was my initial work from several weeks ago, trying to auger into the pipeline source results.
Many thanks, Tim.

fsklich · October 3, 2021, 1:03pm

I spoke too soon. What I was trying to do was in the NEXT STEP of part 3… sorry. Just as you said, I can use butler.get to accomplish what I needed…

rfahed · October 4, 2021, 1:31pm

Hello! I am trying to run the tutorial, but I am stuck at the very beginning when loading the Butler (i.e. butler = Butler(repo_path)),

with the following error:
/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2021_33/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/daf_butler/22.0.1-56-g65dbfa54+f253ffa91f/python/lsst/daf/butler/registry/databases/sqlite.py in __init__(self, engine, origin, namespace, writeable)
156 with engine.connect() as connection:
157 with closing(connection.connection.cursor()) as cursor:
--> 158 dbList = list(cursor.execute("PRAGMA database_list").fetchall())
159 if len(dbList) == 0:
160 raise RuntimeError("No database in connection.")

DatabaseError: file is not a database

Am I the only one with this problem?
I am working with weekly “w_2021_33” deployed at CC-IN2P3

timj · October 4, 2021, 2:48pm

Assuming you are following the gen3 tutorial, it’s possible that you do not have git-lfs configured on your system, so you have not downloaded the real data files. You can tell by looking at some of them and seeing whether they are just placeholders (and also by seeing whether your butler repo is very small because it doesn’t include any of the FITS files).
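
A quick way to spot those placeholders, sketched with the standard library only: a Git LFS pointer is a small text file whose first line is the LFS spec URL, whereas a real SQLite registry begins with the 16-byte header "SQLite format 3\0", and a missing header is what SQLite reports as "file is not a database". The helper names are hypothetical, and which directory you scan (for example your rc2_subset checkout) is up to you.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file and of a SQLite 3 database file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
SQLITE_MAGIC = b"SQLite format 3\x00"


def classify_file(path):
    """Return 'lfs-pointer', 'sqlite', or 'other' based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    if head.startswith(SQLITE_MAGIC):
        return "sqlite"
    return "other"


def find_lfs_placeholders(root):
    """List files under ``root`` (skipping .git) that are still LFS pointers."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts and classify_file(p) == "lfs-pointer"
    )
```

If find_lfs_placeholders returns anything for your checkout, running git lfs install and then git lfs pull inside it should replace the pointers with the real files.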

rfahed · October 4, 2021, 3:18pm

It was indeed the problem. I had to do a git-lfs install and a git-lfs pull in the rc2_subset/ repo first. Thank you!

KSK · October 6, 2021, 10:56pm

I’m afraid I don’t see the extra curly brace, can you give me some context like maybe the few preceding lines? Thanks very much for pointing out the typos. It makes the end product that much better.

fsklich · October 7, 2021, 12:03pm

Simon,
Thanks, perhaps it was my error. Here is where I encountered the missing “f” and the purported extra “}”.
Maybe it was my pasting:
++++++++++++++++++++++++++++++++++

import os
collection = "u/{os.environ['USER']}/single_frame"
src = butler.get('src', visit=23718, detector=41}, collections=collection, instrument='HSC')
File "", line 1
src = butler.get('src', visit=23718, detector=41}, collections=collection, instrument='HSC')
^
SyntaxError: closing parenthesis '}' does not match opening parenthesis '('

print collection
File "", line 1
print collection
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(collection)?

print(collection)
u/{os.environ['USER']}/single_frame
++++++++++++++++++++++++++++++++++++
I’ve refreshed the tutorial page and don’t see it there now…
FSK

fsklich · October 7, 2021, 12:09pm

Tim and Co.:
I wish to rerun Step 4 (FGCM) and generate more detailed logs.
The first run seemed to go just fine. Do I need to specify a different output parameter? Any other suggestions or warnings you have would be appreciated.
Thanks

timj · October 7, 2021, 2:48pm

That curly brace shouldn’t be there.
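
For comparison, the intended lines as a sketch: with the f prefix the braces are interpolated, and without the stray } the call parses. The helper single_frame_collection is hypothetical, and the butler.get call is left commented out because it needs a Butler for the tutorial repository.

```python
import os


def single_frame_collection(user=None):
    """Build the tutorial's output collection name, u/<user>/single_frame.

    The f prefix is what substitutes {user}; without it the braces stay
    literal, which is why print(collection) showed {os.environ['USER']}.
    """
    if user is None:
        user = os.environ["USER"]
    return f"u/{user}/single_frame"

# Corrected butler call (no stray '}' after detector=41); needs the LSST stack:
# src = butler.get("src", visit=23718, detector=41,
#                  collections=single_frame_collection(), instrument="HSC")
```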

You can’t write new outputs with identical dataIds to a run that already has those dataIds. You need to specify a new output collection for them.
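
One way to pick a fresh output name, sketched under the assumption that Butler.registry.queryCollections() with no arguments lists every collection in the repository. The numeric-suffix convention simply matches the fgcm2 name used later in this thread, and next_free_collection is a hypothetical helper.

```python
def next_free_collection(base, existing):
    """Return ``base`` if unused, else ``base`` + N for the smallest free N >= 2.

    ``existing`` is any iterable of collection names, for example the output of
    butler.registry.queryCollections() (assumed 2021-era daf_butler API).
    """
    taken = set(existing)
    if base not in taken:
        return base
    n = 2
    while f"{base}{n}" in taken:
        n += 1
    return f"{base}{n}"
```

The result can then be given as the -o output collection of the rerun, so the new outputs never collide with dataIds already stored in the first run.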

fsklich · October 7, 2021, 9:23pm

fsklich · October 7, 2021, 9:54pm

Good, Tim. The braces comment was meant for Simon.
So, I’ll plan to use: --log-level DEBUG --long-log -o u/$USER/fgcm2
Is there some documentation on what is meant by the “COMPONENT=???” feature for logging? Is it the COMPONENT name as defined in the DRP.yaml?

timj · October 7, 2021, 10:35pm

I believe that was fixed already.

It’s referring to the logger name. If you want detailed output from the butler datastore while everything else stays quiet, you can use --log-level lsst.daf.butler.datastores=DEBUG.

You can see the logger name when you use --long-log.
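
A standard-library-only sketch of what the COMPONENT part selects: logger names are dot-separated, a level set on one name applies to every logger beneath it unless overridden, and everything else keeps the default level. This assumes the command-line option maps onto ordinary Python logging levels; the child logger names and messages are illustrative, and nothing here imports the LSST stack.

```python
import io
import logging


def demo_component_levels():
    """Return the log lines that survive a per-component DEBUG setting."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

    top = logging.getLogger("lsst")
    top.addHandler(handler)
    top.setLevel(logging.INFO)   # default level for every lsst.* logger
    top.propagate = False        # keep the demo out of the root logger
    # Equivalent of COMPONENT=DEBUG for one subtree of loggers.
    logging.getLogger("lsst.daf.butler.datastores").setLevel(logging.DEBUG)
    try:
        # Child of the DEBUG component: inherits DEBUG, so this is emitted.
        logging.getLogger("lsst.daf.butler.datastores.example").debug("reading a file")
        # Outside that subtree the level is still INFO: DEBUG dropped, INFO kept.
        other = logging.getLogger("lsst.pipe.base")
        other.debug("suppressed detail")
        other.info("task started")
    finally:
        top.removeHandler(handler)
    return stream.getvalue().splitlines()
```

The %(name)s field in the format string plays the same role as --long-log here: it prints the logger name you would put in front of the =LEVEL.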

fsklich · October 14, 2021, 4:27pm

This question may be out of scope for the community forum.
I’m still digging into the gen3 source files, specifically fgcmFitCycle.py (under /fgcmcal/) and the file of the same name (under /fgcm/) in the LSST code tree.
I’ve been cutting and pasting sections of the code into my local Jupyter notebook, with the full gen3 LSST environment set up.
When I attempt either of these relative imports:

from .fgcmParameters import FgcmParameters
from .fgcmChisq import FgcmChisq
from .fgcmStars import FgcmStars
from .fgcmLUT import FgcmLUT
or these:
from .utilities import makeConfigDict, translateFgcmLut, translateVisitCatalog
from .utilities import extractReferenceMags
from .utilities import computeCcdOffsets, makeZptSchema, makeZptCat
from .utilities import makeAtmSchema, makeAtmCat, makeStdSchema, makeStdCat
I get the infamous error:
ImportError: attempted relative import with no known parent package
Even from a native Python prompt in a terminal window, I receive the same result.
Is this a runtime LSST environment ($PATH) issue or something simple to resolve?
I like exploring snippets of code by running them locally in my Jupyter notebook environment.
Again, if this is out of bounds, I’ll understand.

KSK · October 14, 2021, 5:41pm

Is there a reason you aren’t doing the full namespace import to avoid the relative imports?

E.g.

from fgcm.fgcmParameters import FgcmParameters
from lsst.fgcmcal.utilities import makeConfigDict
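
Why the error appears, as a small self-contained sketch: a leading-dot import is resolved against the importing module’s package, and code pasted into a notebook cell or typed at a prompt has none. The same line works once the file is imported as part of its package, which is why the full-namespace form is the one to use in a notebook. The package demo_pkg and its modules are hypothetical and are created in a temporary directory.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path


def demo_imports():
    """Contrast relative imports outside/inside a package with absolute ones."""
    results = {}

    # 1) Code in a notebook cell or at a prompt has no parent package, so a
    #    leading-dot import cannot be resolved.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ImportWarning)
        try:
            exec("from .utilities import helper",
                 {"__name__": "__main__", "__package__": None})
        except ImportError as err:
            results["relative_outside_package"] = str(err)

    # 2) A throwaway package on disk (demo_pkg is a hypothetical name).
    root = Path(tempfile.mkdtemp())
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "utilities.py").write_text("def helper():\n    return 'helper called'\n")
    # Inside the package the relative form works: demo_pkg.fit knows its parent.
    (pkg / "fit.py").write_text("from .utilities import helper\n")

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        # Full-namespace import, the same form as the lines suggested above.
        from demo_pkg.utilities import helper
        fit = importlib.import_module("demo_pkg.fit")
        results["absolute"] = helper()
        results["relative_inside_package"] = fit.helper()
    finally:
        sys.path.remove(str(root))
    return results
```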

fsklich · October 14, 2021, 11:03pm

Well, I was just blindly copying the code from the Visual Studio editor into Jupyter cells. See screenshot.
Believe me, I would rather use the full namespace. Thanks for the alternative. I may follow up with another question, but this surely helps.
Fred Klich, Dallas, TX

fsklich · October 16, 2021, 12:03pm

Simon,
I got through this by using your suggestion. For years I’ve been fuzzy about relative imports, but your suggestion perhaps helped me understand them a bit better…
Have a good weekend.
Fred, Dallas