In the past week or so there have been some major changes to how we deal with logging in the science pipelines software.
Following RFC-782 the logger directly attached to a Task is now a Python logger and no longer an lsst.log.Log logger. The actual class that is used is derived from a logging.LoggerAdapter and has some compatibility methods available. In particular:
- log.trace() can be used as with lsst.log.
- A new method log.verbose() has been added that can issue log messages between DEBUG and INFO. This can be used when you want to give more information without swamping a user with debugging information (see RFC-245 for more discussion on this).
- The log level constants are available so task.log.INFO still works as it did before.
- There is a context manager attached to a logger (temporary_log_level) that allows the level to be temporarily modified. This should be used instead of setting the level and restoring it by hand. A short sketch of these methods in use follows below.
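As a concrete illustration, here is a minimal sketch of the compatibility layer in use. It assumes the getTaskLogger() factory described later in this post, and that temporary_log_level is a method on the logger as described above:

import logging

from lsst.pipe.base import getTaskLogger

# Inside a Task this logger is simply self.log; outside one, use the factory.
log = getTaskLogger("example")

log.trace("Finer-grained than DEBUG")   # as with lsst.log
log.verbose("Between DEBUG and INFO")   # new method from RFC-782
print(log.INFO)                         # level constants still attached

# Temporarily change the level, restoring the previous one on exit.
with log.temporary_log_level(logging.DEBUG):
    log.debug("Only emitted while the block is active")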
This transition has come with some deprecations and you may well see additional warnings coming out of the code:
- Code using log.getLevel() should now use the normal .level property (such properties were added to lsst.log.Log last week to simplify the transition).
- log.isDebugEnabled() should be replaced by log.isEnabledFor(logging.DEBUG).
- Python logging has long deprecated log.warn(), and so all code should transition to log.warning(). We also now deprecate the warningf() and related methods. The snippet below shows the replacements side by side.
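For reference, a small sketch of the replacements using only standard Python logging:

import logging

log = logging.getLogger("example")

level = log.level                       # instead of log.getLevel()
if log.isEnabledFor(logging.DEBUG):     # instead of log.isDebugEnabled()
    log.debug("Detailed state dump")
log.warning("Prefer warning()")         # instead of log.warn() / warningf()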
One immediately visible effect of this change is that log messages are now captured by
pytest automatically and reported separately from standard error and standard output.
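For example, a test can now assert on log output with pytest's standard caplog fixture (the logger name and test are purely illustrative):

import logging

def test_emits_warning(caplog):
    logging.getLogger("lsst.example").warning("bad pixel count high")
    assert "bad pixel count high" in caplog.text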
Although Task now uses Python logging, CmdLineTask is still configured to forward all Python log messages to lsst.log. Gen2 packages such as daf_persistence and the derived Mapper classes are still using lsst.log and there are no plans to touch those.
For anything not related to Gen2, the guidance is now to use Python logging by default in Python code. If you want to make use of verbose() outside of a Task you can use lsst.pipe.base.getTaskLogger() to create the logger.
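In other words, something like this (the logger names are illustrative):

import logging

from lsst.pipe.base import getTaskLogger

# Default for ordinary code: plain Python logging.
_LOG = logging.getLogger(__name__)
_LOG.info("Processing started")

# Only when verbose() is wanted outside a Task:
_TASK_LOG = getTaskLogger(__name__)
_TASK_LOG.verbose("Intermediate progress detail")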
For pipetask command lines all log output is now handled by Python logging.
There are two new command line options:
- --log-file can be used to record all log messages to a file. If the file has extension .json the log file will contain JSON records, else it will contain formatted log strings. This option can be used multiple times. The log strings will match whatever format was being used and respect --long-log, whereas the JSON records will contain all the information.
- --no-log-tty can be used to disable log messages being sent to the terminal. This is particularly useful if you are sending output to a separate file (and you can always use tail -f on that file to see what's going on). An example invocation is shown below.
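For example, a sketch of an invocation that writes full JSON records to a file while keeping the terminal quiet (the exact placement of the options relative to the run subcommand may depend on your pipetask version):

pipetask --log-file processing.json --no-log-tty run -b DATA_REPO ...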
One major new change is that pipetask run now records the log output from each quantum in the butler itself, along with the task metadata. For example:
>>> from lsst.daf.butler import Butler
>>> butler = Butler("DATA_REPO")
>>> log = butler.get("isr_log", instrument="HSC", detector="1_36", exposure=903342, collections="demo_collection")
>>> type(log)
<class 'lsst.daf.butler.core.logging.ButlerLogRecords'>
>>> print(log[:5])
WARNING 2021-07-28T16:40:22.463971+00:00 afw.image.MaskedImageFitsReader MaskedImageFitsReader.cc:94 - Expected extension type not found: IMAGE
WARNING 2021-07-28T16:40:24.793796+00:00 afw.image.MaskedImageFitsReader MaskedImageFitsReader.cc:94 - Expected extension type not found: IMAGE
WARNING 2021-07-28T16:40:24.929224+00:00 isr isrTask.py:1003 - Constructing linearizer from cameraGeom information.
DEBUG 2021-07-28T16:40:24.929631+00:00 timer.isr timer.py:73 - runStartCpuTime=8.120303; runStartUserTime=5.977052; runStartSystemTime=2.143258; runStartMaxResidentSetSize=781852672; runStartMinorPageFaults=193407; runStartMajorPageFaults=1910; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3050; runStartInvoluntaryContextSwitches=5157
INFO 2021-07-28T16:40:24.929917+00:00 isr isrTask.py:1410 - Converting exposure to floating point values.
The lsst.daf.butler.ButlerLogRecords class acts like a collection of log records and can be indexed. The format string can also be changed in order to display different information when the records are printed.
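For instance, continuing the session above (the log_format property name used here is my reading of the class interface, so treat it as an assumption):

>>> first = log[0]   # indexing returns a single record
>>> print(first.levelname, first.name)
WARNING afw.image.MaskedImageFitsReader
>>> log.log_format = "{levelname}: {message}"   # assumed property controlling the format
>>> print(log[0])
WARNING: Expected extension type not found: IMAGE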
We don’t provide any methods for filtering the records once they have been retrieved although it’s relatively straightforward to implement your own filter:
>>> filtered = type(log).from_records([r for r in log if r.levelname == "DEBUG"])
>>> len(filtered)
4
>>> print(filtered)
DEBUG 2021-07-28T16:40:24.929631+00:00 timer.isr timer.py:73 - runStartCpuTime=8.120303; runStartUserTime=5.977052; runStartSystemTime=2.143258; runStartMaxResidentSetSize=781852672; runStartMinorPageFaults=193407; runStartMajorPageFaults=1910; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3050; runStartInvoluntaryContextSwitches=5157
DEBUG 2021-07-28T16:40:36.993596+00:00 timer.isr.fringe timer.py:73 - runStartCpuTime=20.013078; runStartUserTime=17.349074; runStartSystemTime=2.6640129999999997; runStartMaxResidentSetSize=2110672896; runStartMinorPageFaults=517525; runStartMajorPageFaults=2218; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3053; runStartInvoluntaryContextSwitches=14394
DEBUG 2021-07-28T16:40:36.994521+00:00 timer.isr.fringe timer.py:73 - runEndCpuTime=20.036667; runEndUserTime=17.369633; runEndSystemTime=2.667043; runEndMaxResidentSetSize=2110672896; runEndMinorPageFaults=517525; runEndMajorPageFaults=2218; runEndBlockInputs=0; runEndBlockOutputs=0; runEndVoluntaryContextSwitches=3053; runEndInvoluntaryContextSwitches=14459
DEBUG 2021-07-28T16:40:38.225491+00:00 timer.isr timer.py:73 - runEndCpuTime=21.259907; runEndUserTime=18.539821; runEndSystemTime=2.720103; runEndMaxResidentSetSize=2132619264; runEndMinorPageFaults=522835; runEndMajorPageFaults=2267; runEndBlockInputs=0; runEndBlockOutputs=0; runEndVoluntaryContextSwitches=3053; runEndInvoluntaryContextSwitches=15114
Be aware that if DEBUG is enabled the logs can get very large. For example, in characterizeImage turning on DEBUG results in half a million log records and about 500MB. These can still be retrieved, although if this becomes common we may need to add filtering parameters to the butler retrieval.
The expectation is that we will start using these new features to harvest and store logs in a centralized LogStash-like environment to allow easy search and display.