In the past week or so there have been some major changes to how we deal with logging in the science pipelines software.
pipe.base.Task Logging
Following RFC-782, the logger directly attached to a Task is now a Python logger and no longer an lsst.log.Log logger. The actual class used is derived from logging.LoggerAdapter and has some compatibility methods available. In particular:
- log.trace() can be used as with lsst.log.
- A new method log.verbose() has been added that can issue log messages between DEBUG and INFO. This can be used when you want to give more information without swamping a user with debugging information (see RFC-245 for more discussion on this).
- The log level constants are available, so task.log.INFO still works as it did before.
- There is a context manager attached to a logger (temporary_log_level) that allows the level to be temporarily modified. This should be used instead of lsst.log.utils.temporaryLogLevel. A short sketch of verbose() and temporary_log_level appears after this list.
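As a minimal sketch of the new methods (ExampleTask and ExampleConfig are hypothetical, and temporary_log_level is assumed to take the new level as its argument):

import logging

from lsst.pex.config import Config
from lsst.pipe.base import Task

class ExampleConfig(Config):
    """Empty config for the illustrative task."""

class ExampleTask(Task):
    """Hypothetical task demonstrating the new logger methods."""

    ConfigClass = ExampleConfig
    _DefaultName = "example"

    def run(self):
        # self.log is now a Python logger adapter, so the standard
        # logging methods behave as usual.
        self.log.info("Standard INFO message.")
        # Extra levels kept from lsst.log or added following RFC-245.
        self.log.verbose("More detail than INFO without a DEBUG flood.")
        self.log.trace("Very fine-grained tracing, as with lsst.log.")
        # Temporarily change the level for a noisy section.
        with self.log.temporary_log_level(logging.DEBUG):
            self.log.debug("Only emitted while the temporary level applies.")

Instantiating ExampleTask() and calling run() emits these messages, subject to the active log level.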
This transition has come with some deprecations and you may well see additional warnings coming out of the code: calls to log.getName() and log.getLevel() should be replaced by the normal .name and .level properties (these were added to lsst.log.Log last week to simplify the transition); log.isDebugEnabled should be replaced by log.isEnabledFor; Python logging has deprecated log.warn(), so all code should transition to log.warning(); and we also now deprecate the infof(), warningf(), and related methods.
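As a hedged before-and-after sketch (the logger name here is purely illustrative), the replacements look like this:

import logging

log = logging.getLogger("lsst.example")

# Instead of log.getName() and log.getLevel(), use the properties:
print(log.name, log.level)

# Instead of log.isDebugEnabled():
if log.isEnabledFor(logging.DEBUG):
    log.debug("Debug output is enabled.")

# Instead of log.warn():
log.warning("Something worth noting.")

# Instead of infof()/warningf(), use lazy printf-style interpolation:
log.info("Processed %d exposures", 42)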
One immediately visible effect of this change is that log messages are now captured by pytest automatically and reported separately from standard error and standard output.
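For example, pytest's built-in caplog fixture can now see these records directly (the logger name below is arbitrary):

import logging

def test_logging_is_captured(caplog):
    # Make sure INFO records from this logger are emitted and captured.
    caplog.set_level(logging.INFO, logger="lsst.example")
    logging.getLogger("lsst.example").info("processing started")
    assert "processing started" in caplog.text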
Gen2 Log Output
Even though Task now uses Python logging, CmdLineTask is still configured to forward all Python log messages to lsst.log. Gen2 packages such as daf_persistence and the derived Mapper classes are still using lsst.log and there are no plans to touch those.
Adding logging to Python code
For anything not related to Gen2, the guidance is now to use Python logging by default. If you want to make use of trace and verbose outside of a Task you can use lsst.pipe.base.getTaskLogger() to create the logger.
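A minimal sketch (passing __name__ as the logger name is an assumption, mirroring logging.getLogger):

from lsst.pipe.base import getTaskLogger

# Module-level logger for code that is not part of a Task.
log = getTaskLogger(__name__)

log.info("Standard levels work as usual.")
log.verbose("Detail between DEBUG and INFO.")
log.trace("Finest-grained tracing output.")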
Gen 3 log output
For the butler and pipetask command lines, all log output is now handled by Python logging. There are two new command line options:
- --log-file can be used to record all log messages to a file. If the file has extension .json the log file will contain JSON records, else it will contain formatted log strings. This option can be used multiple times. The log strings will match whatever format was being used and respect --long-log, whereas the JSON records will contain all the information. A sketch of reading a JSON log file back appears after this list.
- --no-log-tty can be used to disable log messages sent to the terminal. This is particularly useful if you are sending output to a separate file (and you can always use tail -f on that file to see what’s going on).
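One possible way to read the JSON form back is through the ButlerLogRecords class described below; note that the from_file constructor and the file name used here are assumptions on my part (only from_records is shown later in this post):

from lsst.daf.butler import ButlerLogRecords

# Read back a JSON log file written with, e.g., "--log-file pipeline.json".
records = ButlerLogRecords.from_file("pipeline.json")

# Records print using the usual log formatting.
print(records[0])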
One major new change is that pipetask run now records log output from each quantum in the butler itself along with task metadata. For example, from the pipelines_check output:
>>> from lsst.daf.butler import Butler
>>> butler = Butler("DATA_REPO")
>>> log = butler.get("isr_log", instrument="HSC", detector="1_36", exposure=903342, collections="demo_collection")
>>> type(log)
<class 'lsst.daf.butler.core.logging.ButlerLogRecords'>
>>> print(log[:5])
WARNING 2021-07-28T16:40:22.463971+00:00 afw.image.MaskedImageFitsReader MaskedImageFitsReader.cc:94 - Expected extension type not found: IMAGE
WARNING 2021-07-28T16:40:24.793796+00:00 afw.image.MaskedImageFitsReader MaskedImageFitsReader.cc:94 - Expected extension type not found: IMAGE
WARNING 2021-07-28T16:40:24.929224+00:00 isr isrTask.py:1003 - Constructing linearizer from cameraGeom information.
DEBUG 2021-07-28T16:40:24.929631+00:00 timer.isr timer.py:73 - runStartCpuTime=8.120303; runStartUserTime=5.977052; runStartSystemTime=2.143258; runStartMaxResidentSetSize=781852672; runStartMinorPageFaults=193407; runStartMajorPageFaults=1910; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3050; runStartInvoluntaryContextSwitches=5157
INFO 2021-07-28T16:40:24.929917+00:00 isr isrTask.py:1410 - Converting exposure to floating point values.
The lsst.daf.butler.ButlerLogRecords class acts like a collection of log records and can be indexed. The format string can also be changed in order to display different information when the records are printed.
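For example, indexing returns a single record whose attributes broadly mirror those of logging.LogRecord; of the attributes used below, only levelname appears elsewhere in this post, so treat the others as assumptions:

# Continuing the session above: pick out a single record and inspect it.
first = log[0]
print(first.levelname)  # e.g. "WARNING"
print(first.name)       # logger name, e.g. "afw.image.MaskedImageFitsReader"
print(first.message)    # the formatted message text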
We don’t provide any methods for filtering the records once they have been retrieved, although it’s relatively straightforward to implement your own filter:
>>> filtered = type(log).from_records([r for r in log if r.levelname == "DEBUG"])
>>> len(filtered)
4
>>> print(filtered)
DEBUG 2021-07-28T16:40:24.929631+00:00 timer.isr timer.py:73 - runStartCpuTime=8.120303; runStartUserTime=5.977052; runStartSystemTime=2.143258; runStartMaxResidentSetSize=781852672; runStartMinorPageFaults=193407; runStartMajorPageFaults=1910; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3050; runStartInvoluntaryContextSwitches=5157
DEBUG 2021-07-28T16:40:36.993596+00:00 timer.isr.fringe timer.py:73 - runStartCpuTime=20.013078; runStartUserTime=17.349074; runStartSystemTime=2.6640129999999997; runStartMaxResidentSetSize=2110672896; runStartMinorPageFaults=517525; runStartMajorPageFaults=2218; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3053; runStartInvoluntaryContextSwitches=14394
DEBUG 2021-07-28T16:40:36.994521+00:00 timer.isr.fringe timer.py:73 - runEndCpuTime=20.036667; runEndUserTime=17.369633; runEndSystemTime=2.667043; runEndMaxResidentSetSize=2110672896; runEndMinorPageFaults=517525; runEndMajorPageFaults=2218; runEndBlockInputs=0; runEndBlockOutputs=0; runEndVoluntaryContextSwitches=3053; runEndInvoluntaryContextSwitches=14459
DEBUG 2021-07-28T16:40:38.225491+00:00 timer.isr timer.py:73 - runEndCpuTime=21.259907; runEndUserTime=18.539821; runEndSystemTime=2.720103; runEndMaxResidentSetSize=2132619264; runEndMinorPageFaults=522835; runEndMajorPageFaults=2267; runEndBlockInputs=0; runEndBlockOutputs=0; runEndVoluntaryContextSwitches=3053; runEndInvoluntaryContextSwitches=15114
If DEBUG is enabled the logs can get very large. For example, in characterizeImage turning on DEBUG results in half a million log records and about 500 MB. These can still be retrieved, although if this becomes common we may need to add filtering parameters to the butler.get() call.
The expectation is that we will start using these new features to harvest and store logs in a centralized LogStash-like environment to allow easy search and display.