In the past week or so there have been some major changes to how we deal with logging in the science pipelines software.
pipe.base.Task Logging
Following RFC-782 the logger directly attached to a Task is now a Python logger and no longer a lsst.log.Log logger. The actual class that is used is derived from a logging.LoggerAdapter and has come compatibility methods available. In particular:
-
log.trace()can be used as withlsst.log. - A new method
log.verbose()has been added that can issue log messages betweenDEBUGandINFO. This can be used when you want to give more information without swamping a user with debugging information. (see RFC-245 for more discussion on this). - The debugging constants are available so
task.log.INFOstill works as it did before. - There is a context manager attached to a logger (
temporary_log_level) that allows the level to be temporarily modified. This should be used instead oflsst.log.utils.temporaryLogLevel.
This transition has come with some deprecations and you may well see additional warnings coming out of the code: log.getName(), log.getLevel() should now use the normal .name and .level properties (these were added to lsst.log.Log last week to simplify the transition). log.isDebugEnabled should be replaced by log.isEnabledFor. Python logging deprecated log.warn() and so all code should transition to log.warning(). We also now deprecate the infof(), warningf() and related methods.
One immediately visible effect of this change is that log messages are now captured by pytest automatically and reported separately from standard error and standard output.
Gen2 Log Output
Even though Task now uses python logging, CmdLineTask is still configured to forward all python log messages to lsst.log. Gen2 packages such as daf_persistence and the derived Mapper classes are still using lsst.log and there are no plans to touch those.
Adding logging to python code
For anything not related to Gen2, the guidance is to now use Python logging by default in python code. If you want to make use of trace and verbose outside of a Task you can use lsst.pipe.base.getTaskLogger() to create the logger.
Gen 3 log output
For butler and pipetask command lines all log output is now handled by Python logging.
There are two new command line options:
-
--log-filecan be used to record all log messages to a file. If the file has extension.jsonthe log file will contain JSON records, else it will contain formatted log strings. This option can be used multiple times. The log strings will match whatever format was being used and respect--long-logwhereas the JSON records will contain all the information. -
--no-log-ttycan be used to disable log messages sent to the terminal. This is particularly useful if you are sending output to a separate file (and you can always usetail -fon that file to see what’s going on).
One major new change is that pipetask run now records log output from each quantum in the butler itself along with task metadata. For example from pipelines_check output:
>>> from lsst.daf.butler import Butler
>>> butler = Butler("DATA_REPO")
>>> log = butler.get("isr_log", instrument="HSC", detector="1_36", exposure=903342, collections="demo_collection")
>>> type(log)
<class 'lsst.daf.butler.core.logging.ButlerLogRecords'>
>>> print(log[:5])
WARNING 2021-07-28T16:40:22.463971+00:00 afw.image.MaskedImageFitsReader MaskedImageFitsReader.cc:94 - Expected extension type not found: IMAGE
WARNING 2021-07-28T16:40:24.793796+00:00 afw.image.MaskedImageFitsReader MaskedImageFitsReader.cc:94 - Expected extension type not found: IMAGE
WARNING 2021-07-28T16:40:24.929224+00:00 isr isrTask.py:1003 - Constructing linearizer from cameraGeom information.
DEBUG 2021-07-28T16:40:24.929631+00:00 timer.isr timer.py:73 - runStartCpuTime=8.120303; runStartUserTime=5.977052; runStartSystemTime=2.143258; runStartMaxResidentSetSize=781852672; runStartMinorPageFaults=193407; runStartMajorPageFaults=1910; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3050; runStartInvoluntaryContextSwitches=5157
INFO 2021-07-28T16:40:24.929917+00:00 isr isrTask.py:1410 - Converting exposure to floating point values.
The lsst.daf.butler.ButlerLogRecords class acts like a collection of log records and can be indexed. The format string can also be changed in order to display different information when the records are printed.
We don’t provide any methods for filtering the records once they have been retrieved although it’s relatively straightforward to implement your own filter:
>>> filtered = type(log).from_records([r for r in log if r.levelname == "DEBUG"])
>>> len(filtered)
4
>>> print(filtered)
DEBUG 2021-07-28T16:40:24.929631+00:00 timer.isr timer.py:73 - runStartCpuTime=8.120303; runStartUserTime=5.977052; runStartSystemTime=2.143258; runStartMaxResidentSetSize=781852672; runStartMinorPageFaults=193407; runStartMajorPageFaults=1910; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3050; runStartInvoluntaryContextSwitches=5157
DEBUG 2021-07-28T16:40:36.993596+00:00 timer.isr.fringe timer.py:73 - runStartCpuTime=20.013078; runStartUserTime=17.349074; runStartSystemTime=2.6640129999999997; runStartMaxResidentSetSize=2110672896; runStartMinorPageFaults=517525; runStartMajorPageFaults=2218; runStartBlockInputs=0; runStartBlockOutputs=0; runStartVoluntaryContextSwitches=3053; runStartInvoluntaryContextSwitches=14394
DEBUG 2021-07-28T16:40:36.994521+00:00 timer.isr.fringe timer.py:73 - runEndCpuTime=20.036667; runEndUserTime=17.369633; runEndSystemTime=2.667043; runEndMaxResidentSetSize=2110672896; runEndMinorPageFaults=517525; runEndMajorPageFaults=2218; runEndBlockInputs=0; runEndBlockOutputs=0; runEndVoluntaryContextSwitches=3053; runEndInvoluntaryContextSwitches=14459
DEBUG 2021-07-28T16:40:38.225491+00:00 timer.isr timer.py:73 - runEndCpuTime=21.259907; runEndUserTime=18.539821; runEndSystemTime=2.720103; runEndMaxResidentSetSize=2132619264; runEndMinorPageFaults=522835; runEndMajorPageFaults=2267; runEndBlockInputs=0; runEndBlockOutputs=0; runEndVoluntaryContextSwitches=3053; runEndInvoluntaryContextSwitches=15114
If DEBUG is enabled the logs can get very large. For example in characterizeImage turning on DEBUG results in half a million log records and about 500MB. These can be retrieved although if this becomes common we may need to add filtering parameters to the butler.get() call.
The expectation is that we will start using these new features to harvest and store logs in a centralized LogStash-like environment to allow easy search and display.