LOR: PZFlow as PZ Estimator

jbkalmbach · September 30, 2021, 7:49pm

Title: LOR for PZFlow as PZ Estimator

Contributors: Andrew J. Connolly (1), John Franklin Crenshaw (1), J. Bryce Kalmbach (1)

(1) Astronomy Department and DIRAC Institute, University of Washington

Co-signers: Alex Malz

0. Summary Statement

This letter of recommendation describes the PZFlow code (Crenshaw and Doster 2021) and its use as a photometric redshift estimator. PZFlow is a code developed to train normalizing flows (Jimenez Rezende and Mohamed 2015, Dinh et al. 2015) that learn the joint probability distribution of a training data set. Once properly trained, the normalizing flow can calculate posteriors over any of the variables in the data. This means that, given a training set of photometry and spectroscopic redshifts, it can quickly produce probability density functions for the photometric redshifts of all LSST objects. It is currently a key part of the development of the LSST DESC Photo-Z Working Group’s RAIL project, which seeks to understand how systematic errors in photo-z training will impact LSST photo-z estimates and LSST DESC cosmology.
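For reference, the two relations behind this summary can be written out as follows. The first is the standard change-of-variables identity for normalizing flows (Dinh et al. 2015). The second shows how a redshift posterior follows from the learned joint density. The notation is ours and does not come from the PZFlow code: $\mathbf{x} = (z, \mathbf{m})$ is redshift plus photometry, $f$ is the learned bijection to the latent space, $p_U$ is the simple base density, and $z_1, \dots, z_N$ is an assumed uniform redshift grid with spacing $\Delta z$.

$$
p_X(\mathbf{x}) = p_U\big(f(\mathbf{x})\big)\,\left|\det \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\right|
$$

$$
p(z_i \mid \mathbf{m}) = \frac{p_X(z_i, \mathbf{m})}{\int p_X(z', \mathbf{m})\,dz'} \approx \frac{p_X(z_i, \mathbf{m})}{\sum_{j=1}^{N} p_X(z_j, \mathbf{m})\,\Delta z}
$$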

1. Scientific Utility

PZFlow would produce general-purpose photo-z estimates that are not tied to a specific science goal, which makes it broadly applicable across photo-z science needs. It would produce full posterior probability density functions (PDFs) rather than simple point estimates. When making estimates, the code can take into account the photometric uncertainty of the observations. PZFlow can also accept a training set that includes non-optical photometry and can marginalize over any bands missing from the observational data set.

In initial work with the LSST DESC RAIL project, PZFlow has been used to generate samples of photometric catalogs from training data, and it successfully approximates the probability distribution of the training catalogs. The PZFlow code itself is useful across a range of scientific contexts. For example, it was recently used in Malz et al. 2021 to calculate a new photo-z-based metric for survey strategy optimization.

2. Outputs

PZFlow outputs a full redshift PDF given input photometry and photometric errors. By convolving over the photometric error distributions, PZFlow can incorporate errors during training and can produce photo-z posteriors that are fully consistent with those errors. Furthermore, PZFlow calculates the final PDF from a deep ensemble of normalizing flows, which allows us to account for epistemic (model) uncertainty. This produces a more accurate photo-z posterior and can even provide a distribution over photo-z posteriors if desired.
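The following self-contained sketch illustrates both ideas in this paragraph. It does not call PZFlow. A hypothetical analytic joint density, `make_toy_density`, stands in for a trained flow. It combines a Gaussian redshift prior with a linear magnitude-redshift relation in a single band, and all function names are illustrative. Error convolution is approximated by Monte Carlo: the unnormalized joint density is averaged over draws of the true magnitude from an assumed Gaussian error distribution, and only then normalized over the redshift grid. The deep ensemble is mimicked by three toy members with slightly different slopes, averaged with equal weights. The per-grid-point spread between members serves as a rough indicator of epistemic uncertainty.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Gaussian density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def make_toy_density(slope, intercept=20.0, scatter=0.2):
    """Hypothetical stand-in for one trained flow: joint density p(z, m) in one band.

    Redshift prior N(1.0, 0.5) times a linear magnitude-redshift relation with scatter.
    """
    def density(z, m):
        return normal_pdf(z, 1.0, 0.5) * normal_pdf(m, slope * z + intercept, scatter)
    return density


def grid_posterior(density, m, grid):
    """p(z | m) on a uniform redshift grid, ignoring photometric errors."""
    dz = grid[1] - grid[0]
    vals = [density(z, m) for z in grid]
    norm = sum(vals) * dz
    return [v / norm for v in vals]


def error_convolved_posterior(density, m_obs, m_err, grid, n_samples=200, seed=0):
    """p(z | m_obs) after convolving p(z, m) with an assumed Gaussian error N(m; m_obs, m_err).

    The integral over the true magnitude is estimated by Monte Carlo: the
    unnormalized joint density is summed over draws m ~ N(m_obs, m_err),
    then normalized over the redshift grid.
    """
    rng = random.Random(seed)
    dz = grid[1] - grid[0]
    smeared = [0.0] * len(grid)
    for _ in range(n_samples):
        m = rng.gauss(m_obs, m_err)
        for i, z in enumerate(grid):
            smeared[i] += density(z, m)
    norm = sum(smeared) * dz
    return [s / norm for s in smeared]


def ensemble_posterior(densities, m_obs, m_err, grid):
    """Equal-weight average over ensemble members, plus the per-grid-point spread."""
    members = [error_convolved_posterior(d, m_obs, m_err, grid) for d in densities]
    mean = [sum(col) / len(members) for col in zip(*members)]
    spread = [
        math.sqrt(sum((p - mu) ** 2 for p in col) / len(members))
        for col, mu in zip(zip(*members), mean)
    ]
    return mean, spread


def moments(pdf, grid):
    """Mean and standard deviation of a gridded PDF."""
    dz = grid[1] - grid[0]
    mean = sum(z * p for z, p in zip(grid, pdf)) * dz
    var = sum((z - mean) ** 2 * p for z, p in zip(grid, pdf)) * dz
    return mean, math.sqrt(var)


if __name__ == "__main__":
    grid = [0.01 * i for i in range(301)]  # z in [0, 3]
    flow = make_toy_density(slope=2.0)
    print(moments(grid_posterior(flow, 22.0, grid), grid))
    print(moments(error_convolved_posterior(flow, 22.0, 0.3, grid), grid))
    ensemble = [make_toy_density(s) for s in (1.9, 2.0, 2.1)]
    mean_pdf, spread = ensemble_posterior(ensemble, 22.0, 0.3, grid)
    print(moments(mean_pdf, grid), max(spread))
```

With these toy numbers, the error-convolved posterior is noticeably broader than the error-free one. This is expected, because the assumed magnitude error is comparable to the intrinsic scatter of the toy relation.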

A key advantage of PZFlow is its ability to function even when a galaxy is missing photometry in one or more bands, by marginalizing over the probability distribution in the missing bandpasses. This capability may be most useful early in the LSST survey, when sky coverage in one or more bands is still incomplete for some galaxies.
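Here is a minimal sketch of marginalizing over a missing band. As before, it uses a hypothetical analytic stand-in (`toy_joint`) for a trained flow over redshift and two bands, g and r. When r is unobserved, `posterior_missing_r` integrates the joint density numerically over an assumed grid of r values and then normalizes over redshift. For comparison, `posterior_both` conditions on both bands. These names are illustrative, and the sketch demonstrates the idea rather than PZFlow's implementation.

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def toy_joint(z, g, r):
    """Hypothetical stand-in for a trained flow: p(z, g, r) for two bands g and r."""
    return (normal_pdf(z, 1.0, 0.5)
            * normal_pdf(g, 2.0 * z + 20.0, 0.2)
            * normal_pdf(r, 1.5 * z + 19.5, 0.2))


def posterior_missing_r(g_obs, z_grid, r_grid):
    """p(z | g) when r is unobserved: integrate p(z, g, r) over a grid of r values."""
    dz = z_grid[1] - z_grid[0]
    dr = r_grid[1] - r_grid[0]
    marginal = [sum(toy_joint(z, g_obs, r) for r in r_grid) * dr for z in z_grid]
    norm = sum(marginal) * dz
    return [m / norm for m in marginal]


def posterior_both(g_obs, r_obs, z_grid):
    """p(z | g, r) when both bands are observed."""
    dz = z_grid[1] - z_grid[0]
    vals = [toy_joint(z, g_obs, r_obs) for z in z_grid]
    norm = sum(vals) * dz
    return [v / norm for v in vals]


if __name__ == "__main__":
    z_grid = [0.01 * i for i in range(301)]         # z in [0, 3]
    r_grid = [17.0 + 0.02 * i for i in range(501)]  # r in [17, 27], wide enough for the toy relation
    missing = posterior_missing_r(22.0, z_grid, r_grid)
    both = posterior_both(22.0, 21.0, z_grid)
    print(max(missing), max(both))  # observing r as well gives a taller, narrower peak
```

In this toy, the r-band term integrates to one at every redshift. The marginalized posterior therefore reduces exactly to the g-only posterior, which makes the sketch easy to verify.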

3. Performance

A full comparison of PZFlow’s photo-z performance is currently underway as part of an upcoming LSST DESC paper on PZFlow and its use in the RAIL project. However, this initial analysis will use simulated data, so we recommend that PZFlow be included on the shortlist of estimators tested with commissioning data during the Photo-z Validation Cooperative.

4. Technical Aspects

Scalability - Will meet.

PZFlow is built on JAX, which enables computation on GPUs and TPUs as well as automatic vectorization and parallelization. Preliminary tests suggest that PZFlow is a very fast photo-z estimator.

Inputs and Outputs - Will meet.

Once trained, PZFlow requires only catalog photometry and errors. Unlike some machine learning codes, PZFlow does not require photometry in all bands to produce outputs. Nor does it need a placeholder value in place of missing bands, because it can marginalize over the missing bands when calculating a PDF estimate. PDFs are calculated on a user-specified grid of redshift values but can easily be converted to other formats (e.g., a quantile parameterization) using the qp package.
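In practice, the qp package handles this conversion. The sketch below does not use qp's API. Instead, it shows in plain Python what a grid-to-quantile conversion involves: build the CDF from the gridded PDF with the trapezoid rule, then invert it by linear interpolation. The function name `grid_pdf_to_quantiles` is hypothetical, and the quantile levels must be supplied in ascending order.

```python
import math


def grid_pdf_to_quantiles(z_grid, pdf, quantiles):
    """Convert a PDF tabulated on a redshift grid into redshifts at given quantile levels.

    The PDF need not be normalized. The CDF is built with the trapezoid rule,
    rescaled to end at 1, and inverted by linear interpolation.
    `quantiles` must be sorted in ascending order.
    """
    cdf = [0.0]
    for i in range(1, len(z_grid)):
        dz = z_grid[i] - z_grid[i - 1]
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * dz)
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    out = []
    j = 0
    for q in quantiles:
        # Advance to the first grid interval whose upper CDF value reaches q.
        while j < len(cdf) - 2 and cdf[j + 1] < q:
            j += 1
        c0, c1 = cdf[j], cdf[j + 1]
        frac = 0.0 if c1 == c0 else (q - c0) / (c1 - c0)
        out.append(z_grid[j] + frac * (z_grid[j + 1] - z_grid[j]))
    return out


if __name__ == "__main__":
    z_grid = [0.01 * i for i in range(201)]  # z in [0, 2]
    pdf = [math.exp(-0.5 * ((z - 1.0) / 0.1) ** 2) for z in z_grid]  # unnormalized Gaussian
    print(grid_pdf_to_quantiles(z_grid, pdf, [0.16, 0.5, 0.84]))
```

Storing a few quantiles per object instead of the full grid is one way to reduce storage. The trade-off is that fine structure between the chosen quantile levels, such as a narrow secondary peak, may be lost.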

Storage Constraints - Will meet.

PZFlow is storage-agnostic. We anticipate that PDFs generated by PZFlow will be converted to, and stored in, the most convenient format using the qp package.

External Data Sets - Will meet.

PZFlow will require a spectroscopic training set (photometry and accompanying spectroscopic redshifts) but has no additional requirements beyond those of other machine learning photo-z algorithms.

Estimator Training and Iterative Development - Will probably meet.

The performance of PZFlow will need to be validated under different training-set systematics. We will submit PZFlow as a test estimator to the LSST DESC RAIL project, which seeks to evaluate the performance of photo-z estimators under varying scenarios of training-set incompleteness and systematic error.

Computational Processing Constraints - Will meet.

After training, photo-z estimation with PZFlow has low computational overhead. Both training and estimation benefit from the ability to use graphics processing units (GPUs), but the code can also run on CPUs.

Implementation Language - Will meet.

PZFlow is written in Python 3 and is pip-installable. The code is open source and maintained on GitHub.