LOR: The DESC Time Domain Probes Photo-Z Use Case

Contributors: Gautham Narayan, Alex Malz, Alex Gagliano for the LSST-DESC
Co-signers: Johann Cohen-Tanugi, Benjamin Rose, Alex Kim, Shahab Joudaki, Biprateep Dey for the LSST-DESC

0. Summary Statement

Photo-z estimation for transient host galaxies, and thus for the transients themselves, is a challenging and evolving problem that convolves the distinct issues of host-galaxy identification, association, and transient classification. This in turn affects whether a transient is selected for further analysis and follow-up studies, and therefore any publication where detailed knowledge of the demographics of the sample is crucial, including the determination of rates, studies of transients and their host environments, and cosmology. Each stage of this process (host association, photo-z estimation, transient classification, and the combination of the previous three products into a joint posterior for the transient type and redshift) carries uncertainties. We therefore focus this LOR on suggestions that will allow time domain analyses to detect and mitigate the impact of these uncertainties.

1. Scientific Utility

While the Science Collaborations (SCs) will have access to the Rubin PZ catalog, it is not clear what the latency of updating it will be. It is important for alert broker teams, with whom DESC and several other SCs will work, to be able to get access to photo-z data products for any transient alert. Ideally, wherever possible, this information would be included in the alert packet itself, rather than as a host-identifier that can be used to look up redshifts in an external catalog. That redshift information will be used to run trained classification algorithms (e.g. SuperNNova: Moller et al. 2019, RAPID: Muthukrishna et al. 2019, snmachine: Lochner et al. 2016, Alves et al. 2021, submitted) on the alert streams on these brokers’ platforms. All of these classification algorithms require or benefit significantly from the inclusion of redshift information.

DESC will use the resulting redshift-informed classifications to identify a stream of high-confidence SN Ia candidates for follow-up observations, volumetric supernova rate analyses, sample studies, and cosmological inference. SNe Ia that have been followed up and classified will be used to model our contamination and determine the resulting bias. Consequently, even if DESC uses its own photo-z data products for its final analysis, that analysis will still depend on the Rubin PZs, because they serve as a priori information for classification and for the selection of follow-up targets.

As Rubin operations proceed, DESC will likely construct and update its own photo-z catalog after each Rubin data release. However, brokers will need photo-z information to classify alerts upstream from DESC from the start of survey operation; it is thus crucial that the Rubin PZ information is available from that time.

2. Outputs

We have a number of redshift-related needs for information in the alerts, from which we derive requirements for the Rubin DM PZ object table.

  1. Because a host photo-z implicitly assumes a host association for a transient, we request that alert brokers provide photo-z information for no fewer than the three nearest galaxies.[a]
  2. If a host galaxy’s spectroscopic redshift (including those provided through in-kind contributions) or photo-z from deeper or less uncertain photometry (such as DES) and/or additional photometric bands is available via an external catalog or service (for example, the Transient Name Server), that information should be provided in the alert.[b]
  3. The alerts should include host galaxy ugrizy photometry or limits, uncertainties, and shape parameters. To the greatest extent possible, any other data used as input to the photo-z estimators should also be provided, preferably together with the alerts themselves, or otherwise through lookup by host identifier or position for groups with data rights (see Sec. 4).
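
As an illustration of how the per-candidate photo-z information requested above might be combined downstream, the following minimal sketch forms a host-marginalized redshift PDF as a mixture of the candidate hosts' p(z), weighted by host-association probabilities. The function names, the Gaussian toy PDFs, and the association probabilities are all hypothetical and chosen only for illustration; this does not describe any Rubin or broker implementation.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of y(x) along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def marginal_redshift_pdf(z_grid, host_pdfs, assoc_probs):
    """Mixture p(z) = sum_i P(host = i) * p_i(z) over candidate hosts.

    host_pdfs   : (n_hosts, n_z) gridded photo-z PDFs (hypothetical input format)
    assoc_probs : (n_hosts,) host-association probabilities (hypothetical input)
    """
    host_pdfs = np.asarray(host_pdfs, dtype=float)
    weights = np.asarray(assoc_probs, dtype=float)
    weights = weights / weights.sum()              # renormalize over the listed hosts
    norm = trapezoid(host_pdfs, z_grid)[:, None]   # normalize each p_i(z) to unit area
    return weights @ (host_pdfs / norm)            # shape (n_z,)

# Toy example: three candidate hosts with Gaussian p(z).
z_grid = np.linspace(0.0, 1.5, 1501)
centers = np.array([0.10, 0.50, 0.90])
widths = np.array([0.02, 0.05, 0.05])
host_pdfs = np.exp(-0.5 * ((z_grid - centers[:, None]) / widths[:, None]) ** 2)

p_z = marginal_redshift_pdf(z_grid, host_pdfs, assoc_probs=[0.7, 0.2, 0.1])
mean_z = trapezoid(z_grid * p_z, z_grid)
```

A mixture of this kind keeps the redshift uncertainty from a possibly wrong host association visible to a classifier, instead of silently committing to the nearest galaxy.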

Based on these needs for redshift information in the alerts, our recommendations for the Rubin DM PZ object table are as follows:

Photo-z probability density functions: To better encapsulate the uncertainty landscape of photo-zs, we request full PDFs rather than point estimates alone, whether or not those point estimates are derived from PDFs. While photo-z likelihoods would be preferable for cosmological inference, we recognize that most existing estimators yield photo-z posteriors, and we believe posteriors, even without knowledge of the implicit prior, will be sufficient for DESC’s use of Rubin PZ data products. PDFs provided in a sparser representation than a grid (e.g. quantiles as per Malz & Marshall et al. 2018) are likely to be appropriate for DESC’s purposes.
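
To make the idea of a sparse quantile representation concrete, here is a minimal sketch that compresses a gridded p(z) into the redshifts at evenly spaced CDF levels. The function name `pdf_to_quantiles`, the choice of 19 quantiles, and the Gaussian toy PDF are assumptions for illustration only, not a proposed Rubin format.

```python
import numpy as np

def pdf_to_quantiles(z_grid, pdf, n_quantiles=19):
    """Compress a gridded p(z) into redshifts at evenly spaced CDF levels."""
    # Trapezoidal CDF on the grid, normalized so that it ends at 1.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]
    # Interior CDF levels only; the endpoints 0 and 1 are excluded.
    levels = np.linspace(0.0, 1.0, n_quantiles + 2)[1:-1]
    # Invert the CDF by linear interpolation (levels fall where the CDF rises).
    return levels, np.interp(levels, cdf, z_grid)

# Toy example: a Gaussian p(z) stored on a 401-point grid.
z_grid = np.linspace(0.0, 2.0, 401)
pdf = np.exp(-0.5 * ((z_grid - 0.6) / 0.1) ** 2)

levels, z_q = pdf_to_quantiles(z_grid, pdf)
median_z = z_q[len(z_q) // 2]   # quantile at the CDF level 0.5
width_90 = z_q[-1] - z_q[0]     # width of the central 90% interval
```

In this toy case 19 numbers replace 401 grid values while still carrying the median and the central 90% interval, which is the kind of saving that matters if PDFs for several candidate hosts travel with every alert.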

Revised photo-z data products released, early and often: Particularly in the early years of operations when static source identification is evolving, photo-z data products should be recomputed whenever a template for a field is constructed and static source extraction is performed, ideally every few months as new regions of the WFD area are observed, to minimize delays in completeness of alerts with host redshift information.

Ancillary photometric metadata: Photo-z data products provided with the alert stream should contain identifiers for the field template (and/or Data Release ID) and any available flags indicating if there were any cosmetic issues with the data or with the reduction. For example, flags such as deblending confidence or core saturation in particular will help with photo-z outlier identification, eventually influencing DESC TD sample selection.

Provenance of photo-z data products: Metadata regarding the photo-z estimator corresponding to each photo-z data product should be made available, including not only the version of the algorithm and implementation but also an identifier for the prior information with which it was provided (e.g. the training set or template library and any hyperparameters set for a given estimation run) for the sake of reproducibility. If possible, the LSST SCs should have access to the containerized versions of the estimators, and their prior information, that Rubin DM runs to produce the photo-z data products. Additionally, because alerts are issued only on diaSource detection, the alert for a transient will cease to be updated once the transient fades below the detection threshold, and any host identifier provided with the alert will not be updated further. Host identifiers must therefore remain consistent between data releases, and if a galaxy is segmented into sub-components in a future data release, the previous host identifier should still be usable to look up the parent.

3. Performance

Metrics such as “<X% uncertainty at redshift z” don’t correspond to all our use cases for probabilistic photo-zs. However, for follow-up decisions, we would recommend point estimators with low scatter and bias at z < 0.1, provided those metrics are obtained with realistically imperfect prior information, as low-redshift training sets and template libraries are more likely to suffer from incompleteness.

In terms of overall performance for DESC’s use of transient classifications influenced by Rubin photo-z data products, it is most important for the PDFs to accurately quantify the uncertainty on a potential host galaxy’s redshift, even if the uncertainty is large. Schmidt, Malz & Soo et al. 2020 presents an effective null test of photo-z PDFs based on population-level metrics of uncertainty quantification under unrealistically perfect prior information; we recommend that such metrics, evaluated for galaxies at z > 0.6, be used to assemble a preliminary shortlist of estimators prior to an analysis of their sensitivity to imperfect prior information.
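
One commonly used population-level diagnostic of PDF calibration is the probability integral transform (PIT): for well-calibrated PDFs, the CDF of each galaxy's p(z) evaluated at its true redshift is uniformly distributed. The sketch below is an illustration on simulated Gaussian PDFs, not a reproduction of any published analysis; the function names, sample size, and widths are assumptions.

```python
import numpy as np

def pit_values(z_grid, pdfs, z_true):
    """PIT: the CDF of each galaxy's gridded p(z), evaluated at its true redshift."""
    steps = 0.5 * (pdfs[:, 1:] + pdfs[:, :-1]) * np.diff(z_grid)
    cdfs = np.concatenate([np.zeros((pdfs.shape[0], 1)), np.cumsum(steps, axis=1)], axis=1)
    cdfs /= cdfs[:, -1:]
    return np.array([np.interp(zt, z_grid, cdf) for zt, cdf in zip(z_true, cdfs)])

def ks_distance_from_uniform(pit):
    """Largest gap between the empirical CDF of the PIT values and the uniform CDF."""
    x = np.sort(pit)
    n = x.size
    upper = np.arange(1, n + 1) / n - x
    lower = x - np.arange(0, n) / n
    return max(upper.max(), lower.max())

# Simulated population at z > 0.6: true redshifts scatter around mu with width sigma.
rng = np.random.default_rng(0)
z_grid = np.linspace(0.0, 2.5, 1001)
n_gal, sigma = 1000, 0.05
mu = rng.uniform(0.7, 1.5, n_gal)
z_true = mu + sigma * rng.standard_normal(n_gal)

def gaussian_pdfs(width):
    """Gridded Gaussian p(z) centred on mu with the reported width."""
    return np.exp(-0.5 * ((z_grid - mu[:, None]) / width) ** 2)

# PDFs reporting the true scatter versus PDFs reporting half of it (overconfident).
ks_calibrated = ks_distance_from_uniform(pit_values(z_grid, gaussian_pdfs(sigma), z_true))
ks_overconfident = ks_distance_from_uniform(pit_values(z_grid, gaussian_pdfs(0.5 * sigma), z_true))
```

The overconfident PDFs pile PIT values up near 0 and 1, so their distance from uniformity is clearly larger. A test of this kind rewards honest uncertainties rather than narrow ones, which matches the priority stated above.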

4. Technical Aspects

Some of the recommendations of Sec. 2 may involve proprietary data covered by the Rubin data rights policy, in contrast to the host-galaxy identifiers, separations, and photo-z estimate. Brokers and SCs will therefore have to use the provided identifier to retrieve this information, abide by RDO-013, and limit distribution of these data products. Nevertheless, some way to access this information will significantly impact early science with LSST data, e.g. validating host photo-zs with an independent method and selecting only those transients for which both methods agree. The need to access proprietary data therefore requires either a private alert with this ancillary information, or a system that can handle bulk requests by galaxy identifier or cone searches, with very low latency, from brokers as well as science collaborations for some fraction of all alerts. Such a system may prove challenging to implement. Recognizing that it will be impractical for groups without data rights to reconstruct the full data release galaxy photometry from the limited hosts broadcast with alerts, it may be simpler for broker teams, SCs, and the LSST project itself to include preliminary photometry for host galaxies of transients with the alerts, if this is feasible under the data rights policy.

Additionally, we recommend pre-operations testing of the alerts infrastructure with broker teams and the SCs, including the use of photo-zs. Example alerts with photo-z information would be valuable together with associated schema and documentation, and testing at scale of algorithms and the alert production framework with the approved broker teams as well as the science collaborations would be tremendously helpful. (See comment thread below for further discussion beyond the scope of this LOR.)

[a] This choice of at least three galaxies is reasonable based on our experience with the ZTF public alert stream, but may need to be revisited in later years of survey operations, as crowding in the stacked static sky catalog increases. If galaxy shape information is available prior to association, we recommend using the directional light radius (Gupta et al., 2016) rather than simple angular separation, which can fail at low z, where transients can lie at large angular separations from their hosts. We also recommend providing an uncertainty on the host association itself, following the Probabilistic Association of Transients to their Hosts (PATH) method described in Aggarwal et al., 2021. This uncertainty is particularly crucial for any transients followed by Rubin Observatory during target-of-opportunity observations in response to gravitational wave alerts, or other triggers resulting from observations where the transient itself is not well localized. This is strictly a requirement on alert production, rather than on the DM Photo-z Object catalog, but it requires that shape information be present in the DM Photo-z Object catalog.
[b] This will be particularly critical during the first years of operations when the stacked depth is still below the signal-to-noise floor for well-characterized photometric redshift estimates, as estimators are rarely validated in the noise-dominated regime.
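
The directional-light-radius ranking recommended in footnote [a] can be sketched as follows. The sketch takes the DLR to be the radius of the galaxy's elliptical light profile in the direction of the transient and ranks candidates by separation divided by that radius. The flat tangent-plane offsets, the convention that `theta` is measured from the +x axis, and the function names are assumptions for illustration; a production implementation would need the catalog's actual shape and position-angle conventions.

```python
import numpy as np

def d_dlr(dx, dy, a, b, theta):
    """Separation in units of the galaxy's elliptical radius toward the transient.

    dx, dy : transient offset from the galaxy centre (arcsec, flat tangent plane)
    a, b   : semi-major and semi-minor axes (arcsec)
    theta  : major-axis angle from the +x axis (radians; assumed convention)
    """
    sep = np.hypot(dx, dy)
    phi = np.arctan2(dy, dx) - theta   # angle between major axis and transient
    dlr = a * b / np.sqrt((a * np.sin(phi)) ** 2 + (b * np.cos(phi)) ** 2)
    return sep / dlr

def rank_hosts(dx, dy, a, b, theta, n_keep=3):
    """Indices and d_DLR of the n_keep best candidates, smallest d_DLR first."""
    d = d_dlr(*(np.asarray(v, dtype=float) for v in (dx, dy, a, b, theta)))
    order = np.argsort(d)[:n_keep]
    return order, d[order]

# Toy example: a compact galaxy 5" away versus an extended edge-on galaxy 10" away
# with the transient lying along its major axis.
dx = [3.0, 10.0]
dy = [4.0, 0.0]
a = [2.0, 10.0]
b = [2.0, 2.0]
theta = [0.0, 0.0]

order, d_sorted = rank_hosts(dx, dy, a, b, theta)
```

Ranking by plain angular separation would prefer the compact galaxy in this toy case, whereas the DLR ranking prefers the extended one. This is the low-redshift failure mode of simple separation that the footnote describes.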

Additionally, DESC is prepared to commit to working with Rubin Alerts Production (AP) and the PZ VC to prepare the example data products described in this LOR, as part of the Extended LSST Astronomical Time Series Classification Challenge (ELAsTiCC). ELAsTiCC is the follow-up to the original PLAsTiCC challenge, which used photo-z point estimates computed with the CMNN PZ algorithm, kindly provided by Dr. Melissa Graham. An updated dataset and challenge would be a service to the LSST Science Community, and DESC would welcome the opportunity to work with LSST AP to help accomplish this without impacting AP’s own work plans and timelines.
