Creating and using new style reference catalogs

KSK · January 4, 2017, 2:47am

Background

With the adoption of RFC-257 and the implementation of DM-8232 it is now possible to feed two different reference catalogs to the single frame measurement task (processCcd.py). Because of the way a.net index files are located, it is not possible to use this feature with them. Instead, you will need to use the new HTM indexed file format.

Creating HTM catalogs

Since the ingestion script is a command line task, you will need a data repository to ingest into. This means that you will need, at a minimum, a directory and a _mapper. I tend to use the mapper in obs_test, but if you have specific ingest overrides, you can use a particular obs package.

$> mkdir my_ref_repo
$> echo "lsst.obs.test.TestMapper" > my_ref_repo/_mapper

For the purpose of demonstration, I’ve made a fake reference catalog called simple_ref.txt.

mra, mdec, my_id, a, a_err, b_err, b
1., 2., 1, 20., .2, .3, 21.
2., 3., 2, 19., .1, .25, 22.

The required columns are RA, Dec, at least one magnitude, and id. Since these are required, but no naming convention is enforced, you need a minimal config to do the ingestion. Following is an example config for this reference catalog. Note that this is slightly more than minimal.

# String to pass to the butler to retrieve persisted files.
config.dataset_config.ref_dataset_name='my_super_special_reference_catalog'

# Name of RA column
config.ra_name='mra'

# Name of Dec column
config.dec_name='mdec'

# Name of column to use as an identifier (optional).
config.id_name='my_id'

# The values in the reference catalog are assumed to be in AB magnitudes. List of column names to use for
# photometric information.  At least one entry is required.
config.mag_column_list=['a', 'b']

# A map of magnitude column name (key) to magnitude error column (value).
config.mag_err_column_map={'a':'a_err', 'b':'b_err'}
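
Since no naming convention is enforced, a typo in one of these column names is easy to make. Below is a minimal sketch of a pre-ingest sanity check, assuming a comma-separated catalog whose first line is the header. The helper `missing_columns` is hypothetical and is not part of the LSST stack; it only compares the names used in the config above against the header of simple_ref.txt.

```python
import csv
import io


# Hypothetical helper, not part of the LSST stack: report every column named
# in the ingest config that does not appear in the catalog header.
def missing_columns(header_line, ra_name, dec_name, id_name,
                    mag_column_list, mag_err_column_map):
    # skipinitialspace drops the blank that follows each comma in the header.
    reader = csv.reader(io.StringIO(header_line), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    wanted = [ra_name, dec_name, id_name]
    wanted += list(mag_column_list)
    # Both the keys (magnitude columns) and the values (error columns) of the
    # error map must exist in the file.
    wanted += list(mag_err_column_map.keys())
    wanted += list(mag_err_column_map.values())
    return sorted({name for name in wanted if name not in header})


header_line = "mra, mdec, my_id, a, a_err, b_err, b"
missing = missing_columns(header_line, "mra", "mdec", "my_id",
                          ["a", "b"], {"a": "a_err", "b": "b_err"})
print(missing)  # prints [] because every configured column is in the header
```

An empty list means the config and the file agree on names; anything else lists the names to fix before running the ingest.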

At this point ingestion is as simple as:

$> ingestReferenceCatalog.py my_ref_repo/ simple_ref.txt --configfile my_ref.cfg

Using HTM ref catalogs

If you ingested into a data repository with your data in it, you need do nothing more than specify the names of the catalogs on the command line, e.g.:

$> processCcd.py my_repo --id --config calibrate.photoRefObjLoader.ref_dataset_name='catalog_1' calibrate.astromRefObjLoader.ref_dataset_name='catalog_2'

Where catalog_1 and catalog_2 are the names of the catalogs specified in the config at ingest time. There is also a standard set of reference catalogs available from /datasets/refcats/htm/htm_baseline. You can link that directory into your data repo and have access to any of the baseline reference catalogs available there.

$> ln -s /datasets/refcats/htm/htm_baseline path_to_my_repo/ref_cats

The directory in the data repo must be called ref_cats unless this template is overridden in a policy file.
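
To make the naming requirement concrete, here is a small sketch that links a catalog directory into a repo and checks that a named catalog is visible through the link. It assumes each catalog lives in its own subdirectory, i.e. `<repo>/ref_cats/<catalog name>/`; the helper `has_ref_catalog` is hypothetical, and the directories are temporary stand-ins rather than the real /datasets paths.

```python
import os
import tempfile


# Hypothetical helper, not part of the LSST stack. Assumes each catalog sits
# in its own subdirectory: <repo>/ref_cats/<catalog name>/.
def has_ref_catalog(repo_dir, catalog_name, ref_cat_dir_name="ref_cats"):
    return os.path.isdir(os.path.join(repo_dir, ref_cat_dir_name, catalog_name))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a directory of baseline reference catalogs.
    baseline = os.path.join(tmp, "htm_baseline")
    os.makedirs(os.path.join(baseline, "catalog_1"))
    # Stand-in for a data repository.
    repo = os.path.join(tmp, "my_repo")
    os.makedirs(repo)

    before = has_ref_catalog(repo, "catalog_1")
    # Equivalent of the ln -s command: the link must be named ref_cats.
    os.symlink(baseline, os.path.join(repo, "ref_cats"))
    after = has_ref_catalog(repo, "catalog_1")
    # Looking under any other directory name finds nothing.
    wrong_name = has_ref_catalog(repo, "catalog_1", ref_cat_dir_name="refcats")
    print(before, after, wrong_name)
```

The check follows the symlink, so it behaves the same whether the catalogs were ingested directly into the repo or linked in.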

See RFC-257 for a fully worked example using DM-8232.

RHL · January 4, 2017, 3:02am

Thank you! Do we use the data registry for this, or the calibration registry, or a new one just for reference catalogues? I think I’d rather not use the data registry as I suspect that we’ll want to use one reference catalogue for multiple datasets.

KSK · January 4, 2017, 3:49pm

At the moment there is no registry involved. In my implementation, you choose the reference by specifying the catalog name in the config and that name is used to complete a template which points to the correct reference catalog. I think that solves your use case because we can link any of the reference catalogs into any data repository. I know that K-T wants to eventually have repositories of repositories to hold the catalogs, but that requires work in pipe_tasks.
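
As a rough illustration of what "complete a template" means, here is a sketch using plain Python string formatting. The template string, the key names `name` and `pixel_id`, and the helper `ref_cat_shard_path` are assumptions made for this example; the actual template is defined in the mapper policy and may differ.

```python
import posixpath

# Assumed template for illustration only; the real one lives in the mapper
# policy and can be overridden there.
REF_CAT_TEMPLATE = "ref_cats/%(name)s/%(pixel_id)s.fits"


# Hypothetical helper: fill the template with a catalog name and a shard id,
# then anchor the result at the repository root.
def ref_cat_shard_path(repo_root, name, pixel_id, template=REF_CAT_TEMPLATE):
    relative = template % {"name": name, "pixel_id": pixel_id}
    return posixpath.join(repo_root, relative)


path = ref_cat_shard_path("my_repo", "catalog_1", 131072)
print(path)  # prints my_repo/ref_cats/catalog_1/131072.fits
```

Because only the name changes between catalogs, several catalogs can sit side by side under the same directory, and no registry lookup is needed to find them.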

RHL · January 4, 2017, 3:55pm

I must have misunderstood:

What does this do?

price · January 5, 2017, 2:52am

Are we still able to use astrometry.net-based catalogs? Could you please demonstrate how that is done (what configuration to use for processCcd.py)?

KSK · January 5, 2017, 3:30pm

That puts the reference catalogs into the sharded form. It does not create a registry, but does require a minimal repository to write the shards into.
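
For anyone unfamiliar with the term, here is a toy sketch of what sharding a catalog means: rows are grouped by a spatial pixel id, and each group is stored separately so that only the shards overlapping a field need to be read. Note that this does not implement HTM; it swaps in a plain one-degree grid in RA and Dec as a stand-in, and `toy_pixel_id` and `shard_rows` are hypothetical names.

```python
import math
from collections import defaultdict


# Toy stand-in for HTM indexing: one shard per 1-degree cell in RA and Dec.
def toy_pixel_id(ra_deg, dec_deg):
    ra_cell = int(math.floor(ra_deg % 360.0))
    # Shift Dec from [-90, 90] to [0, 180]; the north pole joins the last cell.
    dec_cell = min(int(math.floor(dec_deg + 90.0)), 179)
    return dec_cell * 360 + ra_cell


# Group catalog rows by pixel id; each group would become one shard file.
def shard_rows(rows):
    shards = defaultdict(list)
    for row in rows:
        shards[toy_pixel_id(row["ra"], row["dec"])].append(row)
    return dict(shards)


rows = [
    {"id": 1, "ra": 1.0, "dec": 2.0},
    {"id": 2, "ra": 2.0, "dec": 3.0},
    {"id": 3, "ra": 1.5, "dec": 2.5},
]
shards = shard_rows(rows)
for pixel_id in sorted(shards):
    print(pixel_id, [row["id"] for row in shards[pixel_id]])
```

Rows 1 and 3 fall in the same cell and so land in the same shard, while row 2 gets its own.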

KSK · January 5, 2017, 3:33pm

Usage of the a.net based catalogs is unchanged. Because of the way we locate the a.net index files, you must use the same refObjLoader for both astrometry and photometry. It is still the default, but I would like to change that soon. I don’t have a worked example handy, but it should be exactly the same as you’ve done in the past.

Edit: I now see your comments on DM-8841. This ticket did make changes to the names of the refObjLoaders in the configs, so if you have config overrides, they will need to use the new names as you have in that ticket.

price · January 5, 2017, 3:47pm

Thanks. I think once we have the PS1 PV3 catalog available (working on it now!), we could flip the switch.

price · January 10, 2017, 2:33pm

This does not work with the current default set of configurations. It is necessary to also retarget the refObjLoaders. For this, you must use a config override file like this:

$> cat processCcd-overrides.py
from lsst.meas.algorithms import LoadIndexedReferenceObjectsTask
config.calibrate.photoRefObjLoader.retarget(LoadIndexedReferenceObjectsTask)
config.calibrate.photoRefObjLoader.ref_dataset_name = "catalog_1"
config.calibrate.astromRefObjLoader.retarget(LoadIndexedReferenceObjectsTask)
config.calibrate.astromRefObjLoader.ref_dataset_name = "catalog_2"
$> processCcd.py my_repo --id visit=12345 ccd=67 --configfile processCcd-overrides.py

KSK · January 10, 2017, 3:33pm

@price Thanks for adding that, and sorry for leaving it out of the original post.