We’re hoping for a little help pointing out how to debug a problem running jointcal on some DECam data. For a specific set of (at least) 6 dates in June, 2013, jointcal (on all filters) fails the astrometric fitting just as fitting starts—here’s the relevant part of the logfile:
jointcal.Associations INFO: Associated 33456 reference stars among 87194
jointcal.Associations INFO: Fitted stars before measurement # cut: 72458 dataId=jointcal.Associations INFO: Fitted stars after measurement # cut: 56878
jointcal.Associations INFO: Total, valid number of Measured stars: 873540, 857960'visit': 343378, 'ccdnum': 43}
jointcal INFO: === Starting astrometric fitting...acts for components of dataId=jointcal.ConstrainedAstrometryModel INFO: Got 58 chip mappings and 79 visit mappings; holding chip 28 fixed (3660 total parameters).ts for components of dataId=jointcal.AstrometryFit INFO: Reference Color: 0 sig 0
jointcal INFO: Initial chi2/ndof : nan/1782832=nanacts for components of dataId=jointcal FATAL: Failed processing tract 0, FloatingPointError: Initial chi2 is invalid: chi2/ndof : nan/1782832=nan “
My guess is that the fitting fails because the optimizer doesn’t know what to do with NaN. However, we don’t understand where this error comes from.
We’ve done some detective work but haven’t been able to figure it out:
-
It’s not in the pixel values data of the raw images—processCcd runs fine on the exposures, and the calexps look fine.
-
Both older and newer images than those dates run through jointcal fine.
-
it doesn’t appear to be related to the reference catalog—we tried both PS1 and SDSS (in addition to our default GAIA astrometry catalog) and got the same result. Furthermore, if we omit those specific dates, jointcal runs fine.
-
It’s not related to a specific point of the sky—we’ve checked 4-5 different pointing positions separated by many tens of degrees and the behavior is the same.
-
It doesn’t appear to be in the source positions—we haven’t yet looked at all chips on all exposures, but we’ve displayed the ra,dec of the source catalogs for many, and there’s nothing weird-looking in the range of ra,dec. However, when we include those chips in the jointcal, we get the NaN error above.
-
It doesn’t depend on filter—u,g,r,i,z images all fail if they were taken on those dates
-
We looked through the image headers, and apart from some differences in keywords being uppercase or lowercase, there didn’t seem to be anything weird (in particular all the WCS keywords seemed o.k…).
At this point, I’m not sure where to look for the source of NaN. Has anyone else encountered this before and have an idea of how to fix this? Are there any other metadata we should be looking at to help diagnose the problem?
Thanks!
Ian