Ticket #835 (closed defect: fixed)

Opened 10 years ago

Last modified 7 years ago

The Wcs class mis-handles some FITS files

Reported by: becker Owned by: fergal
Priority: critical Milestone:
Component: afw Keywords:
Cc: rhl, rowen, rplante Blocked By:
Blocking: Project: LSST
Version Number: afw rev 9936
How to repeat:

not applicable

Description

The afw trunk contains the script examples/warpExposure.py which exercises afwMath.warpExposure(). If I apply this script to images from a recent test run

python warpExposure.py
/lsst/DC3root/acb0019/IPSD/output/tmpl/v695856/v695856-c000-a00.tmpl
/lsst/DC3root/acb0019/IPSD/output/sci/v695856-e0/v695856-e0-c000-a00.sci
twarp

the resulting warped template image (twarp) does *not* match the science image. But if I display the input images in ds9, they *do* match when registered via WCS.

When this process happens during a pipeline run, the template is being correctly warped to match the science image, as the quality of the difference image

/lsst/DC3root/acb0019/IPSD/output/diff/v695856-e0/v695856-e0-c000-a00.diff

attests to.

Change History

comment:1 Changed 10 years ago by rowen

  • Status changed from new to assigned

comment:2 follow-up: ↓ 8 Changed 10 years ago by rowen

  • Cc rhl, rowen added
  • Owner changed from rowen to fergal

The problem is that afw's Wcs class can't handle the WCS that is in the science exposure.

To prove this:

  • Load the science exposure and template exposures listed in the PR
  • Extract the WCS
  • Compute wcs.raDecToXY(215.58457, 53.14217), which is approximately the center of a quite bright star near (and in line with) an obvious galaxy.
    • My results for the template image: x,y = 165, 267, which agrees with ds9
    • My results for the science image: x, y = 727, 905, whereas ds9 shows it as approximately 880, 903

comment:3 Changed 10 years ago by rowen

More results:

template corners converted to RA, Dec (these agree with ds9)
       0        0  215.59857   53.12828
       0      999  215.59945   53.17989
    1069        0  215.50653   53.12881
    1069      999  215.50730   53.18042
template star RA, Dec converted to x,y (this agrees with ds9):
     165      267  215.58457   53.14217
science corners converted to RA, Dec (these do NOT agree with ds9):
       0        0  215.54721   53.18896
       0     1152  215.54735   53.13011
    1023        0  215.59952   53.18812
    1023     1152  215.59973   53.12932
science star RA, Dec converted to x,y (this does NOT agree with ds9):
     727      905  215.58457   53.14217

comment:4 Changed 10 years ago by becker

So is the basic problem that the Wcs does not survive round-tripping through persistence?

comment:5 Changed 10 years ago by ktl

Both the template and the science image are persisted using the same mechanism. Only the science image has TAN-SIP corrections, though.

comment:6 Changed 10 years ago by ktl

Is it correct for negative doubles in FITS headers to be one character longer than other numbers?

CD1_1   = 5.10808596133527E-05                                                  
CD1_2   = 1.85579539217196E-07                                                  
CD2_2   = -5.10281493481982E-05                                                 
A_1_2   = -2.26389681686283E-11                                                 
CD2_1   = -8.27440751733828E-07                                                 
A_1_1   =   1.145765305928E-06                                                  

As far as I can tell, this is being produced by cfitsio, not our code, using

afw/src/image/fitsIo.cc(120-121):
double tmp = metadata->get<double>(keyWord);
fits_write_key(fd, TDOUBLE, keyWordChars, &tmp, keyCommentChars, &status);

comment:7 Changed 10 years ago by becker

I think this is OK, and not the source of this problem.

This ticket is blocking diffim improvements right now, btw.

comment:8 in reply to: ↑ 2 Changed 10 years ago by ktl

Replying to rowen:

  • Compute wcs.raDecToXY(215.58457, 53.14217), which is approximately the center of a quite bright star near (and in line with) an obvious galaxy.
    • My results for the science image: x, y = 727, 905, whereas ds9 shows it as approximately 880, 903

If I remove the SIP corrections from the WCS, I get (879.3, 903.2) for these coordinates.

from lsst.afw.image import ExposureF, Wcs
e = ExposureF("/lsst/DC3root/acb0019/IPSD/output/sci/v695856-e0/v695856-e0-c000-a00.sci")
w = e.getWcs()
p = w.getFitsMetadata()
for n in p.names():
  if n[0] == 'A' or n[0] == 'B':
    p.remove(n)
p.setString("CTYPE1", "RA---TAN")
p.setString("CTYPE2", "DEC--TAN")
z = Wcs(p)
z.raDecToXY(215.58457, 53.14217)

PointD(879.2854159, 903.2133201)

I'm guessing that both the template extraction code and DS9 are ignoring these SIP corrections.

comment:9 Changed 10 years ago by ktl

Nope, the template bbox code uses the science image WCS directly, so it should get SIP corrections. Still stumped...

comment:10 Changed 10 years ago by rowen

  • Version Number set to afw rev 9936

Note: I see this problem with afw 3.3.16 and the current trunk (rev 9936).

comment:11 Changed 10 years ago by fergal

  • Status changed from assigned to inTicketWork

The appears to be #801 rearing its ugly head again. The wcs in the science header says that the corners of the image map onto 14 22 11.33 +53 11 20.2 14 22 23.95 +53 07 45.3

Taking only the linear portion of the Wcs gives corners of 14 22 *02.23* +53 11 20.2 14 22 23.27 +53 07 45.5

and eyeballing the template image the second pair seems closer to the correct answer.

How was the wcs in the science image created -- by meas::astrom::net? That seems to be the leading suspect at the moment.

comment:12 Changed 10 years ago by becker

The wcs in the science image came out of the dc3 pipeline. It was persisted in

/lsst/DC3root/acb0019/IPSD/output/sci/v695856-e0/v695856-e0-c000-a00.sci

comment:13 Changed 10 years ago by ktl

I'm still worried that there's something fishy with the Wcs persistence/retrieval, not just with its contents. I get different results from computeTemplateBbox() running standalone with the Wcs from the science Exposure and the template Exposure than the pipeline got, according to Slice0.log.

comment:14 Changed 10 years ago by rowen

Would it be practical to redo this run (just this slice) from scratch, after adding some code that gave detailed information about the science WCS that was fit, including:

  • a listing of the WCS fields
  • a listing the computed RA/Dec of each corner of the science image.

If a good WCS was found (as shown by a good diffim or at least a good template bounding box) then one could see if the unpersisted science image WCS looks or works any differently than the one found in the pipeline.

One way to print the contents of the Wcs (though it may print some irrelevant fields):

md = wcs.getFitsMetadata()
for name in md.names():
   print name, md.get(name)

comment:15 Changed 10 years ago by fergal

I tried extracting sources from this chip (v695856-e0-c000) and re-solving and the resulting wcs looks pretty good. That rules out #801, I think, because in that bug the solver failed when trying to solve an entire chip.

I also checked that the Wcs is being correctly retrieved from the .sci file (it is), so a problem with persistence is now my leading suspect.

comment:16 Changed 10 years ago by fergal

Could someone send me, or point me to, the list of sources used by the pipeline to create the Wcs?

It's possible that it's different somehow to the list I'm using to re-create the Wcs, and might help track down the problem.

comment:17 Changed 10 years ago by fergal

So this is definitely not an astrometry.net issue. I ran the field through the pipeline, using the orginal generated by the pipeline (courtesy of KT), solved for a wcs, wrote it to disk and read it in again.

I compared the Wcs before and after persisting to disk using .toFitsMetadata().toString() and they were character by character identical. However, the wcs correctly returns the coordinates of the corners of the image before persistence, and reproduces the original error after persistence.

Clearly something is being changed that isn't showing up in .toFitsMetadata(); although I haven't figured out what it is yet.

comment:18 Changed 10 years ago by rowen

Very interesting. I suspect it is a bug in getFitsMetadata() -- that one or more FITS header cards that are read by the constructor are not returned by getFitsMetadata().

comment:19 Changed 10 years ago by ktl

The whole write/read is a bit hacky, involving delving into wcsinfo structures and calling wcspih() on synthetic FITS headers.

If someone has a better list of fields that getFitsMetadata() should be returning, let me know.

comment:20 Changed 10 years ago by ktl

Side note that is likely unrelated: taxelrod took out the wcsfix failure exception in the Wcs constructor many moons ago [3722]. wcsfix was originally turned off altogether to fix #259.

comment:21 Changed 10 years ago by fergal

  • Cc rplante added
  • Status changed from inTicketWork to needinfo
  • Owner changed from fergal to becker

Apologies for the delay in working on this. I've tracked down the bug to the internals of wcslib, possibly with a bit of sloppy programming on my part hiding the bug.

wcslib gets confused when you set the ctype to RA---TAN-SIP instead of just RA---TAN. The extra characters get the wcs_types() function confused. We should have noticed this problem earlier, except that the Wcs constructor sets ctype to RA---TAN, instead of RA---TAN-SIP. This happens for 4.3.1 (our current version), and 4.3.3 (the latest version) of wcslib.

I emailed the author of wcslib, Mark Calabretta, about the problem, and he said that it was a known bug, and is addressed in 4.4, which " will be released next week. (...or maybe the week after!)". He also sent me the corrected function from version 4.4 (although I haven't tested it to make sure it works with 4.3.1).

So how urgent is this fix? The easiest thing to do is to wait until 4.4 is released, but I could also try to create a patched 4.3.1a until 4.4 is released.

comment:22 Changed 10 years ago by becker

  • Status changed from needinfo to assigned
  • Owner changed from becker to fergal

Its not urgent anymore, since DC3a processing is technically done and I have implemented a work-around. You can probably wait until the 4.4 release.

comment:23 Changed 10 years ago by rowen

I agree that we can wait a few weeks. However, I strongly suggest adding a unit test that exposes the bug as soon as possible (before we upgrade to the new wcslib!).

comment:24 Changed 10 years ago by becker

Can I get a verification that this problem persists in the most recent (final DC3a) pipeline runs? Meaning I can't take the data in tmpl/ and warp it to the data in sci/ using DMS?

comment:25 Changed 10 years ago by rowen

  • Summary changed from warpExposure fails outside the pipeline environment to The Wcs class mis-handles some FITS files

Upon thinking about this some more I cannot see how this can be a bug in wcslib. As I understand it, the WCS class handles the SIP corrections itself when dealing with the TAN-SIP representation, but then uses wcslib to handle the remaining terms. If so, the Wcs class should be telling wcslib that it is handling normal TAN instead of TAN-SIP.

If the new wcslib will have native support for TAN-SIP, and if it is released soon, then I am comfortable with waiting for the new wcslib. Otherwise, I suggest fixing the Wcs class.

comment:26 Changed 10 years ago by fergal

The problem is in how wcslib parses the ctypes cards. If it sees RA---TAN and DEC--TAN, it deals with those correctly. If it sees RA---TAN-* and RA--DEC-* it tries a couple of different options (none of which is SIP) and then silently gives up. The relevant piece of code is in the function wcs.c:wcs_types(),

      if (ctypei[4] != '-' || ctypei[8] != '\0') {
...
          continue;
      }

The problem is definitely in wcslib. A work-around in the Wcs class would involve keeping two versions of ctypes around, one for external output, and one to pass to wcslib. I'd rather not go down this road.

If we get impatient waiting for wcslib4.4, I have the a copy of the corrected function, and we could create a new version of wcslib with just this fix applied.

comment:27 Changed 10 years ago by rowen

I still don't understand. wcslib *cannot* handle TAN-SIP, therefore I do not see why the Wcs class is ever letting it see a ctype of TAN-SIP. In my opinion the only safe and correct thing to do is hand it massaged ctypes cards with the -SIP removed.

If the new wcslib can handle TAN-SIP itself, then yes let's wait. Otherwise I strongly urge you to apply the fix that you say you don't want to do -- have a local version of ctypes cards for use by wcslib (and that is truly correct for what you are asking wcslib to do).

It seems to me that is the only safe fix. For example: what happens if we get a new wcslib that *can* handle SIP, but your code is also handling SIP? If you apply the fix of massaging the ctypes then everything continues to work. But without the fix, Wcs will handle the SIP terms and then wcslib will also handle them (at least if you let it see them; I don't know what it would do if you hide the SIP terms from wcslib).

comment:28 Changed 10 years ago by rhl

I think that the current situation is:

  • The Wcs class sees a RA--TAN-SIP FITS header
  • It passes the header to wcslib; wcslib ignores the SIP keywords (once Fergal applies his/Calabretta's changes that ignore the "-SIP"
  • The Wcs class parses the SIP polynomial keywords and applies them

If that's correct, I agree with Russell; we shouldn't be passing a header to wcslib and relying on it ignoring keys --- rather we should change the CTYPEs we pass to wcslib. Our Wcs class won't need its own CTYPE keywords; it just needs a bool or non-trivial SIP polynomials or whatever to say, "Use these polynomials". If wcslib adds full support for SIP, we'd remove our code at that point.

comment:29 follow-up: ↓ 30 Changed 10 years ago by fergal

I still think modifying ctypes is an ugly solution.

  • The behaviour of the current wcslib is a bug, and is going to be fixed. wcslib is supposed to ignore parts of keys it doesn't understand.
  • The is no possibility of a future version of wcslib silently processing the SIP polynomials unless we re-write the Wcs class to pass these polynomials to wcslib.
  • If we change the ctypes field of the wcsprm struct to work around this bug, we will also have to change the wcsFormatter class to check whether
    • the value in ctypes needs to be written to the fits file as is, or
    • first have "-SIP" appended to it.
  • Because we will have to make changes in two files to work around this bug, the work around will be more prone to errors than the bug fix.

If I can't convince people the work around is a bad idea I'll implement it, but it screams ugly to me.

comment:30 in reply to: ↑ 29 Changed 10 years ago by ktl

Replying to fergal:

I still think modifying ctypes is an ugly solution.

Don't think of it as "modifying ctypes". Think of it as providing the appropriate control parameter to the wcslib interface. Our current Wcs class is expecting wcslib to handle all aspects of the transformation except the SIP corrections, which are explicitly handled in the class. If wcslib were to handle SIP corrections, we'd get the wrong answer. Therefore, we should deep-copy the incoming PropertySet, modify the CTYPE1 and CTYPE2 headers to remove "-SIP" if present, remove *_ORDER and {A,B,AP,BP}_*_* headers, and submit the resulting PropertySet (after conversion to FITS headers) to wcslib.

  • The is no possibility of a future version of wcslib silently processing the SIP polynomials unless we re-write the Wcs class to pass these polynomials to wcslib.

We would have to do this anyway, since we'd have to take out the explicit processing that is currently going on.

  • If we change the ctypes field of the wcsprm struct to work around this bug, we will also have to change the wcsFormatter class to check whether
    • the value in ctypes needs to be written to the fits file as is, or
    • first have "-SIP" appended to it.

This is actually already handled. If the SIP matrices have non-zero size, "-SIP" is appended to the wcsprm ctype if it doesn't already exist.

comment:31 Changed 10 years ago by fergal

I've modified the Wcs constructor for fits metadata to set the ctypes to RA---TAN, as a temporary solution, and checked this into trunk. I'm hoping to implement a more elegant solution as soon as I get a chance, but this works for the moment.

comment:32 follow-up: ↓ 33 Changed 10 years ago by rowen

Thank you very much! I hope that the your replacement more elegant solution will continue to send TAN, instead of TAN-SIP, to wcslib.

Did you include a unit test that shows the failure?

comment:33 in reply to: ↑ 32 Changed 10 years ago by fergal

Replying to rowen:

Thank you very much! I hope that the your replacement more elegant solution will continue to send TAN, instead of TAN-SIP, to wcslib.

Did you include a unit test that shows the failure?

Yes, there's a test. testWcs835.py reproduces the bug, and demonstrates the fix.

comment:34 Changed 10 years ago by fergal

  • Status changed from assigned to closed
  • Resolution set to fixed

I fixed this bug and merged to trunk sometime ago. The Wcs constructor from fits metadata now extracts only the linear terms of the Wcs from the fits header, and passes them to the wcsprm object defined in wcslib.

comment:35 Changed 7 years ago by robyn

  • Milestone DC3a Completed deleted

Milestone DC3a Completed deleted

Note: See TracTickets for help on using tickets.