wiki:Summer2012/ForcedPhotometry
Last modified on 04/05/2012 03:05:54 PM

Here are some thoughts on performing forced photometry for Summer 2012.

We want to use detections from coadds to measure on the individual exposures.

All in memory

One option is to attempt to do everything with all the data for a particular part of the sky in memory at once. However, because coadds and CCDs are not (in general) aligned, this approach would mean a lot of processing duplication (a factor of 2--3?) so that each CCD could be entirely contained within a coadd.

This could, perhaps, be reduced by using MPI to communicate detections between neighbouring coadds. That would require coordinating multiple coadds, which could be done but is more difficult than I would like. Furthermore, neighbouring coadds may have a broad distribution of processing times (depending on the number of inputs, which can change according to the precise position of overlaps, etc.), causing inefficiency as one sky tile waits for its fellows to complete.

We might attempt to get around the need for duplication by making the sky tiles much smaller than a CCD and only reading and measuring on the portion of the CCD corresponding to the sky tile. This multiplies border issues, though we could deal with them. However, blended sources may drive us to doing measurement on as large a contiguous CCD area as possible, so that the measurements can be made simultaneously.

Separate processes

We therefore turn to considering the detection and measurement as separate processes. Under this scheme, we iterate over sky tiles to perform the detection, store the results, and then separately iterate over the CCDs to perform the measurements. Crucial to this process is determining which sky tile sources correspond to each CCD. This is an operation that databases are good at, though we could consider doing it outside the database. There may not be any reason why we could not make both available using different Tasks.

To perform this outside the database, we would use the same spatial indexing scheme as the coadds to determine which sky tiles overlap our exposure and read only the sources for those sky tiles. It is probably adequate to make a single pass through all the sources so selected to determine those that are in the exposure. More sophisticated techniques (e.g., kd-tree) are available, but probably not warranted. The particular spatial indexing scheme in use likely depends on the data source, and so it might be advisable to put hooks in the obs_* packages for common spatial indexing calls.
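The single pass described above can be sketched as follows. This is a minimal illustration, not the pipeline's code: it assumes sources are plain `(id, x, y)` tuples and that the CCD footprint is a convex polygon in a locally flat coordinate system (no spherical geometry). The names `point_in_convex_polygon` and `select_sources_on_ccd` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Source = Tuple[int, float, float]  # (id, x, y): an assumed, simplified record


def point_in_convex_polygon(point: Point, vertices: Sequence[Point]) -> bool:
    """Return True if point is inside or on the boundary of a convex polygon.

    The vertices must be given in order around the polygon (either direction).
    The point is inside if it lies on the same side of every edge.
    """
    px, py = point
    sign = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # z-component of the cross product (edge vector) x (vertex -> point)
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:
            current = 1 if cross > 0 else -1
            if sign == 0:
                sign = current
            elif current != sign:
                return False
    return True


def select_sources_on_ccd(sources: Sequence[Source],
                          ccd_corners: Sequence[Point]) -> List[Source]:
    """Single pass over the candidate sources, keeping those on the CCD."""
    return [s for s in sources
            if point_in_convex_polygon((s[1], s[2]), ccd_corners)]


# Example: a rectangular CCD footprint and three candidate sources.
ccd = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
candidates = [(1, 0.5, 0.5), (2, 3.0, 0.5), (3, 2.0, 1.0)]
on_ccd = select_sources_on_ccd(candidates, ccd)
```

The cost is linear in the number of candidate sources, which is the reason a kd-tree is probably not warranted once the sky tile lookup has already cut the candidate list down.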

Database interaction

Since we are considering different processes for detection and measurement, the database ingest can be a separate operation performed between detection and measurement. Therefore, the only database operation our code has to do is getting the list of sky tile sources that overlap a CCD, which should be just a SELECT (employing some UDFs to make use of the spatial indexing).
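To make the shape of that SELECT concrete, here is a rough sketch using Python's built-in `sqlite3` module. Everything about the schema is an assumption for illustration: the table name `CoaddSource`, its columns, and the use of a plain RA/Dec bounding box in place of the spatial-indexing UDFs (which a real query would employ, and which would also need to handle RA wrap-around).

```python
import sqlite3


def sources_in_box(conn, ra_min, ra_max, dec_min, dec_max):
    """Return (objectId, ra, decl) rows inside a simple RA/Dec bounding box.

    A stand-in for the real spatial query: no UDFs, no RA wrap-around.
    """
    cursor = conn.execute(
        "SELECT objectId, ra, decl FROM CoaddSource "
        "WHERE ra BETWEEN ? AND ? AND decl BETWEEN ? AND ? "
        "ORDER BY objectId",
        (ra_min, ra_max, dec_min, dec_max),
    )
    return cursor.fetchall()


# Hypothetical table holding the ingested coadd detections.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CoaddSource (objectId INTEGER PRIMARY KEY, ra REAL, decl REAL)"
)
conn.executemany(
    "INSERT INTO CoaddSource VALUES (?, ?, ?)",
    [(1, 10.01, -5.00), (2, 10.30, -5.02), (3, 10.05, -4.98)],
)

# Sources falling within the (assumed) bounding box of one CCD.
on_ccd = sources_in_box(conn, 10.0, 10.1, -5.05, -4.95)
```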

It would be most efficient to load the sources from the coadds as CSV files and then calculate the spatial indexes in a separate stage.

We may need to take care that we don't overload the database with queries. We could even do the source-to-CCD query as a separate operation, recording the results for our measurement process to pick up.

Map-reduce style approach (bucketing)

This approach uses separate detection and measurement processes, but pushes the spatial indexing earlier so that it happens at the tail-end of detection. Thus, instead of storing the results in CSV files as-is, each detection process separates its sources into a collection of buckets, where each bucket represents the set of sources located within the boundaries of a particular exposure. When detection is complete, the buckets can be exchanged and merged, so that the measurement processes can proceed with sources already divided per-exposure, in parallel. In this scheme, no centralized database queries are needed, and nearly all computation can be done in parallel.

This approach needs an efficient (indexing) algorithm to determine the set of exposures (buckets) to which each detected source belongs. Though the number of exposures is large, the number of those of concern for a coadd should be much smaller, and indexing should be pretty fast. Even if it were a little slower (using a less sophisticated algorithm than an optimal db), it is fully parallel, so there should be a net win.
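A minimal sketch of the bucketing step, under loud assumptions: sources are `(id, x, y)` tuples, each exposure of concern is approximated by an axis-aligned bounding box `(xmin, ymin, xmax, ymax)` in the same flat coordinates, and the brute-force loop stands in for whatever indexing algorithm is eventually chosen. The function name `bucket_sources` is hypothetical.

```python
from collections import defaultdict


def bucket_sources(sources, exposure_boxes):
    """Assign each source to every exposure whose bounding box contains it.

    sources: iterable of (id, x, y) tuples.
    exposure_boxes: dict mapping exposure id -> (xmin, ymin, xmax, ymax).
    Returns a dict mapping exposure id -> list of sources; a source that
    lies in overlapping exposures appears in several buckets.
    """
    buckets = defaultdict(list)
    for source in sources:
        _, x, y = source
        for exposure_id, (xmin, ymin, xmax, ymax) in exposure_boxes.items():
            if xmin <= x <= xmax and ymin <= y <= ymax:
                buckets[exposure_id].append(source)
    return dict(buckets)


# Two overlapping exposures and three detections from one sky tile.
exposures = {"expA": (0.0, 0.0, 2.0, 2.0), "expB": (1.0, 1.0, 3.0, 3.0)}
detections = [(1, 0.5, 0.5), (2, 1.5, 1.5), (3, 5.0, 5.0)]
buckets = bucket_sources(detections, exposures)
```

Here source 2 lands in both buckets and source 3 in none. The loop is quadratic in the worst case, but it only runs over the exposures of concern for one coadd and is independent for every sky tile.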

This approach really benefits from the all-to-all data exchange that map-reduce implementations rely on. We don't have such a framework yet, but could implement the exchange using a shared filesystem (mediated by the butler?).

If the bucketing is expensive, GPUs could potentially be used to perform it: with one thread per candidate exposure, we would stream the sources to all threads, and each thread would compute point-in-polygon for its own exposure simultaneously. This seems like something that GPUs are good at.

Proposed algorithm

Case A: Database interaction

Case B: No database interaction

Case C: Non-database bucketing

1. Detection (cases A and B): for each sky tile

  1. Given multiple coadds (e.g., one per filter), generate detection image (e.g., chi-squared)
  2. Detect sources on image
  3. For each input coadd { for each source { measure at the source position on the coadd }, give sources to butler }

1C. Detection: for each sky tile (steps 1 and 2 same as in 1)

  3. For each input coadd { for each source { measure at the source position on the coadd, compute set of buckets, write to all affected buckets }, give buckets to butler }

2A. Database ingest

  1. Convert all sky tile coadd source files to CSV
  2. Ingest CSV files into database
  3. Calculate spatial indexes in database
  4. Optionally: for each CCD { get list of sources on that CCD from database, give sources to butler }

2B. No-op

2C. No-op

3A. Measurement: for each CCD

  1. Query database (or get from butler) for list of sources on that CCD
  2. Measure sources
  3. Give measurements to butler

3B. Measurement: for each CCD

  1. Determine indices of overlapping sky tiles
  2. For each overlapping sky tile { get sky tile sources from butler, determine which are on CCD }
  3. Measure sources
  4. Give measurements to butler
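As an illustration of step 1 of case 3B, the sketch below finds the sky tiles overlapping a CCD for one very simple, assumed indexing scheme: square tiles of fixed size on a flat grid, indexed by `(ix, iy)`. The real scheme depends on the data source, so `overlapping_tiles` and the grid layout are hypothetical.

```python
import math


def overlapping_tiles(xmin, ymin, xmax, ymax, tile_size):
    """Return the (ix, iy) indices of all grid tiles touched by a bounding box.

    Assumes tile (ix, iy) covers [ix * tile_size, (ix + 1) * tile_size) in x
    and likewise in y, with flat (non-spherical) coordinates.
    """
    ix0 = math.floor(xmin / tile_size)
    ix1 = math.floor(xmax / tile_size)
    iy0 = math.floor(ymin / tile_size)
    iy1 = math.floor(ymax / tile_size)
    return [(ix, iy)
            for ix in range(ix0, ix1 + 1)
            for iy in range(iy0, iy1 + 1)]


# A CCD bounding box straddling the boundary between two tiles in x.
tiles = overlapping_tiles(0.9, 0.2, 1.3, 0.4, tile_size=1.0)
```

Only the sources from the returned tiles would then be read from the butler and filtered against the CCD footprint.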

3C. Measurement: for each CCD

  1. Read and merge all source buckets corresponding to this CCD from the butler
  2. Measure sources
  3. Give measurements to butler
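The merge in step 1 of case 3C might look like the sketch below. It assumes each sky tile contributes one partial bucket per CCD as a list of `(id, x, y)` tuples, and that the same source id can appear in more than one partial bucket (for example, if sky tiles overlap); both are assumptions rather than settled design, and `merge_buckets` is a hypothetical name.

```python
def merge_buckets(partial_buckets):
    """Merge the partial buckets for one CCD into a single source list.

    partial_buckets: iterable of lists of (id, x, y) tuples, one list per
    contributing sky tile. The first occurrence of each id is kept, and the
    result is sorted by id so the output is deterministic.
    """
    merged = {}
    for bucket in partial_buckets:
        for source in bucket:
            merged.setdefault(source[0], source)
    return [merged[source_id] for source_id in sorted(merged)]


# Two sky tiles both contributed source 2 to this CCD's bucket.
from_tile_a = [(2, 1.5, 1.5), (1, 0.5, 0.5)]
from_tile_b = [(2, 1.5, 1.5), (4, 0.1, 0.2)]
ccd_sources = merge_buckets([from_tile_a, from_tile_b])
```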

Database tables

Regardless of whether we use the database to support the forced photometry measurement itself, we will want to put the results of the photometry in the database. Here we list the tables that will be required, along with their schemas. I'm very sorry if the style or naming conflicts with LSST standards; please feel free to update and fix!

Coadds

This represents the coadd frame as a whole.

  • coaddId PRIMARY: identifier for the coadd
  • skytileId PRIMARY: identifier for the sky tile
  • rerun: the particular processing version of the coadd
  • filterId: identifier for the filter
  • psfFWHM: average PSF FWHM
  • rms: RMS of background
  • expTime: average exposure time
  • numInputs: average number of input exposures
  • meanTime: average epoch of coadd
  • deltaTime: total time covered by inputs (latest time minus earliest time)

CoaddCalibration

This represents an attempt to calibrate the coadds.

  • coaddId PRIMARY: identifier for the coadd
  • skytileId PRIMARY: identifier for the sky tile
  • zp: magnitude zero point
  • zpErr: error in magnitude zero point

CoaddForced

Measurements made on a coadd by forced photometry.

  • coaddId PRIMARY: identifier for the coadd
  • skytileId PRIMARY: identifier for the sky tile
  • objectId PRIMARY: identifier for the object
  • x, xErr, y, yErr: centroid and its error
  • psfFlux, psfFluxErr: PSF flux and its error
  • apFlux, apFluxErr: aperture flux and its error
  • flags: measurement flags

Do we want to include shapes? They won't be good for lensing science, but might be useful for others.

CcdForced

Measurements made on a CCD by forced photometry.

  • scienceCcdExposureId PRIMARY: identifier for the CCD
  • objectId PRIMARY: identifier for the object
  • x, xErr, y, yErr: centroid and its error
  • psfFlux, psfFluxErr: PSF flux and its error
  • apFlux, apFluxErr: aperture flux and its error
  • flags: measurement flags

At this point, we aren't interested in much more than photometry of point sources on the CCDs. Centroids might be useful to measure proper motion of faint sources (e.g., 2009AJ....137.4400L).
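For concreteness, the CcdForced table could be declared roughly as below. The column names come from the list above; the column types, the NOT NULL constraints, and the use of SQLite are assumptions made only so that the sketch runs, and the two PRIMARY columns are read as a composite primary key.

```python
import sqlite3

# Column names follow the CcdForced list; the types are assumptions.
CCD_FORCED_DDL = """
CREATE TABLE CcdForced (
    scienceCcdExposureId INTEGER NOT NULL,
    objectId INTEGER NOT NULL,
    x REAL, xErr REAL,
    y REAL, yErr REAL,
    psfFlux REAL, psfFluxErr REAL,
    apFlux REAL, apFluxErr REAL,
    flags INTEGER,
    PRIMARY KEY (scienceCcdExposureId, objectId)
)
"""

INSERT_SQL = "INSERT INTO CcdForced VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"

conn = sqlite3.connect(":memory:")
conn.execute(CCD_FORCED_DDL)

# One made-up measurement of object 42 on CCD exposure 1001.
row = (1001, 42, 10.5, 0.1, 20.5, 0.1, 350.0, 5.0, 360.0, 8.0, 0)
conn.execute(INSERT_SQL, row)

# The composite key allows only one measurement per (CCD, object) pair.
try:
    conn.execute(INSERT_SQL, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM CcdForced").fetchone()[0]
```

With this key, the same object measured on a different CCD is a separate row, which is what a light curve or proper motion query would iterate over.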

Comments

Comment by price on Thu 05 Apr 2012 10:05:25 AM CDT

I think case C (non-database bucketing) is essentially the same as case B, except that the bucketing is done as part of the detection stage, rather than the measurement stage. I would prefer making the bucketing part of the measurement stage, since then we're not limited to doing measurement only on those frames the detection stage knew about.
