Last modified on 10/21/2013 10:02:16 AM

Bosch's Refactoring Thoughts: Measurement

The current measurement framework consists of:

  • A C++ base class, Algorithm, its associated control/factory class, AlgorithmControl, and a few subclasses intended as intermediate base classes.
  • A C++ driver class, MeasureSources, which manages a set of algorithms.
  • A Python driver task, SourceMeasurementTask, which combines a MeasureSources instance with a subtask that replaces neighboring objects with noise using deblend information.

This system works fine in its most common use case: measuring sources on an Exposure that were detected and deblended on that same exposure, using only existing algorithms. I believe it has some serious deficiencies in other areas.

Using Algorithms as Primitives

Many of the measurement algorithms are useful outside the context of source measurement, particularly the shape, sinc flux, and centroid algorithms; these are useful primitives that should be easy to call without constructing a MeasureSources object and an associated schema. Some algorithms do have separate public interfaces for these use cases, but there is no consistency between these, and they're almost invariably in less-than-ideal locations (for instance, this interface for SdssShape is in a detail namespace, and the class is called SdssShapeImpl, which hardly implies that it should be used directly). For most algorithms, the entire Algorithm subclass is buried in a private namespace within a .cc file, making it impossible to use the algorithm without a MeasureSources instance.

Fixing this is complicated by the fact that the outputs of algorithm classes are very much dynamic: they depend not only on the algorithm subclass, but possibly on its configuration as well. We handle this in the context of source measurement by allowing each algorithm to register its fields in a Schema up-front, and then asking it to fill these fields in a Record object when called. This allows us to use a record as the dynamic "bucket of values" that holds algorithm outputs. When using an algorithm as a primitive, one of the things we'd like to avoid is setting up Schema and Table objects in order to produce Records.

I think the best approach here is just to attempt to provide as much consistency as possible without relying on language features to enforce a particular API:

  • Each algorithm that may be useful as a primitive should define a struct for its outputs (probably typedef'd as "Result" within the algorithm class).
  • Algorithms should provide a static method that takes the associated control object, an Image or MaskedImage, and anything else needed as inputs that would otherwise be retrieved from the source (e.g. Footprint, centroid), and returns its result struct.
  • The virtual method that implements source measurement should delegate to this static method, and use the result struct to fill in the output Record.
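
The three points above can be sketched roughly as follows. This is only an illustration in Python (the actual proposal targets the C++ Algorithm classes, where the static method could also be templated on pixel type); every name here (CentroidControl, CentroidResult, NaiveCentroid, apply, measure) is hypothetical, the "schema" is a plain list of field names, the "record" is a dict, and the image is a nested list.

```python
from dataclasses import dataclass
import math


@dataclass
class CentroidControl:
    """Hypothetical control object holding the algorithm's configuration."""
    name: str = "centroid.naive"  # prefix for output field names
    background: float = 0.0       # constant level subtracted from each pixel


@dataclass
class CentroidResult:
    """Hypothetical result struct: plain values, no Schema or Record involved."""
    x: float
    y: float
    flag: bool  # True if the measurement failed


class NaiveCentroid:
    Result = CentroidResult  # stands in for the typedef'd "Result"

    def __init__(self, ctrl, schema):
        # Source-measurement mode: register output fields up front.
        # Here the "schema" is just a list of field names.
        self.ctrl = ctrl
        self.keys = {f: f"{ctrl.name}.{f}" for f in ("x", "y", "flag")}
        schema.extend(self.keys.values())

    @staticmethod
    def apply(ctrl, image):
        """Primitive entry point: intensity-weighted centroid of a 2-d list."""
        total = sum_x = sum_y = 0.0
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                weight = value - ctrl.background
                total += weight
                sum_x += weight * x
                sum_y += weight * y
        if total <= 0.0:
            return CentroidResult(math.nan, math.nan, True)
        return CentroidResult(sum_x / total, sum_y / total, False)

    def measure(self, record, image):
        """Source-measurement entry point: delegate, then fill the record."""
        result = self.apply(self.ctrl, image)
        record[self.keys["x"]] = result.x
        record[self.keys["y"]] = result.y
        record[self.keys["flag"]] = result.flag


# Used as a primitive: no schema, table, or record required.
image = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0],
         [0.0, 0.0, 0.0]]
result = NaiveCentroid.apply(CentroidControl(), image)

# Used for source measurement: same code path, output lands in a record.
schema = []
algorithm = NaiveCentroid(CentroidControl(), schema)
record = {}
algorithm.measure(record, image)
```

Nothing in the language forces a new algorithm to follow this shape, which is the point: the consistency comes from convention (a Result type, a static entry point, and a measurement method that delegates to it).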

Related tickets: #1901

Simplifying Extensions and Debugging

It's currently a bit of a pain - and very intimidating for a C++ novice - to implement a new measurement algorithm class.

The biggest issue is that the Algorithm class uses a complicated, macro-based technique to simulate templated virtual member functions, which we really don't need. We only perform source measurement on single-precision exposures, and I believe we should simply remove this aspect of the Algorithm inheritance interface. This would leave us with a normal virtual member function to override, rather than a complicated set of templated nonvirtual functions to override and macros that must be invoked. If we follow the plan described in the previous section, this would also not get in the way of using algorithms on double-precision (or possibly integer) images outside the context of source measurement, as the static member functions that allow the algorithms to be used as primitives could be templated.

I also believe we should consider moving the driver code from C++ to Python. We already do the loop over sources in Python, and I believe we could do the loop over algorithms in Python as well with minimal performance implications (though this should be investigated). Most built-in algorithms would continue to be implemented in C++ (and we could have a C++ base class for all such algorithms), but this would not be a requirement for a measurement plugin. This would have several advantages:

  • Defining a new measurement algorithm would be even easier, and we could prototype algorithms in Python even if we eventually need to move them to C++.
  • We could eliminate the MeasureSources class as a separate object, and move this code directly into SourceMeasurementTask.
  • Debugging measurement algorithms would get substantially easier. We could add display hooks to individual measurement algorithms, and use the pipe_base timing mechanisms to record per-algorithm performance information.
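
As a rough sketch of what a Python driver could look like, the loop below iterates over sources and then over plugins, accumulating per-plugin wall-clock time. The plugin classes, the measure(record, image) signature, and the dict-based records are all hypothetical, and time.perf_counter merely stands in for the pipe_base timing mechanisms, which are not used here.

```python
import time
from collections import defaultdict


class CenterPixel:
    """Hypothetical pure-Python plugin: value of the pixel at the source position."""
    name = "pixel.center"

    def measure(self, record, image):
        record[self.name] = image[record["y"]][record["x"]]


class BoxFlux:
    """Hypothetical pure-Python plugin: sum over a 3x3 box clipped to the image."""
    name = "flux.box"

    def measure(self, record, image):
        x, y = record["x"], record["y"]
        rows = image[max(y - 1, 0): y + 2]
        record[self.name] = sum(sum(row[max(x - 1, 0): x + 2]) for row in rows)


def measure_sources(records, image, plugins):
    """Run every plugin on every source; return total seconds spent per plugin."""
    timings = defaultdict(float)
    for record in records:          # loop over sources (already in Python today)
        for plugin in plugins:      # loop over algorithms (proposed move to Python)
            start = time.perf_counter()
            plugin.measure(record, image)
            timings[plugin.name] += time.perf_counter() - start
    return dict(timings)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
records = [{"x": 1, "y": 1}, {"x": 0, "y": 0}]
timings = measure_sources(records, image, [CenterPixel(), BoxFlux()])
```

With the inner loop in Python, a display hook or a debugger breakpoint can be placed around a single plugin.measure call without touching the other algorithms.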

Forced Photometry and MultiFit

The interface defined by the Algorithm class doesn't work for extended source forced photometry, because it assumes the only input from the reference catalog is a centroid. It also doesn't work at all for MultiFit, as it assumes that the data being measured is a single exposure. So far, we've mostly tried to shoe-horn in forced photometry by making slight adjustments to the Algorithm interface, which ignores the fact that not all measurements are even appropriate for forced mode, and that those that are may need subtly different algorithms. Forced measurement should be allowed to have an entirely different configuration and an entirely different set of output values. MultiFit differs even more in many respects.

So, instead of a single hierarchy of Algorithms and AlgorithmControl factory classes that are expected to do all of these things, we should have three separate hierarchies - but if a particular measurement wants to delegate all three to a single implementation, that's fine (and we'll probably want a set of intermediate base classes to make it particularly easy to do just that).
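
One possible shape for this, sketched in Python with entirely hypothetical class and method names: three independent abstract interfaces, an intermediate base class that lets a measurement implement all three with a single function, and a forced-only plugin that uses more than the reference centroid. Records and reference records are plain dicts and exposures are nested lists.

```python
from abc import ABC, abstractmethod


class SingleFramePlugin(ABC):
    @abstractmethod
    def measure(self, record, exposure): ...


class ForcedPlugin(ABC):
    @abstractmethod
    def measure_forced(self, record, exposure, ref): ...


class MultiFitPlugin(ABC):
    @abstractmethod
    def measure_multifit(self, record, exposures, ref): ...


class SharedImplementation(SingleFramePlugin, ForcedPlugin, MultiFitPlugin):
    """Intermediate base: subclasses write compute() once and get all three modes."""
    name = "shared"

    @abstractmethod
    def compute(self, images, x, y): ...

    def measure(self, record, exposure):
        record[self.name] = self.compute([exposure], record["x"], record["y"])

    def measure_forced(self, record, exposure, ref):
        record[self.name] = self.compute([exposure], ref["x"], ref["y"])

    def measure_multifit(self, record, exposures, ref):
        record[self.name] = self.compute(exposures, ref["x"], ref["y"])


class MeanPixel(SharedImplementation):
    """Toy measurement that is happy to use one implementation everywhere."""
    name = "pixel.mean"

    def compute(self, images, x, y):
        return sum(img[y][x] for img in images) / len(images)


class ForcedBoxFlux(ForcedPlugin):
    """Forced-only toy measurement: needs a size from the reference, not just x, y."""
    name = "flux.forced"

    def measure_forced(self, record, exposure, ref):
        x, y, r = ref["x"], ref["y"], ref["radius"]
        rows = exposure[max(y - r, 0): y + r + 1]
        record[self.name] = sum(sum(row[max(x - r, 0): x + r + 1]) for row in rows)


exposure_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
exposure_b = [[0, 0, 0], [0, 7, 0], [0, 0, 0]]
ref = {"x": 1, "y": 1, "radius": 1}

multifit_record = {}
MeanPixel().measure_multifit(multifit_record, [exposure_a, exposure_b], ref)

forced_record = {}
ForcedBoxFlux().measure_forced(forced_record, exposure_a, ref)
```

A driver for one mode only needs to know about that mode's interface, so ForcedBoxFlux never has to pretend it can run in single-frame or MultiFit mode.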

Multi-Object Fitting

In the absence of a deblender that can generate heavy footprints on exposures other than the one in which deblending was performed, we have no way to perform deblended measurement in forced photometry or MultiFit. This essentially means our deblender is currently useless in the context of measuring colors. One solution to this problem (and just as importantly, a first step towards most other solutions) is to measure all children of a deblended parent object simultaneously. To do this, we need a measurement framework that supports a multi-object fitting API.

Altogether, this makes for a lot of different measurement modes - for each of single-epoch measurement, forced photometry, and MultiFit, we'd want both single-object and multi-object modes. I don't think we want to just generalize the single-object mode to multi-object, as there are important algorithms (e.g. adaptive moments, aperture fluxes) for which multi-object mode isn't well defined, and we need to preserve the ability to run these in single-object mode.

I don't think we want six class hierarchies, however, one for each combination of input data format and single/multi-object mode. I think we probably want three class hierarchies, one for each type of target data, with support for both single- and multi-object fitting within each (and multi-object fitting optional).
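
A minimal sketch of how optional multi-object support could sit inside one such hierarchy, with hypothetical names throughout: every plugin implements measure_single, a plugin may also override measure_multi, and the driver measures the children of a deblended parent together only for plugins that did so. The "simultaneous fit" here is a toy that splits the total image flux among children in proportion to the pixel value at each child's position.

```python
class Plugin:
    """Hypothetical base class for one hierarchy (e.g. single-epoch measurement)."""
    name = "plugin"

    def measure_single(self, record, image):
        raise NotImplementedError

    def measure_multi(self, records, image):
        """Optional: measure all children of one parent simultaneously."""
        raise NotImplementedError


def supports_multi(plugin):
    # A plugin supports multi-object mode if it overrides measure_multi.
    return type(plugin).measure_multi is not Plugin.measure_multi


class CenterPixel(Plugin):
    """Single-object only, like algorithms with no well-defined multi-object mode."""
    name = "pixel.center"

    def measure_single(self, record, image):
        record[self.name] = image[record["y"]][record["x"]]


class SharedFlux(Plugin):
    """Toy multi-object plugin: splits the total flux among the children."""
    name = "flux.shared"

    def measure_single(self, record, image):
        self.measure_multi([record], image)

    def measure_multi(self, records, image):
        total = sum(sum(row) for row in image)
        weights = [image[r["y"]][r["x"]] for r in records]
        norm = sum(weights)
        for r, w in zip(records, weights):
            r[self.name] = total * w / norm if norm > 0 else float("nan")


def measure_family(children, image, plugins):
    """Measure the children of one deblended parent."""
    for plugin in plugins:
        if supports_multi(plugin):
            plugin.measure_multi(children, image)
        else:
            for child in children:
                plugin.measure_single(child, image)


image = [[0, 0, 0],
         [0, 3, 1],
         [0, 0, 0]]
children = [{"x": 1, "y": 1}, {"x": 2, "y": 1}]
measure_family(children, image, [CenterPixel(), SharedFlux()])
```

Single-object-only algorithms keep working unchanged under this arrangement, and the number of hierarchies stays at three rather than six.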

Straw-Man Design

(TODO)