Last modified on 02/05/2014 09:53:15 AM

Design for W14 Measurement Framework

Most of the content here is discussed more fully here, but I've tried to better separate requirements from design (pending) on this page.

This page and the analysis toolkit page have spawned a new design page on afw::table work. That work is needed mostly for the data analysis toolkit (and perhaps some of the butler work), but the overall workload will be lighter if we sequence it with the measurement framework work. Otherwise we would have to reimplement algorithms in the new measurement framework using the old afw::table::Schema API and field name conventions, and then port them again once we switch to the new API and conventions.

Requirements

  • Algorithms that can be used as primitives outside of the context of the measurement framework should be available via a consistent, well-defined interface, ideally in a way that does not require the construction of Catalog objects to store single-record outputs.
    • This should at least include the sinc aperture algorithms and the SDSS shape algorithm.
    • (PAP) Want a generic (elliptical) aperture photometry algorithm that will use sinc on small apertures and regular aperture photometry on larger apertures. This could be used by all other algorithms performing aperture photometry. Some of this is simply saying that the sinc aperture shouldn't be an Algorithm, just an algorithm.
  • Measurement tasks, plugin frameworks, and algorithm plugins for forced measurement that support algorithm-specific reference quantities (i.e. more than just positions).
    • Must support multiple kinds of references and target images: coadd->exposure, coadd->coadd, external->exposure, external->coadd (not all may be implemented now, but design should not preclude any of these). See also #2350.
    • Must support multi-object simultaneous measurement (for plugins that support this) to address blending. Framework support for simultaneous fitting will be prioritized over individual algorithm support.
  • Measurement tasks, plugin frameworks, and algorithm plugins for MultiFit-style processing.
    • Must contain hooks that allow the driver code to know exactly what data will be needed far in advance (i.e. a preprocessing step to determine the per-exposure data regions needed), to enable future I/O optimization. This optimization will not be part of the W14 work, however.
    • Must support multi-object simultaneous measurement (for plugins that support this) to address blending. Framework support for simultaneous fitting will be prioritized over individual algorithm support.
  • Measurement tasks, plugin frameworks, and algorithm plugins for multi-object simultaneous fitting.
    • Because we don't have a way to produce deblended HeavyFootprints in forced measurement or MultiFit, we should provide a framework that can measure all blended children simultaneously.
    • Not all algorithms will be required to support this mode.
  • New plugin frameworks should encourage documentation of plugin algorithms (#2779)
  • Provide easier debugging and timing for plugin algorithms (see also #2229).
  • Simpler, less-opaque implementation of plugin algorithms: in Python if computationally feasible, in straightforward C++ otherwise.
  • Fix up and unify circular aperture photometry algorithms, making a multi-aperture plugin that uses the sinc approach intelligently and can be used with slots.
  • Review and update floating point precision of measurement fields as appropriate (we currently store many values as doubles for which float is sufficient; #1998).
  • Improve and vet model-based galaxy photometry measurements, and make them available for all three measurement modes (single-frame, forced, MultiFit).
  • Additional measurement-related tickets that should be addressed:
    • #2589 - don't call something "priority" unless big numbers go first
      • (PAP) Would be nice to declare dependencies instead of organising priorities.
      • (JFB) I once would have agreed with you, but now I disagree for two reasons: A) I think it's extra complexity we don't need; B) often the dependencies aren't on a particular algorithm but on a whole class of algorithms (aperture correction needs all the fluxes) or a particular slot (elliptical apertures need a shape, don't care which one).
    • #2617 - better exception granularity
    • #2740 - audit which algorithms are enabled by default
    • #2844 - when possible, make boolean config options False by default
    • #2751 - Speed of scaled elliptical apertures? (PAP)
  • We need to be able to identify algorithms designed for preconvolved and/or difference images, and check that we're running them on the right kind of data.
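
As a rough illustration of the last requirement, here is a minimal sketch of how a driver might check that a plugin is being run on the kind of image it was designed for. The flag names, the exception class, and check_data_flags() are all hypothetical; only the idea of a set of MeasurementDataFlags comes from the design below.

```python
import enum


class MeasurementDataFlags(enum.Flag):
    """Hypothetical flags describing the image handed to a plugin."""
    NONE = 0
    PRECONVOLVED = 1  # image has already been convolved with the PSF
    DIFFERENCE = 2    # image is a difference image


class IncompatibleDataError(RuntimeError):
    """Raised when a plugin is run on a kind of image it was not designed for."""


def check_data_flags(plugin_name, required, forbidden, actual):
    """Raise unless `actual` has every required flag and no forbidden flag."""
    missing = required & ~actual
    unexpected = forbidden & actual
    if missing or unexpected:
        raise IncompatibleDataError(
            "%s cannot run on this image: missing=%r, unexpected=%r"
            % (plugin_name, missing, unexpected)
        )
```

A plugin meant only for likelihood images would declare PRECONVOLVED as required, and the driver would call the check once at setup rather than per source.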

Design

Organization and Sequencing

The new measurement framework and algorithms will be implemented in two completely new packages:

  • meas_base will contain the measurement driver classes (including tasks) and the algorithm base classes.
  • meas_core will contain the main measurement algorithms that we enable by default on most cameras.

I feel these names are too similar. meas_base is a great name, but meas_algorithms seems so much clearer to me than meas_core. If evolution of that code in an existing package is too tricky, then how about meas_alg as the replacement name? (RO).

This will allow us to work incrementally on the new packages, creating a (mostly) drop-in replacement for SourceMeasurementTask without removing anything from meas_algorithms or breaking anything until the work is done. It will also be an important step towards creating more logical package boundaries from the current meas_algorithms smorgasbord. Finally, it will also make it easier for new plugin implementors to find the base classes and example classes they'll want for documentation purposes.

Once the new framework and core algorithms are complete, we will convert existing meas_extensions packages to the new framework and remove the old framework and algorithms from meas_algorithms. We will rename these packages from meas_extensions_* to meas_ext_*.

Measurement Drivers

We will have three measurement driver tasks, with one for each type of data organization: "SourceMeasurementTask", "ForcedMeasurementTask", and "MultiFitTask". Each of these will be associated with its own registry of plugins. The tasks will contain the machinery to initialize the algorithms and the schema themselves (rather than delegate this to a C++ class like the current MeasureSources). "SourceMeasurementTask" will generally be designed to be used as a subtask, but ForcedMeasurementTask and MultiFitTask may be designed as top-level command-line tasks.
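
To make the "one registry per driver" idea concrete, here is a small sketch in plain Python. PluginRegistry, the three registry variables, and the example plugin name are assumptions made for illustration; the real registries would presumably build on the existing config/registry machinery.

```python
class PluginRegistry:
    """Maps plugin names to plugin classes for one measurement driver."""

    def __init__(self, driver_name):
        self.driver_name = driver_name
        self._plugins = {}

    def register(self, plugin_name):
        """Return a class decorator that records the class under `plugin_name`."""
        def decorator(cls):
            if plugin_name in self._plugins:
                raise ValueError(
                    "%r is already registered for %s" % (plugin_name, self.driver_name)
                )
            self._plugins[plugin_name] = cls
            return cls
        return decorator

    def __contains__(self, plugin_name):
        return plugin_name in self._plugins

    def __getitem__(self, plugin_name):
        return self._plugins[plugin_name]


# One registry per driver task, as described above.
single_frame_plugins = PluginRegistry("SourceMeasurementTask")
forced_plugins = PluginRegistry("ForcedMeasurementTask")
multifit_plugins = PluginRegistry("MultiFitTask")


@single_frame_plugins.register("example.Centroid")
@forced_plugins.register("example.Centroid")
class ExampleCentroidPlugin:
    """Placeholder plugin that is valid for two of the three drivers."""
```

Keeping the registries separate means a driver can only be configured with plugins that were explicitly declared to work with its kind of data.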

One plugin class will provide both single-object measurements and multi-object measurement (of deblends), via different methods. Algorithms that do not support multi-object mode will simply implement only one method. Plugins that support both modes may be configured to run in either or both modes, but execution order will only be respected relative to other measurements being run in the same mode: all single-object measurements will be performed on a source before the multi-object measurements. It is expected (but not required) that algorithms that are run in both modes will generally use the same fields, and use the multi-object measurements to update the single-object measurements. Note that this means that algorithms that are needed as dependencies of single-object plugins should not be run only in multi-object mode. The drivers will warn if a slot is set to an algorithm that is being run only in multi-object mode.
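
The mode and ordering rules above could be sketched as follows. PluginConfig, its field names, and both helper functions are hypothetical; the point is only that ordering is resolved separately within each mode, and that a slot pointing at a multi-object-only plugin can be detected up front.

```python
from dataclasses import dataclass


@dataclass
class PluginConfig:
    """Hypothetical per-plugin configuration."""
    name: str
    order: float             # lower values run first within a mode
    run_single: bool = True  # run in single-object mode
    run_multi: bool = False  # run in multi-object mode


def plan_execution(configs):
    """Return (single_names, multi_names), each sorted by execution order.

    Order is only meaningful relative to other plugins in the same mode.
    """
    single = sorted((c for c in configs if c.run_single), key=lambda c: c.order)
    multi = sorted((c for c in configs if c.run_multi), key=lambda c: c.order)
    return [c.name for c in single], [c.name for c in multi]


def find_multi_only_slots(slots, configs):
    """Return the slot names that point at plugins run only in multi-object mode."""
    by_name = {c.name: c for c in configs}
    bad = []
    for slot_name, plugin_name in slots.items():
        config = by_name.get(plugin_name)
        if config is not None and config.run_multi and not config.run_single:
            bad.append(slot_name)
    return sorted(bad)
```

A driver would call find_multi_only_slots() during setup and emit a warning for each slot it returns.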

The detailed order of operations for the main loop of all measurement drivers will be:

Replace all objects with noise

For each parent in catalog:
   For each child of parent:
      Unreplace noise with data over child footprint
      Run all plugins configured for single-object mode on child
      Restore noise in child footprint
   Unreplace noise with data over parent footprint
   Run all plugins configured for single-object mode on parent
   Run all plugins configured for multi-object mode on parent (as a single-element list)
   Run all plugins configured for multi-object mode on the list of all children
   Restore noise in parent footprint

Unreplace noise for all objects

There are subtle differences for the different input types, however; in MultiFit, we'll need to load data within the loop over parents, and in both forced measurement and MultiFit we'll have to use IDs from the reference objects to determine deblend families rather than the output sources. The current design has this same basic loop repeated in all three driver tasks. Trying to give them a common base class to hold this loop and a few other similar-but-not-identical pieces of code made it extremely hard to follow, and the complexity of the hooks needed to do that almost certainly outweighs the disadvantages of the near-duplication.
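
For reference, the order of operations listed above can be written as a short Python sketch. This is not the prototype code: RecordingNoiseReplacer, RecordingPlugin, and the method names are stand-ins that only log what a real implementation would do, and deblend families are passed in as a plain dict of parent ID to child IDs.

```python
class RecordingNoiseReplacer:
    """Stand-in noise replacer that logs each operation instead of touching pixels."""

    def __init__(self, log):
        self.log = log

    def replace_all(self):
        self.log.append("replace_all")

    def insert_source(self, source_id):
        self.log.append("unreplace %d" % source_id)

    def remove_source(self, source_id):
        self.log.append("restore %d" % source_id)

    def unreplace_all(self):
        self.log.append("unreplace_all")


class RecordingPlugin:
    """Stand-in plugin that supports both measurement modes."""

    def __init__(self, log):
        self.log = log

    def measure_single(self, source_id):
        self.log.append("single %d" % source_id)

    def measure_multi(self, source_ids):
        self.log.append("multi %s" % list(source_ids))


def run_measurement_loop(families, single_plugins, multi_plugins, replacer):
    """Run the main loop; `families` maps each parent ID to its child IDs."""
    replacer.replace_all()
    for parent, children in families.items():
        for child in children:
            replacer.insert_source(child)
            for plugin in single_plugins:
                plugin.measure_single(child)
            replacer.remove_source(child)
        replacer.insert_source(parent)
        for plugin in single_plugins:
            plugin.measure_single(parent)
        for plugin in multi_plugins:
            plugin.measure_multi([parent])   # parent as a single-element list
        for plugin in multi_plugins:
            plugin.measure_multi(children)   # all children together
        replacer.remove_source(parent)
    replacer.unreplace_all()
```

In forced measurement and MultiFit the families dict would be built from reference object IDs, and MultiFit would load data at the top of the loop over parents.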

As for the details, I've put together some heavily-commented prototypes for the driver tasks and algorithm APIs here: https://dev.lsstcorp.org/cgit/personal/jbosch/meas_base.git/

These are in no way runnable, and most of the implementation is still in the form of "TODO" comments. They do define most of the public API however, and they contain at least the beginnings of the main loop logic, which I felt was necessary to at least partially implement in order to define all the interfaces.

Algorithm Plugins

While the APIs defined in the meas_base prototypes linked above provide everything the driver tasks need to know about the algorithms, we'll need to do some more work to ensure that implementing new plugins isn't too painful. With up to six signatures that an algorithm may want to support (3 types of inputs, for both single-object and multi-object fitting), this will be challenging. We'll also want to do much of this work in C++; while the new framework will allow new algorithms to be implemented in Python, we'll want to continue to implement all of our current ones in C++ for performance reasons.

The general approach will be:

  • The plugin implementor will provide a central algorithm class with several apply() methods that do the actual pixel-processing work, returning results in algorithm-specific structs (or vectors of structs, for apply() methods that fit multiple objects at once). Both the central class and output struct(s) will generally be defined in Swigged C++.

If you intend this to be mimicked in Python then it would be clearer to use different names for different purposes, instead of using operator overloading. (RO)

  • The central algorithm class will not inherit from any of the algorithm plugin classes defined in the above prototype; it need not even be a polymorphic class, and the apply() methods should not generally be virtual (as different algorithms will return different structs).
  • The central algorithm class should have one or more associated Config or Control classes. If different data modes (i.e. single-frame vs. forced vs. MultiFit) require different configuration, these should have a common base class that is sufficient to initialize the algorithm class.
  • The central algorithm class's constructor should take only a base Config or Control object and a set of MeasurementDataFlags. In general, the only state held by the class should be workspace, as everything else it needs will be passed to the apply methods.
  • apply() methods may have any signature (for code reuse or convenience), but only certain signatures will be used to provide plugin support. These will generally correspond to the signatures of the measureSingle() and measureMulti() methods of the plugin base classes, but with some differences (in general, they will not take output record objects, as they will instead return values via the structs, and they will take Config or Control objects as their last argument).
  • Output structs should provide methods that transfer their data member values to record objects, as well as methods that add their fields to a schema object.
  • We will provide functions that take a "central algorithm class" as defined above and create and register plugin algorithm classes that delegate all work to the apply() methods and the record/schema manipulators on the result structs.
    • Different plugin-creation functions will expect slightly different apply() signatures, corresponding to different kinds of measurements, and will have optional keyword arguments that allow for further control over how to call apply() methods. The complete set of plugin-creation functions and supported apply() signatures is TBD.
    • Ideally, the plugin creator functions will be defined mostly in Python, as this will make dealing with the essentially "duck-typed" nature of the central algorithm classes much easier. If necessary for performance reasons, we may move some of their implementation to C++ using CRTP.
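
A pure-Python sketch of the pattern described in this list may help. Everything named here is hypothetical (BoxFlux, BoxFluxControl, FluxResult, make_single_frame_plugin, the field names, and the dict-based record and list-based schema); the real central classes would mostly be Swigged C++ and would also receive MeasurementDataFlags, which this sketch omits. The algorithm itself is a toy box sum.

```python
from dataclasses import dataclass


@dataclass
class FluxResult:
    """Hypothetical result struct returned by apply()."""
    flux: float
    flux_err: float

    @staticmethod
    def add_fields(schema, prefix):
        """Add this struct's fields to a schema (here just a list of names)."""
        schema.extend([prefix + "_flux", prefix + "_fluxErr"])

    def transfer(self, record, prefix):
        """Copy the struct's values into a record (here just a dict)."""
        record[prefix + "_flux"] = self.flux
        record[prefix + "_fluxErr"] = self.flux_err


@dataclass
class BoxFluxControl:
    half_width: int = 1


class BoxFlux:
    """Central algorithm class: no plugin base class, usable on its own."""
    Result = FluxResult

    def __init__(self, control):
        self.control = control  # a real class would mostly hold workspace

    def apply(self, image, x, y, control):
        """Sum a square box centred on (x, y); control is the last argument."""
        h = control.half_width
        total, count = 0.0, 0
        for row in image[max(y - h, 0):y + h + 1]:
            for value in row[max(x - h, 0):x + h + 1]:
                total += value
                count += 1
        # Toy uncertainty, assuming unit variance in every pixel.
        return FluxResult(flux=total, flux_err=count ** 0.5)


SINGLE_FRAME_PLUGINS = {}  # hypothetical registry for the single-frame driver


def make_single_frame_plugin(name, algorithm_class, control_class):
    """Create and register a plugin that delegates all work to apply()."""
    class Plugin:
        def __init__(self, schema, control=None):
            self.control = control if control is not None else control_class()
            self.algorithm = algorithm_class(self.control)
            algorithm_class.Result.add_fields(schema, name)

        def measure_single(self, image, record):
            result = self.algorithm.apply(
                image, record["x"], record["y"], self.control
            )
            result.transfer(record, name)

    Plugin.__name__ = algorithm_class.__name__ + "Plugin"
    SINGLE_FRAME_PLUGINS[name] = Plugin
    return Plugin
```

Note that BoxFlux can be called directly on an image without any schema or record, while the generated plugin handles only the bookkeeping.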

The advantages of this approach are:

  • With a large enough suite of plugin-creator functions, users should rarely have to implement their own plugins (or indeed, even know that plugin classes exist).
  • The central algorithm classes can be used directly outside of the plugin/driver framework with intuitive, consistent interfaces that don't require catalogs for one-off use.
  • The most common types of algorithms can receive the most attention in the development of full-featured plugin-creators, minimizing the amount of boilerplate needed to create these.
  • All of the heavy-lifting that needs to be done in C++ is confined to the apply() methods (and to a lesser extent the transfer routines on the structs); all of the bookkeeping is in Python.

The main disadvantages are:

  • It will be difficult to predict how this will affect performance until we're already pretty far along in the design. All of these things can be done in C++ (including the measurement driver loops) if necessary, but the code would likely be a lot more complex if we have to abandon Python. To attempt to get a rough idea, we should try to time how long we take per-source using the current mostly-C++ measurement framework, and compare this to the overhead involved in dispatching an overloaded function through Swig (which we'd also try to measure).
  • The connection between the measurement driver tasks and actual algorithm implementations will remain a bit opaque, as the plugin classes that link them will be machine-generated. I think at some level this is an unavoidable tradeoff associated with trying to make algorithm implementation as concise and easy as possible in the presence of the large number of ways in which algorithms can be run.
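
The timing comparison suggested in the first disadvantage could start from something as simple as the sketch below, which uses only the standard timeit module. The helper names are made up, and the measured function would be whatever Swig-wrapped overload we want to test; note that the lambda wrapper adds a small constant cost of its own to every call.

```python
import timeit


def per_call_seconds(func, *args, number=10000, repeat=5):
    """Best-of-`repeat` wall-clock time for one call of func(*args)."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def overhead_fraction(dispatch_seconds, per_source_seconds):
    """Fraction of the per-source time budget consumed by one dispatch."""
    if per_source_seconds <= 0:
        raise ValueError("per_source_seconds must be positive")
    return dispatch_seconds / per_source_seconds
```

If the dispatch cost multiplied by the number of plugin calls per source is a small fraction of the current per-source measurement time, keeping the bookkeeping in Python is probably safe.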

Algorithm Refactoring

In addition to changing the plugin interface and task drivers, I'd like to clean up, merge, and in some cases prune existing algorithms. Here's a brief overview, in no particular order:

  • PixelFlags: could we make the mask planes and source flags set somehow configurable and hence extensible?
  • SkyCoord: no changes needed, I think.
  • Classification: the config parameter settings for this logically depend on which algorithm occupies the model flux slot, but there's no mechanism in config to make that work automatically. Don't know how to fix that right now, so we probably just leave it as-is.
  • CorrectFluxes: this algorithm has hooks into a lot of other algorithms, as it's responsible for measuring and applying aperture corrections. I'll have to try some prototypes before I'll have a better idea of how to reimplement it. FunctorKeys may play a role here. We should put off reimplementing this one for now.
  • PsfFlux: option to exclude mask planes from fit, better detailed flag handling
  • PeakLikelihoodFlux: this is a strange beast as it's designed to run only on images that have already been convolved with the PSF (i.e. likelihood images). It should pay close attention to the measurement flags being passed in from the framework, and we may want to consider walling it off into its own subdirectory/namespace (or maybe moving it to a separate package focused on measuring on likelihood images).
  • SincFlux + ApertureFlux: these should be merged into a single new algorithm that does a range of circular apertures, using the sinc approach for small apertures and the faster naive approach for large apertures (with the cutoff configurable, of course). In addition, we should move the low-level interface to the sinc photometry code to the algorithm class, rather than letting it live off in free functions. This will be one of the trickier algorithm conversions, as we'll have to worry about how to cache the sinc coefficients, and we don't yet know how to handle multiple-aperture fluxes with the new field naming conventions.
  • EllipticalApertureFlux: this should work like the new sinc/aperture flux class, but use elliptical apertures (derived from the shape slot) rather than circular ones.
  • NaiveFlux: we should just prune this. All its functionality should be provided by the new sinc/aperture plugin.
  • GaussianFlux: all the work in computing this measurement is actually done by SdssShape in SFM mode, so maybe in that case it should just generate aliases? In forced mode, it does have real work to do.
  • SdssCentroid: keep mostly as is, but wall-off the legacy code that doesn't meet our standards from the prettier interface we'll layer on top of it.
  • RecordCentroid: this should (configurably) handle shapes, too (or maybe have a new algorithm to do shapes). It will have to deal with transforming centroids (and shapes, if included), as that's now the job of this algorithm rather than the forced framework. We should probably rename it to something involving the word "Reference".
  • NaiveCentroid: should be pruned.
  • GaussianCentroid: keep mostly as is, but wall-off the legacy code that doesn't meet our standards from the prettier interface we'll layer on top of it.
  • SdssShape: we should move all the low-level functionality provided by SdssShapeImpl into methods on the algorithm class itself, and wall-off the even lower-level legacy implementation code from the better documented, more standards-compliant public interface.
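
For the merged SincFlux + ApertureFlux algorithm, the dispatch and caching could look roughly like this. It is only a sketch: the sinc photometry itself is not implemented here and is passed in as a callable, and the function and parameter names (measure_apertures, sinc_cutoff, make_coefficient_cache) are assumptions. The naive branch simply sums pixels whose centres fall inside the aperture.

```python
import functools
import math


def naive_aperture_flux(image, xc, yc, radius):
    """Sum the pixels whose centres lie within `radius` of (xc, yc)."""
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if math.hypot(x - xc, y - yc) <= radius:
                total += value
    return total


def make_coefficient_cache(compute):
    """Wrap a per-radius coefficient calculation so each radius is computed once.

    `compute` stands in for the (unimplemented here) sinc coefficient code.
    """
    return functools.lru_cache(maxsize=None)(compute)


def measure_apertures(image, xc, yc, radii, sinc_cutoff, sinc_flux,
                      naive_flux=naive_aperture_flux):
    """Measure one flux per radius, using sinc_flux at or below the cutoff."""
    fluxes = []
    for radius in radii:
        method = sinc_flux if radius <= sinc_cutoff else naive_flux
        fluxes.append(method(image, xc, yc, radius))
    return fluxes
```

Caching by radius works here because the configured radii are fixed for a run; elliptical apertures derived from the shape slot would need a different caching strategy.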

Work Tasks

This table is out of date and is no longer being updated; see the status tool/PMCS for the current schedule.

# | Name | Start | Finish | Participants | Dependencies | Description
1 | galaxy fitting in S13 testbed | 2013-11-12 | 2013-11-19 | Jim | | Finish implementing greedy optimizer in S13 testbed; will only run on S13 sims, but can use S13 display tools
2 | galaxy fitting analysis on S13 sims | 2013-11-19 | 2013-12-09 | Jim | FS1 | Run fitting code on S13 single-epoch sims with different SNRs; troubleshoot poor fits
3 | meas_base skeleton and refined prototype | 2013-11-13 | 2013-11-26 | Perry | | Create package skeleton for meas_base, study and refine the prototype on this page without trying to make it run
4 | single-frame measurement framework | 2013-11-26 | 2013-12-19 | Perry | FS3, SS5 | Turn SFM framework prototype into working code with unit tests, but with only example plugins
5 | afw::table aliases and sibling accessors | 2013-11-26 | 2013-12-12 | Jim | | Add alias support to afw::table, remove period/underscore swapping in persistence, add getChildren() method
6 | plugin creator system | 2013-12-05 | 2013-12-19 | Jim, Perry | | Functions for creating and registering measurement plugins from "central algorithm" classes
7 | minimal single-frame meas_core | 2013-12-19 | 2014-01-13 | Jim | FS4, FS6 | Reimplement core measurement algorithms in new framework for single-frame mode
8 | galaxy fitting single-frame plugin | 2013-12-19 | 2014-01-13 | Yusra, Jim | FS4, FS6, FS2 | Create plugin interface for galaxy fitting code in single-frame mode
9 | galaxy flux analysis on real data | 2014-01-13 | 2014-01-27 | Yusra, Jim | FS7, FS8 | Running fitting code on real data (TBD), examine fluxes, troubleshoot outliers
10 | forced measurement framework | 2014-01-13 | 2014-01-27 | Perry | FS4 | Turn forced measurement framework prototype into working code with unit tests, but with only example plugins
11 | minimal forced meas_core | 2014-01-13 | 2014-01-20 | Jim | FS4, FS7 | Reimplement core measurement algorithms in new framework for forced mode (with multi-object mode support)
12 | galaxy fitting forced measurement plugin | 2014-01-20 | 2014-02-03 | Yusra, Jim | FS6, FS8, FS10 | Create plugin interface for galaxy fitting code in forced mode (including multi-object fitting)
13 | galaxy colors analysis on real data | 2014-02-03 | 2014-02-10 | Yusra, Jim | FS12 | Running fitting code on real data (TBD) in forced mode, examine colors, troubleshoot outliers
14 | multifit framework | 2014-01-27 | 2014-02-10 | Perry | FS10 | Turn MultiFit framework prototype into working code with unit tests, but with only example plugins
15 | galaxy fitting multifit plugin | 2014-02-10 | 2014-02-24 | Yusra | FS12, FS14 | Create plugin interface for galaxy fitting code in multifit mode (including multi-object fitting)
16 | additional core and ext plugins | 2014-02-10 | 2014-02-24 | Perry, Jim | FS6, FS10, FF14 | Reimplement remaining existing algorithms in new framework