Version 3 (modified by jbosch, 10 years ago)

Coding Standards Exception Request (originally Third Party Software Proposal) - ndarray

Jim Bosch (of UC Davis) has written a C++ template-based N-dimensional array library called ndarray. The original proposal was for ndarray to exist as a third-party package; following the TCT's previous discussion and a better understanding of the options, we now propose integrating it directly into the LSST code base. We intend to limit the use of this library to multifit for DC3b, though it may later be useful to integrate it more closely with the current afw::image API. We believe the most logical ultimate home for ndarray is afw, and that it will prove more broadly useful there. We also accept that ndarray may not be needed at all if the afw::image API is later modified to support our use cases, so simply putting ndarray in meas/multifit is a workable solution for DC3b.

What separates this code from regular LSST development is that it is impossible to make ndarray conform to certain LSST coding standards, and (we argue) undesirable to make it conform to others. As a result, we are seeking an explicit exception to these standards for the ndarray package.

Conflicts with LSST Coding Standards

  • ndarray is a header-only library that makes extensive use of template metaprogramming, which makes it impossible to provide explicit instantiations of all template classes.
  • Certain names do not follow LSST naming standards, either because they make classes conform to STL concepts ("size()" is present as well as "getSize()") or because they mirror STL/boost interfaces ("make_index" mirrors "make_pair", etc.).
  • The library contains many classes with very similar APIs and implementations which nevertheless cannot be completely merged. I have used preprocessor macros to put the shared code in a single place and make it easier to maintain. I have taken care to ensure that doxygen is aware of this usage so the documentation it produces is accurate and provides the reference information the code itself lacks (because of the macros), but this may require running doxygen with slightly non-standard options.
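
The naming point can be illustrated with a minimal sketch. PixelRow and sumOf are hypothetical names invented for this example and are not part of ndarray; the sketch only shows why a class needs the lowercase size(), begin() and end() alongside an LSST-style getSize() before generic STL-style code will accept it.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container (not an ndarray class): pairs an LSST-style
// accessor with the lowercase names that STL concepts require.
class PixelRow {
public:
    typedef double value_type;
    typedef std::vector<double>::const_iterator const_iterator;

    explicit PixelRow(std::size_t n, double fill = 0.0) : _data(n, fill) {}

    // LSST-style accessor.
    std::size_t getSize() const { return _data.size(); }

    // STL-style names; generic code looks for exactly these spellings.
    std::size_t size() const { return getSize(); }
    const_iterator begin() const { return _data.begin(); }
    const_iterator end() const { return _data.end(); }

private:
    std::vector<double> _data;
};

// Generic function that only knows the STL names.
template <typename Container>
double sumOf(Container const & c) {
    assert(c.size() > 0);
    return std::accumulate(c.begin(), c.end(), 0.0);
}
```

Renaming begin() to getBegin() here would make sumOf, and any standard algorithm written in the same style, fail to compile for PixelRow.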

Applications to Multifit

Multifit frequently reinterprets its arrays and subarrays, changing their dimensionality and shape. The underlying data is indeed just an image, but we often treat it as a vector. Or, rather, it's a collection of differently-sized images that we will flatten into one giant vector. And while that's easy enough to support without a complete multi-dimensional array library, we will also deal quite a bit with matrices in which the rows or columns correspond to this giant flattened pixel vector, and the other dimension corresponds to something else entirely (such as the derivative of a model with respect to some set of parameters).

Because we'll be allocating those as giant vectors/matrices and setting their coefficients by extracting subarrays, reshaping them to 2 or 3 dimensions, and dealing with them as (y,x) images or (y,x,parameter) 3-tensors, a full multidimensional array library started to sound like a good idea.
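
As a rough illustration of the layout described above, the following sketch packs the pixels of several differently-sized images into the rows of one row-major (pixel, parameter) matrix and then views the rows belonging to one image as a (y,x,parameter) block without copying. View3, ImageSpec, totalPixels and viewImageBlock are hypothetical names for this example only; this is not the ndarray API, and the row-major layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 3-d view: a pointer plus shape and strides.
struct View3 {
    double * data;           // first element covered by the view
    std::size_t shape[3];    // (height, width, nParams)
    std::size_t stride[3];   // element (not byte) strides

    double & operator()(std::size_t y, std::size_t x, std::size_t p) const {
        assert(y < shape[0] && x < shape[1] && p < shape[2]);
        return data[y * stride[0] + x * stride[1] + p * stride[2]];
    }
};

struct ImageSpec {
    std::size_t height;
    std::size_t width;
};

// Number of rows in the flattened (pixel, parameter) matrix.
std::size_t totalPixels(std::vector<ImageSpec> const & images) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < images.size(); ++i) {
        n += images[i].height * images[i].width;
    }
    return n;
}

// View the rows of the matrix that belong to images[index] as a
// (y, x, parameter) block. No data is copied.
View3 viewImageBlock(std::vector<double> & matrix,
                     std::vector<ImageSpec> const & images,
                     std::size_t index, std::size_t nParams) {
    assert(index < images.size());
    assert(matrix.size() == totalPixels(images) * nParams);
    std::size_t firstPixel = 0;
    for (std::size_t i = 0; i < index; ++i) {
        firstPixel += images[i].height * images[i].width;
    }
    std::size_t const h = images[index].height;
    std::size_t const w = images[index].width;
    View3 view = {matrix.data() + firstPixel * nParams,
                  {h, w, nParams},
                  {w * nParams, nParams, 1}};
    return view;
}
```

Writing through the view, for example `viewImageBlock(matrix, images, 1, nParams)(y, x, p) = value;`, sets the corresponding coefficient of the giant matrix directly, which is the access pattern a full multidimensional array library generalizes to arbitrary dimensions and strides.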

The current afw::image API does not support our use case because it is impossible to construct an Image-like view into an arbitrary 3-dimensional block of memory; all memory buffers used by Image must currently be allocated in a single boost::gil::image. We will work around this and use Image when interfacing with other afw constructs; while this is not a good long-term solution, it seems best for DC3b.

Uses beyond Multifit

  • The ndarray types map very nicely to numpy arrays, and I have utility functions for converting between ndarray objects and PyObject* that can be used to make nice swig typemaps (or whatever). I've found this particularly useful in testing, when you want to get some data generated in Python into C++ as simply as possible.
  • Shallow Eigen objects can be extracted from arrays and subarrays without copying, which makes it very easy to do optimized linear algebra in-place on general (and potentially large) arrays. We have recently seen in the diffim code that reinterpreting pixel operations as matrix operations to be handled by an optimized linear algebra library such as Eigen can provide a significant speedup with little programmer effort. ndarray is designed to provide just this sort of image/matrix duality.
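
The image/matrix duality mentioned in the last point can be sketched without any external library. To keep the example dependency-free, a hand-written loop stands in for the optimized matrix-vector product that Eigen would provide; applyModel and pixelAt are hypothetical names, and the row-major layout is an assumption rather than a description of ndarray.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Matrix view of the data: add (derivatives * params) to a flat pixel
// vector in place. `derivatives` is a row-major (nPixels x nParams) matrix.
void applyModel(std::vector<double> const & derivatives,
                std::vector<double> const & params,
                std::vector<double> & pixels) {
    std::size_t const nParams = params.size();
    assert(nParams > 0);
    assert(derivatives.size() == pixels.size() * nParams);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        double sum = 0.0;
        for (std::size_t p = 0; p < nParams; ++p) {
            sum += derivatives[i * nParams + p] * params[p];
        }
        pixels[i] += sum;
    }
}

// Image view of the same buffer: (y, x) access with no copy.
double & pixelAt(std::vector<double> & pixels, std::size_t width,
                 std::size_t y, std::size_t x) {
    assert(x < width && y * width + x < pixels.size());
    return pixels[y * width + x];
}
```

The same buffer is updated as a vector by applyModel and read back as an image by pixelAt; in practice the loop would be replaced by a call into an optimized linear algebra library operating on a shallow view of the buffer.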

Additional Materials

Doxygen Documentation