Changes between Version 4 and Version 5 of ProposalNDarray

09/25/2009 12:42:08 AM



  • ProposalNDarray

    v4 v5  
    1313 * The library contains many classes with very similar APIs and implementations which nevertheless cannot be completely merged.  I have used preprocessor macros to put the shared code in a single place and make it easier to maintain.  I have taken care to ensure that doxygen is aware of this usage so the documentation it produces is accurate and provides the reference information the code itself lacks (because of the macros), but this may require running doxygen with slightly non-standard options. (No specific coding standard; however, preprocessor macros are widely acknowledged as something to be avoided when possible, and my code uses them extensively - so I expected someone would object to them unless I explained their use here). 
     15== Motivation == 
    16 == Applications to Multifit == 
     17Multifit has a strong need for a unified interface for 1-, 2-, and 3-dimensional arrays which either: 
     18 * share data (e.g. Numpy, {{{afw::Image}}}); ''or'' 
     19 * include both strongly owning and non-owning variants, in which the owning version is convertible to the non-owning version (e.g. boost::gil::image and boost::gil::view). 
     20Existing LSST code seems to prefer the former, and since ndarray supports both, I'll assume that usage.  Right now, LSST has: 
     21 * Two main one-dimensional array classes in use ({{{std::vector<double>}}}, {{{Eigen::VectorXd}}}).  Neither of these has shared ownership, and the most common one, {{{std::vector}}}, does not support views (STL iterators, of course, can substitute for views in many cases, but often require templates). 
     22 * Two main two-dimensional array classes in use ({{{afw::Image}}}, {{{Eigen::MatrixXd}}}). 
     23 * No three-dimensional array classes.  In our case, 3D arrays are almost always stacks of images, and while {{{std::vector<afw::image::Image>}}} is sometimes a possible workaround, it is no more an appropriate replacement for a true strided 3D array than a vector-of-vectors is a suitable replacement for an image. 
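To make the "share data" option concrete, here is a minimal sketch of a one-dimensional array whose views keep the underlying buffer alive, in the way Numpy slices do.  The class {{{SharedArray1}}} and the helper {{{makeTail}}} are hypothetical names invented for this illustration; they are not part of {{{ndarray}}}, {{{afw}}}, or Eigen, and a real implementation would differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical 1D array with shared ownership (illustration only, not the ndarray API).
class SharedArray1 {
public:
    explicit SharedArray1(std::size_t size)
        : buffer_(std::make_shared<std::vector<double>>(size, 0.0)),
          offset_(0), size_(size) {}

    // Return a view of [begin, end) that shares the same buffer; no data is copied.
    SharedArray1 slice(std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= size_);
        SharedArray1 view(*this);  // copies the shared_ptr, not the elements
        view.offset_ = offset_ + begin;
        view.size_ = end - begin;
        return view;
    }

    // Shallow constness: a const handle still refers to mutable shared data.
    double & operator[](std::size_t i) const { return (*buffer_)[offset_ + i]; }

    std::size_t size() const { return size_; }

    // Number of arrays and views currently sharing the buffer.
    long owners() const { return buffer_.use_count(); }

private:
    std::shared_ptr<std::vector<double>> buffer_;
    std::size_t offset_;
    std::size_t size_;
};

// A view may safely outlive the array it was sliced from.
inline SharedArray1 makeTail() {
    SharedArray1 whole(5);
    for (std::size_t i = 0; i < whole.size(); ++i) {
        whole[i] = static_cast<double>(i);
    }
    return whole.slice(3, 5);  // `whole` is destroyed here; the buffer is not
}
```

Neither {{{std::vector<double>}}} nor {{{Eigen::VectorXd}}} can express this on its own: a {{{std::vector}}} always owns its elements exclusively, so any view into it dangles once the vector is destroyed.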
    18 Multifit makes use of a lot of reinterpretation and changes of dimensionality and shape in its arrays and subarrays. The underlying data is indeed just an image, but we often treat it as a vector. Or, rather, it's a collection of differently-sized images we will flatten into one giant vector.  And while that's easy enough to support without a complete multi-dimensional array library, we also will deal quite a bit with matrices in which the rows or columns will correspond to this giant flattened pixel vector, and the other dimension will correspond to something else entirely (such as the derivative of a model with respect to some set of parameters). 
     25The situation is somewhat worse, however, because Eigen has different compile-time types for dealing with arrays that own their data, blocks of arrays that own their data, and arrays that reference external data ({{{Matrix}}}, {{{Block}}}, and {{{Map}}}, respectively).  This makes it necessary to use templates to support any operation on Eigen-based arrays that doesn't care about how they were allocated, which in turn makes it impossible to write such an operation as a virtual member function.  Meanwhile, {{{std::vector}}} has no support for views or shared data, while {{{afw::Image}}} only allows sharing between images, making it impossible to construct, for instance, an {{{Eigen::Map}}} that references data in an {{{afw::Image}}}, or to construct an {{{afw::Image}}} or {{{std::vector}}} view into a block of an {{{Eigen::VectorXd}}}. 
    20 Because we'll be allocating those as giant vectors/matrices and setting their coefficients by extracting subarrays, reshaping them to 2 or 3 dimensions, and dealing with them as (y,x) images or (y,x,parameter) 3-tensors, a full multidimensional array library started to sound like a good idea. 
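As a rough sketch of the reshaping described here, the following reinterprets part of a flat coefficient vector as a (y,x,parameter) 3-tensor without copying.  {{{TensorView3}}} and {{{reshape3}}} are hypothetical names made up for this example and assume row-major layout with strides counted in elements; they are not the {{{ndarray}}} interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 3D view over a flat buffer, indexed as (y, x, parameter).
struct TensorView3 {
    double * data;
    std::size_t shape[3];
    std::size_t strides[3];  // counted in elements, not bytes

    double & operator()(std::size_t y, std::size_t x, std::size_t p) const {
        return data[y * strides[0] + x * strides[1] + p * strides[2]];
    }
};

// Reinterpret height*width*nParams contiguous elements of `flat`, starting at
// `offset`, as a row-major (height, width, nParams) tensor.  No data is copied.
inline TensorView3 reshape3(std::vector<double> & flat, std::size_t offset,
                            std::size_t height, std::size_t width,
                            std::size_t nParams) {
    assert(offset + height * width * nParams <= flat.size());
    return TensorView3{flat.data() + offset,
                       {height, width, nParams},
                       {width * nParams, nParams, 1}};
}
```

Writing through the view modifies the giant vector directly, which is the behaviour we rely on when setting coefficients one image or one parameter at a time.  Unlike a shared-ownership array, this bare view is only valid while the vector is alive and not reallocated.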
     27Clearly none of these types should go away; they all have their specific uses.  And not all of the above cases are necessary.  However, algorithm code that operates on a simple 1-, 2-, or 3-dimensional strided array concept should be built around that bare concept, not on the details of how the memory that comprises that array was allocated or how its lifetime is managed, and all objects that can support that concept should somehow be adaptable to it. 
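The sketch below illustrates what building on the bare strided concept might look like.  Because the view is a single concrete type regardless of how the memory was allocated, an operation on it can be an ordinary virtual member function rather than a template.  All names here ({{{StridedView2}}}, {{{Reducer}}}, {{{SumReducer}}}, {{{viewOf}}}) are hypothetical and chosen only for this example; this is not how {{{ndarray}}} itself is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 2D strided view: only a pointer, a shape, and strides.
struct StridedView2 {
    const double * data;
    std::size_t rows, cols;
    std::size_t rowStride, colStride;  // counted in elements

    double operator()(std::size_t r, std::size_t c) const {
        return data[r * rowStride + c * colStride];
    }

    // Sub-block referring to the same memory; same type as the parent view.
    StridedView2 block(std::size_t r0, std::size_t c0,
                       std::size_t nRows, std::size_t nCols) const {
        assert(r0 + nRows <= rows && c0 + nCols <= cols);
        return StridedView2{data + r0 * rowStride + c0 * colStride,
                            nRows, nCols, rowStride, colStride};
    }
};

// The operation is virtual, which a template over storage types could not be.
class Reducer {
public:
    virtual ~Reducer() {}
    virtual double apply(const StridedView2 & view) const = 0;
};

class SumReducer : public Reducer {
public:
    double apply(const StridedView2 & view) const override {
        double total = 0.0;
        for (std::size_t r = 0; r < view.rows; ++r) {
            for (std::size_t c = 0; c < view.cols; ++c) {
                total += view(r, c);
            }
        }
        return total;
    }
};

// Adapter from one possible storage type; other containers would get their own.
inline StridedView2 viewOf(const std::vector<double> & storage,
                           std::size_t rows, std::size_t cols) {
    assert(storage.size() == rows * cols);
    return StridedView2{storage.data(), rows, cols, cols, 1};
}
```

A whole array, a block of it, and a transposed view over a raw buffer all have the same type here, so the algorithm never needs to know which one it was given.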
    22 The current afw::image API does not support our use case because it is impossible to construct an Image-like view into an arbitrary 3-dimensional block of memory; all memory buffers used by Image must currently be allocated in a single boost::gil::image.  We will work around this to use Image when interfacing with other afw constructs, and while this is not a good long term solution, it seems best for DC3b. 
     29{{{ndarray}}} is not the only solution to this problem, and right now it is not even a complete solution; modifications would be required to {{{afw::Image}}} to expose an {{{ndarray}}} view of an {{{afw::Image}}}, and full shared-owner interoperability with {{{std::vector<double>}}} or {{{Eigen}}} is impossible.  However, it is a partial solution that already exists, and considering the DC3b timescale we don't see any other alternative for multifit, where the need for shared multidimensional arrays is perhaps most acute. 
    24 == Uses beyond Multifit == 
     31For now, we can limit {{{ndarray}}} usage to the {{{multifit}}} package, and simply copy {{{afw::Image}}} objects into 2D {{{ndarray}}} objects on the boundary between multifit and other code.  In the future, we hope that either: 
     32 * {{{ndarray}}} will be used more widely throughout the project, and integrated closely with {{{afw::Image}}} so that both {{{ndarray}}} and {{{afw::Image}}} can share references to the same memory (this will require changes to the internals of {{{afw::Image}}}, but crucially it need not change the ownership semantics or other external behavior of {{{afw::Image}}}); ''or'' 
     33 * {{{ndarray}}} will be replaced in multifit by a custom set of 1-, 2-, and 3-dimensional array classes that provide the {{{ndarray}}} functionality needed by multifit and are similarly integrated with {{{afw::Image}}}. 
    26   * The ndarray types map very nicely to numpy arrays, and I have utility functions for converting between ndarray objects and !PyObject* that can be used to make nice swig typemaps (or whatever).  I've found this particularly useful in testing, when you want to get some data generated in Python into C++ as simply as possible. 
     35We intend to use {{{Eigen}}} mostly via {{{Map}}} objects which will be constructed from {{{ndarray}}}-owned data; this will allow us to benefit from optimized {{{Eigen}}} operations while avoiding constructing {{{ndarray}}} objects (or wanting to construct {{{afw::Image}}} objects) that reference memory that is not reference-counted. 
    28   * Shallow Eigen objects can be extracted from arrays and subarrays without copying, which makes it very easy to do optimized linear algebra in-place on general (and potentially large) arrays.  We have recently seen in the diffim code that reinterpreting pixel operations as matrix operations to be handled by an optimized linear algebra library such as Eigen can provide a significant speedup with little programmer effort.  ndarray is designed to provide just this sort of image/matrix duality. 
    30 == Aditional Materials == 
     37== Additional Materials == 
    3138[ Doxygen Documentation]