Last modified 5 years ago Last modified on 10/08/2013 10:32:01 AM

Bosch's Refactoring Thoughts: Stuff We Shouldn't Do Now (But Which Still Bother Me)

Comments are welcome, but please add them in italics and prefix with your initials or username; insert them directly after the text you want to comment.

This page lists problems with packages that I think we shouldn't try to fix now, but should keep in mind as things that may need an overhaul before construction is complete.


I'm not convinced that pex_config will take us all the way to operations. It has a number of issues that I don't know how to solve:

  • I don't think we ever really solved the problem of how to represent history for pluggable configuration (ConfigurableField simply destroys history; RegistryField can be a pain to use).
  • After fighting really hard for an approach that I thought could allow us to avoid callbacks in the original design, I'm now convinced we need some sort of callbacks mechanism to help control the large number of related options we now have. Implementing callbacks is extremely tricky, though, which is why I pushed so hard to avoid them before.
  • We need an approach for cross-Task config validation, because we do have options in some Tasks that affect others. One should never set "matchBackgrounds=True" in assembleCoadd if "bgSubtracted=True" in makeCoaddTempExp, for instance.
  • Persistence of config objects still has some issues:
    • The requirement that the config class must be known by the user (rather than stored in the file) is an unfortunate restriction.
    • When we dump a config file in human-readable form, the history is lost.
  • The implementation for even the simplest kinds of Fields is horribly complex due to the need to preserve history and the clunkiness of Python's descriptors mechanism, and the most complex fields are virtually impossible to understand without a lot of dedicated effort. For these reasons, I'm no longer convinced that we want to let Python do our parsing.

All that said, I don't have any great ideas for how to solve these problems, and I think we're not yet done learning from the mistakes we made in the prototype we have. Most importantly, I think it's working well enough that it doesn't get in the way of other areas of the codebase.


I still hate Swig, and I'm proud to say that my LSST Trac Wiki page detailing the reasons for this is now the #1 Google hit if you google "Swig vs. Boost Python". But the options for replacing it still aren't good:

  • Cython is rapidly gaining market share, largely due to features we're not really interested in (writing Python-ish code that gets transformed into C), but it's a huge step back in feature-completeness relative to Swig for the features that we are interested in (wrapping C++). At best, it's a lateral move in its overall approach to wrapping code, so I don't have much hope for it solving our problems in the future.
  • ctypes does a very simple job well, but it doesn't have anything close to the features we need; it's at best one small component of a complete solution.
  • Boost.Python does most of what we need, but is missing some key elements (code generation from headers, STL support is not great). Most importantly, there's basically no active development there at all.

I have some small hope that someone will take Clang's parser and something more akin to Boost.Python's philosophical approach to wrapper generation to make a tool that solves all of our problems, but I'm not optimistic unless that person is me. I've started such a project, but I work on it about one weekend every four months, so it's really not going anywhere right now. While I still maintain that Boost.Python would be a better choice if we were starting from scratch now, with the Swig code we already have in mind, I'm forced to conclude that staying the course with Swig is still our best option.


There are a lot of ways I'm not happy about this creation of mine, and I hope I'll have a chance to address these someday. But like pex_config, I think it's largely working well enough for now, and we still have more to learn from this prototype about what we'll want to change. There are some smaller features I plan to add sooner (i.e. replace "slots" with true aliases in schemas). Features I think we'll want eventually include:

  • Add variable-length array fields (stored in a heap at the end of the table, as in FITS binary tables). If we stick with afw::table-based persistence (see my persistence framework page), this may move up in priority, as I think its in the persistence of general objects to tables that we need it most.
  • Resolve the confusion about the Table class, which probably should have been called Factory and should never have been made public to the user. Currently, it has some functionality that means I can't just make it private without refactoring a lot of other things. We could then consider renaming Catalog to Table, as I think that would be a more intuitive name for those who aren't already familiar with the current classes.
  • Try to find a way to avoid making the classes polymorphic - this causes huge headaches for containers, because C++ simply doesn't do containers of polymorphic objects well. I'm hoping I can find a way to use composition instead of inheritance to add features to Records.
  • Drop Boost.Variant, find another way to handle types. Using Boost.Variant limits the number of field types we can have to 20, which is very frustrating (yes, we are already at that limited, and I've already eliminated some field times to make room for others). The problem it solves for use - a schema is essentially a container of heterogenous objects without polymorphism - is a very difficult one to solve. But I may be able to make things work with a polymorphic approach without requiring virtual function calls on each field access, and if not C++11 may provide a way around the field type limit even if I have to stick with something like Boost.Variant.