wiki:TimAxelrod
Last modified on 06/01/2007 10:00:43 PM

C++ Topics


Polymorphism

I spent more time than I care to admit understanding the following situation (simplified to its bare bones):

#include <iostream>

class Base {
public:
    Base() {}
    virtual void foo() {
        std::cout << "This is Base's foo" << std::endl;
    }
    virtual ~Base() {}
};

class Derived : public Base {
public:
    Derived() {}
    void foo() {
        std::cout << "This is Derived's foo" << std::endl;
    }
};

class UserOfBase {
public:
    UserOfBase() {}
    void doSomething(Base b) {
        b.foo();
    }
};

int main() {
    Derived bar;
    UserOfBase fool;
    fool.doSomething(bar);
}

I expected to see "This is Derived's foo", and I bet you would too! But, as you will find if you use the handy <typeinfo> header to print out the type of the object that doSomething sees, it is always a Base, whether you call it with a Base or a Derived: the by-value argument is copied, slicing the Derived down to its Base part. What happened to the polymorphism implied by the "virtual void foo()" declaration in Base's definition?

I eventually read and understood the following sentence from Stroustrup, Section 15.4.1.1: "To get polymorphic behavior, an object must be manipulated through a pointer or reference".

So, if one changes the definition of UserOfBase to be:

class UserOfBase {
public:
    UserOfBase() {}
    void doSomething(Base& b) {
        b.foo();
    }
};

one gets the behavior one expects. It makes perfect sense - in retrospect - and the implementation of the MaskedImage class is now about two days behind schedule!

References and smart pointers

We are using boost::shared_ptr (http://boost.org) extensively in our software. One of its strengths is that it counts the number of references to an object, and deletes the object when the last reference to it is deleted (often when the scope it is used in disappears). Used properly, this makes for leak-free code in the face of exception handling, and requires little attention from the programmer. I found the following situation surprising, though I now understand it:

boost::shared_ptr<foo> fooptr(new foo);
// fooptr.use_count() returns 1

boost::shared_ptr<foo> fooptr2 = fooptr;
// fooptr.use_count() returns 2

foo& foo3 = *fooptr;
// fooptr.use_count() still returns 2 -- a plain reference
// does not participate in the reference count

FW Topics

Needed capabilities

  1. A function to set a value from a DataProperty and a prioritized list of keyword aliases for that value. An example is setting the gain, the RA, or the time from an arbitrary FITS header.

Problem areas

  1. wcslib is used for the current implementation of the WCS class. It does not recognize the TNX projection, and therefore produces junk for many WCS-calibrated images that come in from the outside.

A Rant on Metadata

During our discussions of metadata on both the DataAccess and Middleware telecons, it has become clear that we do not have a common definition for metadata, or for what we need to do with it. I agreed to make an attempt to craft such a definition, and some associated usage scenarios. This is a first step in that direction.

I found it instructive to take a look at the definitions of "metadata" that are in use, for example:

http://www.google.com/search?hl=en&defl=en&q=define:Metadata&sa=X&oi=glossary_definition&ct=title

There are many variants, but there are some clear commonalities:

  1. Metadata is "data that describes data"
  1. It is stored, at least logically, separately from the data that it describes.

An important thing to note is that whether something is data or metadata is context dependent. For example, consider the data which is contained in the header of a FITS file. This data is information about the pixel data that accompanies it: the dimensions of the image; when the image was taken; the parameters associated with the camera, etc. I think there is no argument that in the context of a FITS file, this is "metadata". But now consider what happens when this FITS file is ingested into the LSST image processing pipeline. One of the first actions is to take the information from the FITS header and use it to populate our Exposure tables in the database. I would argue that at this point we no longer have metadata but data.

To some extent, the more comprehensive our database schema, the less room there is for metadata. The only really clear example I can think of for metadata is the description (both machine and human readable) of the database schema itself. Another possible example is provenance information. It would certainly be reasonable to treat the provenance of a particular data item as metadata. On the other hand, our database schema contains tables for provenance, so I will claim that for us provenance is data rather than metadata.

This line of thinking leads me to question the need for "persisting metadata" at all. Instead, I think we face two other problems:

  1. Persisting transient data that is used by the pipelines. Persistence may be required to enable a checkpoint/restore capability for the pipelines. It may also be required for communication between pipeline components that inhabit different address spaces. But note that this persistence is itself likely to be transient: we do not intend for these data to be saved indefinitely, to be queried, or in fact to be used in any way outside of the pipeline context. I think this is the capability that we desire for the LSSTData classes.
  1. We need a way of permanently storing data that should fit in our database schema, but currently does not. Unless our schema is very flexible, this situation will inevitably arise during the construction and operation of the LSST. For example, the camera team may well come to realize that there are important operating parameters (voltages, temperatures, alignment of the planets), previously unrecognized, which affect performance and therefore must be associated with the images. Presumably, we will update our schema to incorporate this new information, but that will not happen instantaneously. Meanwhile, we will need to somehow persist this information in a way that is accessible to the pipelines, and to science application codes and queries. However we solve this problem, it seems to me that it must be done in the database context. Otherwise, writers of application codes will face a very difficult challenge in knowing where to find a particular kind of data. Worse, the answer would change over time.

I think it likely that the solutions to these two problems will be unrelated to one another - and to metadata.