Last modified on 12/19/2007 01:11:10 PM

How to Use the Persistence Framework


Developers of persistable classes need to read the sections "Persistable Classes", "Formatters", and "Storage".

Developers who need to persist and retrieve already-written persistable classes need to read the sections on "Storage", "Persistence", and "Retrieval".

A sample usage of the persistence framework can be found in mwi/tests/Persistence_1.cc.

Persistable Classes

In order to enable persistence for a class using this persistence framework, the class must inherit publicly from lsst::mwi::persistence::Persistable.

A separate formatter class must be written to handle the persistence and retrieval of the class's state during persistence operations (see next section).

The persistable class's declaration must include the macro invocation LSST_PERSIST_FORMATTER(formatter-class-name), where formatter-class-name is the name of the persistable class's formatter. The macro is intended to be used in the class's private section. It generates a number of important declarations that allow the class to plug and play in the persistence framework, including a declaration that makes the formatter a friend of the persistable class, which gives the formatter full access to the class's private state.

#include "lsst/mwi/persistence/Persistable.h"

namespace lsst {
namespace example {

// Forward declaration
class MyFormatter;

class MyPersistable : public lsst::mwi::persistence::Persistable {
public:
...
private:
    LSST_PERSIST_FORMATTER(MyFormatter);
    // MyFormatter is a friend of MyPersistable
...
};

}} // namespace lsst::example
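The real expansion of LSST_PERSIST_FORMATTER is not given on this page. The self-contained sketch below illustrates only its friend-declaration part, using a hypothetical EXAMPLE_PERSIST_FORMATTER macro and simplified stand-in classes (not the lsst::mwi types). The static save(), load() and peek() functions are hypothetical stand-ins for a formatter's write and read methods.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for LSST_PERSIST_FORMATTER: it reproduces only the
// friend declaration, not the framework's other generated declarations.
#define EXAMPLE_PERSIST_FORMATTER(formatter) friend class formatter

namespace example {

class Persistable {  // simplified stand-in for the framework base class
public:
    virtual ~Persistable() = default;
};

class MyFormatter;  // forward declaration, as in the example above

class MyPersistable : public Persistable {
public:
    explicit MyPersistable(int value) : _value(value) {}
private:
    EXAMPLE_PERSIST_FORMATTER(MyFormatter);  // expands to: friend class MyFormatter;
    int _value;  // private state with no public accessor
};

class MyFormatter {
public:
    // Friend access lets the formatter copy private state out...
    static std::string save(MyPersistable const& p) { return std::to_string(p._value); }
    // ...and restore it without a public setter.
    static void load(MyPersistable& p, std::string const& text) { p._value = std::stoi(text); }
    // Read-only check used by the usage example.
    static int peek(MyPersistable const& p) { return p._value; }
};

}  // namespace example
```

Because the macro names the formatter as a friend, MyFormatter can read and assign _value directly even though MyPersistable offers no accessor for it.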

A template class can be made persistable in a similar fashion. In this case, the formatter must also be a template with the same template arguments. The formatter's forward declaration must be a template declaration, and the formatter-class-name argument to the LSST_PERSIST_FORMATTER macro must include the template arguments:

#include "lsst/mwi/persistence/Persistable.h"

namespace lsst {
namespace example {

// Forward declaration
template <typename TemplArg> class MyFormatter;

template <typename TemplArg> class MyPersistable : public lsst::mwi::persistence::Persistable {
public:
...
private:
    LSST_PERSIST_FORMATTER(MyFormatter<TemplArg>);
...
};

}} // namespace lsst::example

Formatters

All persistable class formatters must derive publicly from lsst::mwi::persistence::Formatter.

Formatters are responsible for performing all operations necessary to store and retrieve the state of a particular persistable class instance. Each specific formatter is explicitly designed to work with one or more types of Storage classes.

Each formatter must meet all of the requirements in the following subsections.

Initialization Methods and Resources

1) It must define the following private static factory function to create and return an instance of itself as a base Formatter object:

static Formatter::Ptr createInstance(lsst::mwi::policy::Policy::Ptr policy)

2) It must declare a static private instance of the class lsst::mwi::persistence::FormatterRegistration:

static FormatterRegistration registration

In the formatter's implementation file, this instance must be initialized with the name and typeid() of the formatter's Persistable class and with the formatter's factory function:

/** Register this Formatter subclass through a static instance of
 * FormatterRegistration.
 */
lsst::mwi::persistence::FormatterRegistration
    MyFormatter::registration("MyPersistable", typeid(MyPersistable), createInstance);

3) Each constructor of the Formatter subclass must at a minimum perform the following initialization, needed for proper tracking by the Citizen class:

Formatter(typeid(*this))
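The factory function and the static FormatterRegistration instance work together: constructing the static object during program start-up records the factory under the Persistable's name, so the framework can later create the right formatter. The sketch below imitates that pattern with simplified stand-ins; the registry() map, the Registration struct, the empty Policy placeholder and lookup() are hypothetical and are not the lsst::mwi implementation. It passes typeid(MyFormatter) to the base constructor, which names the same type without inspecting an object that is still under construction.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

namespace sketch {

struct Policy {};  // placeholder for lsst::mwi::policy::Policy
class Persistable { public: virtual ~Persistable() = default; };

class Formatter {
public:
    using Ptr = std::shared_ptr<Formatter>;
    using Factory = std::function<Ptr(std::shared_ptr<Policy>)>;
    virtual ~Formatter() = default;
    std::type_index const& type() const { return _type; }  // concrete formatter type
protected:
    explicit Formatter(std::type_info const& type) : _type(type) {}
private:
    std::type_index _type;
};

// Hypothetical registry keyed by persistable name; a function-local static
// avoids depending on the order of static initialization across files.
inline std::map<std::string, Formatter::Factory>& registry() {
    static std::map<std::string, Formatter::Factory> r;
    return r;
}

// Constructing a Registration registers a factory as a side effect.
struct Registration {
    Registration(std::string const& name, std::type_info const& /*persistableType, unused here*/,
                 Formatter::Factory factory) {
        registry()[name] = std::move(factory);
    }
};

inline Formatter::Ptr lookup(std::string const& name, std::shared_ptr<Policy> policy) {
    auto it = registry().find(name);
    if (it == registry().end()) return nullptr;
    return it->second(policy);
}

class MyPersistable : public Persistable {};

class MyFormatter : public Formatter {
private:
    explicit MyFormatter(std::shared_ptr<Policy> /*policy*/) : Formatter(typeid(MyFormatter)) {}
    static Formatter::Ptr createInstance(std::shared_ptr<Policy> policy) {
        return Formatter::Ptr(new MyFormatter(policy));  // private constructor is reachable here
    }
    static Registration registration;
};

// Runs before main(): registers MyFormatter under "MyPersistable".
Registration MyFormatter::registration("MyPersistable", typeid(MyPersistable),
                                       MyFormatter::createInstance);

}  // namespace sketch
```

Keeping both the constructor and createInstance() private means the registered factory is the only way to obtain the formatter, which matches requirement 1 above.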

I/O Methods

4) It must define a public write method for writing a Persistable to a Storage:

void write(Persistable const* persistable, Storage::Ptr storage, lsst::mwi::data::DataProperty::PtrType additionalData)

5) It must define a public read method for reading a Persistable from a Storage and returning that object as an instance of a Persistable base:

Persistable::Ptr read(Storage::Ptr storage, lsst::mwi::data::DataProperty::PtrType additionalData)

6) It must define a public update method for updating an existing Persistable object with state read from a Storage. This method differs from the read method only in that it modifies an existing, partially retrieved Persistable instance instead of creating a new one:

void update(Persistable::Ptr persistable, Storage::Ptr storage, lsst::mwi::data::DataProperty::PtrType additionalData)

The additionalData parameter in each of these methods is used to pass information about the execution context of the system. This information may be used to generate table names, file pathnames, or database column values as appropriate, usually in conjunction with a template from a Policy. There are currently five documented keys that may be placed in this DataProperty:

  • visitId: a long long (or int64_t) that identifies the visit being processed by this pipeline.
  • sliceId: an int that identifies the data-parallel slice of a pipeline that this code is running in.
  • universeSize: an int that gives the number of slices in the pipeline plus one for the pipeline master.
  • itemName: a std::string that gives the name of the item being persisted. For example, this may be the label given to the item on a pipeline clipboard.
  • StorageLocation: a DataProperty node.

The StorageLocation DataProperty may contain keys that are the names of Storage subclasses (see below). The value of each such key is a std::string that contains the LogicalLocation string used to persist the current data item to that Storage. This is intended to allow the creation of linkages between storage types; in particular, it allows file pathnames to be stored in a database.
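As an illustration of how a formatter might combine these keys with a Policy-supplied template, the sketch below expands placeholders in a table-name pattern. The %key% placeholder syntax, the expandTemplate() helper, and the use of a plain std::map of strings in place of a typed DataProperty are all assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Replaces every %key% in pattern with context.at(key).
// Throws std::out_of_range if a placeholder names a missing key and
// std::invalid_argument if a '%' has no closing partner.
std::string expandTemplate(std::string const& pattern,
                           std::map<std::string, std::string> const& context) {
    std::string result;
    std::string::size_type pos = 0;
    while (true) {
        auto open = pattern.find('%', pos);
        if (open == std::string::npos) break;
        auto close = pattern.find('%', open + 1);
        if (close == std::string::npos) {
            throw std::invalid_argument("unterminated placeholder in " + pattern);
        }
        result += pattern.substr(pos, open - pos);                         // literal text
        result += context.at(pattern.substr(open + 1, close - open - 1));  // key's value
        pos = close + 1;
    }
    result += pattern.substr(pos);  // trailing literal text
    return result;
}
```

For example, with visitId set to "85471048" and sliceId set to "3", the pattern "Source_v%visitId%_s%sliceId%" expands to "Source_v85471048_s3", giving each slice of each visit its own table name.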

The Formatter subclass is declared a friend of the Persistable class by the LSST_PERSIST_FORMATTER macro (see above), so it can get and set member variables directly in these methods, if needed.

#include "lsst/mwi/persistence/Formatter.h"

namespace lsst {
namespace example {

// Formatters manipulate many objects within the lsst::mwi::persistence namespace.
// This using directive cuts down on the verbosity of this formatter's code.
// Place it within the scope of the Formatter's namespace, not at global scope.
using namespace lsst::mwi::persistence;
// DataProperty lives in lsst::mwi::data, so it needs its own using declaration.
using lsst::mwi::data::DataProperty;

class MyFormatter : public Formatter {
public:
    virtual ~MyFormatter(void);

    virtual void write(Persistable const* persistable,
                       Storage::Ptr storage,
                       DataProperty::PtrType additionalData);

    virtual Persistable::Ptr read(Storage::Ptr storage,
                                  DataProperty::PtrType additionalData);

    virtual void update(Persistable::Ptr persistable,
                        Storage::Ptr storage,
                        DataProperty::PtrType additionalData);

    ... etc ...

private:
    MyFormatter(lsst::mwi::policy::Policy::Ptr policy) : Formatter(typeid(*this)) {};

    static Formatter::Ptr createInstance(lsst::mwi::policy::Policy::Ptr policy);

    static FormatterRegistration registration;
};

}} // namespace lsst::example

Boost Serialization Support

If the Formatter is to be used with Boost serialization, the Formatter subclass must #include "lsst/mwi/persistence/FormatterImpl.h". It must also expose a delegateSerialize method; when the Boost serialization library serializes the target persistable class, that class calls this method. Either of two approaches may be used:

  • A single method templated on the boost archive type
#include "lsst/mwi/persistence/FormatterImpl.h"

class MyFormatter : public Formatter {
public:
    ... etc ...

    template <class Archive> void delegateSerialize(
            Archive& archive, unsigned int const version, Persistable* persistable);

    ... etc ...

};
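  • A set of overloaded static member functions, one for each supported archive type.
#include "lsst/mwi/persistence/FormatterImpl.h"
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>

class MyFormatter : public Formatter {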
public:
    ... etc ...

    static void delegateSerialize(
        boost::archive::text_oarchive& ar, unsigned int const version, Persistable* persistable);

    static void delegateSerialize(
        boost::archive::text_iarchive& ar, unsigned int const version, Persistable* persistable);
        
    static void delegateSerialize(
        boost::archive::xml_oarchive& ar, unsigned int const version, Persistable* persistable);

    static void delegateSerialize(
        boost::archive::xml_iarchive& ar, unsigned int const version, Persistable* persistable);

    ... etc ...

};

Within these functions, operator&() must be applied to boost::serialization::base_object<Persistable>(*p), where p is the persistable argument cast to the concrete persistable class, before any data members are serialized. (The same holds for any other base classes of the persistable class, which presumably are Persistable themselves.)
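Boost archives are driven through operator&(), which is why a single member template over the archive type can serve every archive. The dependency-free sketch below imitates that shape with two toy archives, ToyOArchive and CountingArchive, which are hypothetical stand-ins rather than Boost classes. The base-class step is represented by serializing a base member directly, where real Boost code would use base_object<Persistable>(*p).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Toy output archive: records a text form of each value passed to operator&().
struct ToyOArchive {
    std::vector<std::string> items;
    ToyOArchive& operator&(int v) { items.push_back(std::to_string(v)); return *this; }
    ToyOArchive& operator&(std::string const& v) { items.push_back(v); return *this; }
};

// Second toy archive type: only counts the values it is given.
struct CountingArchive {
    int count = 0;
    template <class T> CountingArchive& operator&(T const&) { ++count; return *this; }
};

class Persistable {
public:
    virtual ~Persistable() = default;
    int baseId = 1;  // stand-in for state owned by the base class
};

class MyPersistable : public Persistable {
    friend class MyFormatter;  // the access LSST_PERSIST_FORMATTER would grant
    int _x = 5;
    std::string _name = "src";
};

class MyFormatter {
public:
    // One member template serves every archive type that supports operator&().
    template <class Archive>
    static void delegateSerialize(Archive& ar, unsigned int const /*version*/,
                                  Persistable* persistable) {
        auto* p = dynamic_cast<MyPersistable*>(persistable);
        if (p == nullptr) throw std::invalid_argument("not a MyPersistable");
        ar & p->baseId;  // base-class state first (base_object<Persistable>(*p) in Boost)
        ar & p->_x;      // then the class's own members
        ar & p->_name;
    }
};

}  // namespace sketch
```

The same template body is instantiated once per archive type, so adding an archive needs no new formatter code; the overload approach instead spells out each archive type explicitly.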

If the persistable class is to be stored in a DataProperty or otherwise used through a base class pointer, and it is also to be persisted using Boost serialization, there is one additional requirement on the Formatter implementation (.cc) file. This file must #include <boost/serialization/export.hpp> after including all needed Storage subclass headers, and, following that, it must invoke the macro BOOST_CLASS_EXPORT(classname) outside any namespace. Since this is outside a namespace, the classname argument, which names the Persistable class, must be fully qualified with all its namespaces. (This also ensures that there will be no collisions of similarly-named classes in the Boost serialization file.)

Storage class support

Formatters must take into account the type of Storage class they are used with, since the steps required to persist and restore an object's state vary with each specific storage type (e.g. XML I/O vs. database I/O).

The formatter must check and validate the type of the Storage object instance it is given. Also, if the formatter is designed to work with more than one type of storage, then the read, write and update methods must discriminate between those Storage types. This can be implemented as a series of conditions on the typeid() of the Storage, with each clause containing Storage-specific code to write to/read from that Storage.

The details of using each specific Storage subclass are elaborated below.

void MyFormatter::write(Persistable const* persistable,
    Storage::Ptr storage, DataProperty::PtrType additionalData) {
    MyPersistable const* mpop = dynamic_cast<MyPersistable const*>(persistable);
    if (mpop == 0) {
        throw lsst::mwi::exceptions::Runtime("Tried to persist non-MyPersistable with MyFormatter");
    }
    if (typeid(*storage) == typeid(BoostStorage)) {
        ... etc ...
        return;
    }
    else if (typeid(*storage) == typeid(XmlStorage)) {
        ... etc ...
        return;
    }
    throw lsst::mwi::exceptions::Runtime("Unsupported storage type in MyFormatter");
}

Persistable::Ptr MyFormatter::read(
    Storage::Ptr storage, DataProperty::PtrType additionalData) {
    /* Create a new object */
    if (typeid(*storage) == typeid(BoostStorage)) {
        ... etc ...
        return /* new object */;
    }
    else if (typeid(*storage) == typeid(XmlStorage)) {
        ... etc ...
        return /* new object */;
    }
    throw lsst::mwi::exceptions::Runtime("Unsupported storage type in MyFormatter");
}
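The dispatch pattern in the fragments above can be exercised without the framework. The self-contained sketch below uses simplified stand-ins for Storage, BoostStorage, XmlStorage and DbStorage (here playing a type this formatter does not support), uses std::runtime_error in place of lsst::mwi::exceptions::Runtime, and returns a label from each branch instead of performing I/O. The function name writeWithMyFormatter is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeinfo>

namespace sketch {

class Persistable { public: virtual ~Persistable() = default; };
class MyPersistable : public Persistable {};
class OtherPersistable : public Persistable {};

class Storage {
public:
    using Ptr = std::shared_ptr<Storage>;
    virtual ~Storage() = default;
};
class BoostStorage : public Storage {};
class XmlStorage : public Storage {};
class DbStorage : public Storage {};

// Returns which branch handled the write, standing in for real I/O.
std::string writeWithMyFormatter(Persistable const* persistable, Storage::Ptr storage) {
    // Validate the persistable's type first.
    if (dynamic_cast<MyPersistable const*>(persistable) == nullptr) {
        throw std::runtime_error("Tried to persist non-MyPersistable with MyFormatter");
    }
    // typeid on a polymorphic reference yields the dynamic type, so each
    // branch matches exactly one Storage subclass.
    Storage& s = *storage;
    if (typeid(s) == typeid(BoostStorage)) {
        return "boost";
    } else if (typeid(s) == typeid(XmlStorage)) {
        return "xml";
    }
    throw std::runtime_error("Unsupported storage type in MyFormatter");
}

}  // namespace sketch
```

Because typeid comparisons match exact types, a subclass such as DbTsvStorage would not satisfy a typeid(DbStorage) test; a formatter that supports both needs a branch for each.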

The documentation for the Formatter subclass should specify which Storage subclasses it supports.

Configuration by Policy

A Policy is passed to the factory function that creates instances of the Formatter subclass. This Policy can be used to configure the formatter in any way desired by the formatter author.

This Policy is itself a sub-policy of the overall Persistence Policy that is passed to Persistence::getPersistence() (see below). Within the Persistence Policy, it is named "Formatter.<persistable name>". For example, the Policy to be used when creating all instances of the DataPropertyFormatter is named "Formatter.DataProperty".

Other Methods

Formatter subclasses may implement any other methods desired or necessary for use by other formatters, particularly those of classes that contain the persistable class for which the formatter is responsible.

As an example, let's say that we have a class representing a collection of sources. The collection's formatter, when passed a DbStorage, would like to persist the entire collection to a single database table of its choosing, perhaps a temporary table created just for this visit. But the only way that the collection can persist each source is to call the source's formatter, as only that formatter has friend access to the source class. The source's formatter, when passed a DbStorage, would by default write to a table that it has selected.

To resolve this, we implement an additional method for the source's formatter that writes to a table that is passed in as an argument; the collection's formatter can then use this method to efficiently persist each source within the collection.

The collection's formatter obtains the source's formatter by calling the Formatter static method lookupFormatter().
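The arrangement just described might look like the sketch below. Everything in it is a hypothetical, dependency-free stand-in: the source formatter's extra writeToTable() method, the vector of strings used as a fake table, and the lookupFormatter() helper, which imitates looking a formatter up by name but is not the framework's Formatter::lookupFormatter().

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

using Table = std::vector<std::string>;  // fake "table": one string per row

class Formatter { public: virtual ~Formatter() = default; };

class Source {
public:
    explicit Source(double flux) : _flux(flux) {}
private:
    friend class SourceFormatter;  // only SourceFormatter may read _flux
    double _flux;
};

class SourceFormatter : public Formatter {
public:
    // Extra method: write one source as a row of a caller-chosen table.
    void writeToTable(Source const& s, Table& table) const {
        table.push_back("flux=" + std::to_string(s._flux));
    }
};

// Hypothetical lookup by persistable name.
std::shared_ptr<Formatter> lookupFormatter(std::string const& name) {
    static std::map<std::string, std::shared_ptr<Formatter>> formatters{
        {"Source", std::make_shared<SourceFormatter>()}};
    auto it = formatters.find(name);
    return it == formatters.end() ? nullptr : it->second;
}

// The collection's formatter chooses one table and delegates every row.
void writeCollection(std::vector<Source> const& sources, Table& table) {
    auto sf = std::dynamic_pointer_cast<SourceFormatter>(lookupFormatter("Source"));
    if (!sf) throw std::runtime_error("no SourceFormatter registered");
    for (auto const& s : sources) sf->writeToTable(s, table);
}

}  // namespace sketch
```

The collection never touches Source's private state; it relies on the source's formatter, which keeps friend access confined to one class.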

Storage

A limited set of Storage classes has been implemented for the persistence framework; additional Storage subclasses should rarely be needed.

Each Storage type takes a LogicalLocation to give the destination for persistence or the source for retrieval. The LogicalLocation is currently implemented as a simple string. The meaning of this string is documented below with each Storage type.

BoostStorage

This Storage subclass provides the ability to persist to and retrieve from Boost serialization files.

  • LogicalLocation: pathname to the desired file.
  • getOArchive(): get a reference to a boost::archive::text_oarchive that can be used with operator&() to persist data.
  • getIArchive(): get a reference to a boost::archive::text_iarchive that can be used with operator&() to retrieve data.

DbStorage

This Storage subclass provides the ability to persist to and retrieve from database tables.

  • LogicalLocation: URL connection string for the database, of the form dbtype://host:port/dbname, where dbtype is mysql, host is the hostname or IP address of the database server machine, port is the TCP port on which the database server listens, and dbname is the name of the database to be accessed. Authentication credentials (username and password) are supplied by other means: a file whose name is given to the DbAuth class by Policy, the fallback file /tmp/lsst.db.auth, or the environment variable LSST_DB_AUTH. The credentials format is simply username:password.
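To make these formats concrete, the sketch below splits a DbStorage LogicalLocation into its parts and a credentials string into username and password. The DbUrl struct and both parsing functions are hypothetical helpers written for this page, not part of DbStorage or DbAuth, and they do only minimal validation (for example, IPv6 host literals are not handled).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

struct DbUrl {
    std::string dbType, host, dbName;
    int port = 0;
};

// Parses "dbtype://host:port/dbname".
DbUrl parseDbUrl(std::string const& url) {
    auto scheme = url.find("://");
    if (scheme == std::string::npos) throw std::invalid_argument("missing ://");
    auto hostStart = scheme + 3;
    auto colon = url.find(':', hostStart);
    auto slash = url.find('/', hostStart);
    if (colon == std::string::npos || slash == std::string::npos || colon > slash) {
        throw std::invalid_argument("expected host:port/dbname");
    }
    DbUrl result;
    result.dbType = url.substr(0, scheme);
    result.host = url.substr(hostStart, colon - hostStart);
    result.port = std::stoi(url.substr(colon + 1, slash - colon - 1));  // throws if not numeric
    result.dbName = url.substr(slash + 1);
    return result;
}

// Splits "username:password" at the first colon, so passwords may contain ':'.
std::pair<std::string, std::string> parseCredentials(std::string const& text) {
    auto colon = text.find(':');
    if (colon == std::string::npos) throw std::invalid_argument("expected username:password");
    return {text.substr(0, colon), text.substr(colon + 1)};
}
```

For example, "mysql://lsst10.example.org:3306/test_db" yields dbType "mysql", host "lsst10.example.org", port 3306 and dbName "test_db".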

The following methods are to be used for persisting data:

// Specify the name of the table to insert into (INSERT INTO clause).
// Must be called before the following three methods.
virtual void setTableForInsert(std::string const& tableName);
// Specify a value for a column (SET clause).
template <typename T> void setColumn(std::string const& columnName, T const& value);
// Specify that a column is NULL (SET clause).
void setColumnToNull(std::string const& columnName);
// Perform the insert.
virtual void insertRow(void);

Commit occurs under the control of the persistence framework at the end of the persist() call (see below).
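The required call order for inserting a row (table first, then column values, then insertRow()) is illustrated by the mock below, which renders each row as a MySQL-style INSERT ... SET statement. MockDbStorage is a hypothetical stand-in: it splices values into the text with no quoting or escaping, clears the column list after each insertRow(), and only records statements, so it says nothing about how DbStorage actually binds values or executes SQL.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

class MockDbStorage {
public:
    void setTableForInsert(std::string const& tableName) {
        _table = tableName;
        _assignments.clear();
    }
    template <typename T>
    void setColumn(std::string const& columnName, T const& value) {
        requireTable();
        std::ostringstream os;
        os << columnName << " = " << value;  // no quoting or escaping: illustration only
        _assignments.push_back(os.str());
    }
    void setColumnToNull(std::string const& columnName) {
        requireTable();
        _assignments.push_back(columnName + " = NULL");
    }
    // Records the statement that would be executed and resets the column list.
    void insertRow() {
        requireTable();
        std::string sql = "INSERT INTO " + _table + " SET ";
        for (std::size_t i = 0; i < _assignments.size(); ++i) {
            if (i > 0) sql += ", ";
            sql += _assignments[i];
        }
        statements.push_back(sql);
        _assignments.clear();
    }
    std::vector<std::string> statements;  // statements "executed" so far

private:
    void requireTable() const {
        if (_table.empty()) throw std::logic_error("call setTableForInsert() first");
    }
    std::string _table;
    std::vector<std::string> _assignments;
};
```

Calling setTableForInsert("Source"), setColumn("sourceId", 42), setColumn("ra", 1.5), setColumnToNull("flag") and insertRow() records "INSERT INTO Source SET sourceId = 42, ra = 1.5, flag = NULL".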

The following methods are to be used for retrieving data. The first group enables specification of a query:

// Specify the table to be queried (FROM clause).
// Either this or setTableListForQuery() should be called, and either
// of them before all other methods in this group.
void setTableForQuery(std::string const& tableName);
// Specify a list of tables to be joined in the query (FROM clause).
// Either this or setTableForQuery() should be called, and either
// of them before all other methods in this group.
void setTableListForQuery(std::vector<std::string> const& tableNameList);
// Specify an output column (order is significant) (SELECT clause).
// Use either this or outParam() but not both.
void outColumn(std::string const& columnName);
// Specify an output column and bind the result to a location (SELECT clause).
// Use either this or outColumn() but not both.
template <typename T> void outParam(std::string const& columnName, T* location);
// Bind a value to a parameter to be used in the condition (WHERE clause).
// This value replaces :paramName in the string passed to setQueryWhere().
template <typename T> void condParam(std::string const& paramName, T const& value);
// Define the condition as a SQL string (WHERE clause).
void setQueryWhere(std::string const& whereClause);
// Request that the query output be sorted by an SQL expression.
// Multiple expressions may be specified, in order, separated by commas.
void orderBy(std::string const& expression);
// Request that the query output be grouped by an SQL expression.
void groupBy(std::string const& expression);
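The query-specification calls above map onto the clauses of a SELECT statement. The mock below renders those calls as SQL text to show that mapping, including how a :paramName placeholder in the WHERE string corresponds to a condParam() value. MockQuery is hypothetical: it substitutes parameter values as text purely for display (a real database layer would normally bind them), and it does not model outParam().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in that records query-specification calls and renders SQL text.
class MockQuery {
public:
    void setTableForQuery(std::string const& tableName) { _tables = {tableName}; }
    void setTableListForQuery(std::vector<std::string> const& tableNameList) {
        _tables = tableNameList;
    }
    void outColumn(std::string const& columnName) { _columns.push_back(columnName); }
    template <typename T>
    void condParam(std::string const& paramName, T const& value) {
        std::ostringstream os;
        os << value;
        _params[paramName] = os.str();
    }
    void setQueryWhere(std::string const& whereClause) { _where = whereClause; }
    void orderBy(std::string const& expression) { _orderBy = expression; }
    void groupBy(std::string const& expression) { _groupBy = expression; }

    // Renders the statement with its clauses in SQL order.
    std::string sql() const {
        std::string s = "SELECT " + join(_columns) + " FROM " + join(_tables);
        if (!_where.empty()) s += " WHERE " + substitute(_where);
        if (!_groupBy.empty()) s += " GROUP BY " + _groupBy;
        if (!_orderBy.empty()) s += " ORDER BY " + _orderBy;
        return s;
    }

private:
    static std::string join(std::vector<std::string> const& items) {
        std::string out;
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (i > 0) out += ", ";
            out += items[i];
        }
        return out;
    }
    // Replaces each :name whose complete identifier matches a condParam() name,
    // so :id never matches inside :idx.
    std::string substitute(std::string const& text) const {
        std::string out;
        std::size_t i = 0;
        while (i < text.size()) {
            if (text[i] == ':') {
                std::size_t j = i + 1;
                while (j < text.size() &&
                       (std::isalnum(static_cast<unsigned char>(text[j])) || text[j] == '_')) {
                    ++j;
                }
                auto it = _params.find(text.substr(i + 1, j - i - 1));
                if (j > i + 1 && it != _params.end()) {
                    out += it->second;
                    i = j;
                    continue;
                }
            }
            out += text[i];
            ++i;
        }
        return out;
    }

    std::vector<std::string> _tables, _columns;
    std::map<std::string, std::string> _params;
    std::string _where, _orderBy, _groupBy;
};
```

For example, selecting sourceId and ra from Source with condParam("visit", 85471048), setQueryWhere("visitId = :visit AND ra > 0") and orderBy("ra") renders "SELECT sourceId, ra FROM Source WHERE visitId = 85471048 AND ra > 0 ORDER BY ra".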

The second group enables retrieval of the data:

// Execute the query.  Must occur after all of the above group;
// must precede any of the other methods in this group.
void query(void);
// Get the next row (including the first row); returns false if no more.
// After this returns, any variables bound with outParam() may be referenced.
bool next(void);
// Get a column's value by position (the order of outColumn() calls, starting with zero).
template <typename T> T const& getColumnByPos(int pos);
// Returns true if there was no value for the specified column.
bool columnIsNull(int pos);
// When next() returns false and getColumnByPos() will no longer be called, call this.
void finishQuery(void);

DbTsvStorage

This Storage subclass is a subclass of DbStorage. It provides the exact same interface, but during persistence it writes row data to a tab-separated value file, one row per line, and then commits this data by using LOAD DATA LOCAL INFILE to load it into the database. It only works with MySQL databases.

Note that several member functions of DbStorage and DbTsvStorage are templates, and member function templates cannot be virtual. The DbTsvStorage versions must therefore be called through a DbTsvStorage pointer, not a DbStorage pointer.
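The effect can be seen in a few lines of standalone C++. The sketch below uses hypothetical BaseStorage and DerivedStorage classes rather than the real DbStorage and DbTsvStorage; the derived template hides the base template instead of overriding it, so the version called depends on the static type of the pointer.

```cpp
#include <cassert>
#include <string>

struct BaseStorage {
    virtual ~BaseStorage() = default;
    // A member function template; templates cannot be declared virtual.
    template <typename T>
    std::string setColumn(std::string const& name, T const&) { return "base:" + name; }
};

struct DerivedStorage : BaseStorage {
    // Hides (does not override) the base template.
    template <typename T>
    std::string setColumn(std::string const& name, T const&) { return "derived:" + name; }
};
```

Given a DerivedStorage object, calling setColumn("ra", 1.5) through a BaseStorage* runs the base version, while the same call through a DerivedStorage* runs the derived one.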

FitsStorage

This Storage subclass is intended to enable persistence and retrieval of FITS files. The current implementation provides minimal information; future implementations might offer more FITS-specific methods.

  • LogicalLocation: pathname to the desired file, optionally followed, for retrieval only, by a # and the Header/Data Unit (HDU) number (0 indicates the primary HDU).
  • getPath(): get the pathname of the FITS file.
  • getHdu(): get the number of the HDU to retrieve.

XmlStorage

This Storage subclass provides the ability to persist to and retrieve from Boost serialization XML files. Note that these are not fully general XML files, so this Storage is intended primarily for example purposes and is not expected to be part of the final production framework. Also note that the Boost XML archives require name/value pairs, as generated by make_nvp(), not just bare data elements; see the Boost serialization documentation.

  • LogicalLocation: pathname to the desired file.
  • getOArchive(): get a reference to a boost::archive::xml_oarchive that can be used with operator&() and make_nvp() to persist data.
  • getIArchive(): get a reference to a boost::archive::xml_iarchive that can be used with operator&() and make_nvp() to retrieve data.

Persistence

To persist a Persistable class, two things must be specified: how and where the Persistable is to be persisted. The how is specified by creating instances of Storage subclasses. The where is specified by creating corresponding LogicalLocation instances.

Typically, the list of names of the Storage subclasses will come from a Policy, as will the strings used to create the LogicalLocations. A Policy may also be used to configure the Persistence object.

There are four steps required for persisting an instance of a Persistable class:

  • Get a Persistence instance using Persistence::getPersistence(Policy::Ptr policy).
  • For each type of storage, create a corresponding LogicalLocation using LogicalLocation(std::string const& location).
  • Create a list of Storage instances by calling getPersistStorage(std::string storageType, LogicalLocation const& location) on the Persistence instance for each storage type.
  • Call persist(Persistable const* persistable, Storage::List const& storageList, lsst::mwi::data::DataProperty::PtrType additionalData) on the Persistence instance.
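The control flow of these four steps, including the commit at the end of persist() mentioned under DbStorage, might be modeled as in the sketch below. All classes are simplified, hypothetical stand-ins: getPersistence() takes no Policy (unlike the real Persistence::getPersistence(Policy::Ptr)), only two storage type names are accepted, the persistable is a string, the formatter is a callback, and "commit" just sets a flag.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

struct LogicalLocation {
    explicit LogicalLocation(std::string const& location) : locString(location) {}
    std::string locString;
};

struct Storage {
    using Ptr = std::shared_ptr<Storage>;
    using List = std::vector<Ptr>;
    std::string type, location;
    std::vector<std::string> written;  // what the formatter "wrote"
    bool committed = false;
};

class Persistence {
public:
    // Step 1 (the real call takes a Policy; this stand-in does not).
    static std::shared_ptr<Persistence> getPersistence() {
        return std::make_shared<Persistence>();
    }
    // Step 3: create a storage of the named type for one location.
    Storage::Ptr getPersistStorage(std::string const& storageType,
                                   LogicalLocation const& location) {
        if (storageType != "BoostStorage" && storageType != "XmlStorage") {
            throw std::invalid_argument("unknown storage type " + storageType);
        }
        auto s = std::make_shared<Storage>();
        s->type = storageType;
        s->location = location.locString;
        return s;
    }
    // Step 4: let the formatter write to every storage, then commit them all.
    void persist(std::string const& value, Storage::List const& storageList,
                 std::function<void(std::string const&, Storage&)> const& formatterWrite) {
        for (auto const& s : storageList) formatterWrite(value, *s);
        for (auto const& s : storageList) s->committed = true;
    }
};

}  // namespace sketch
```

A typical sequence creates one LogicalLocation per storage type (step 2), collects the resulting storages in a Storage::List, and makes a single persist() call.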

Retrieval

Retrieval is a mirror image of persistence, with two exceptions: getRetrieveStorage() should be used instead of getPersistStorage(), and retrieve() is called instead of persist().