Last modified on 01/09/2008 01:29:30 PM

Computation in Database vs. Application

LSST Database

Why use a database?

Primary reasons:

  • Query. The ability to retrieve data in flexible ways is a key strength of a database.
  • Maintainability. Higher-level languages typically simplify maintenance, not least by reducing lines of code; SQL is much higher level than C++.
  • Transactions. Transaction commit and rollback implemented in the database help to ensure consistency of data.
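As a rough illustration of the query and transaction points above, the following is a minimal sketch using Python's built-in sqlite3 module. The `source` table, its columns, and the sample rows are hypothetical and are not taken from the LSST schema. The sketch shows a declarative range query and a failed multi-statement update being rolled back as a unit.

```python
import sqlite3


def demo():
    # In-memory database; the table and column names are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source (id INTEGER PRIMARY KEY, ra REAL, decl REAL, flux REAL)"
    )
    conn.executemany(
        "INSERT INTO source VALUES (?, ?, ?, ?)",
        [(1, 10.0, -5.0, 120.0), (2, 10.5, -5.2, 80.0), (3, 200.0, 30.0, 300.0)],
    )
    conn.commit()

    # Query: the selection criteria are stated declaratively and can be
    # changed without touching any application-side search code.
    bright = conn.execute(
        "SELECT id FROM source WHERE ra BETWEEN ? AND ? AND flux > ? ORDER BY id",
        (9.0, 11.0, 100.0),
    ).fetchall()

    # Transaction: the connection used as a context manager commits on
    # success and rolls back if an exception escapes the block.
    try:
        with conn:
            conn.execute("UPDATE source SET flux = flux * 2 WHERE id = 1")
            # Duplicate primary key: raises IntegrityError, undoing the UPDATE.
            conn.execute("INSERT INTO source VALUES (1, 0.0, 0.0, 0.0)")
    except sqlite3.IntegrityError:
        pass

    flux = conn.execute("SELECT flux FROM source WHERE id = 1").fetchone()[0]
    conn.close()
    return bright, flux
```

Calling `demo()` returns the ids matching the range query together with the flux of row 1, which is unchanged because the partial update was rolled back.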

Secondary reasons:

  • Storage management. Databases can usually make transparent use of multiple disks (on one machine) while also providing the capability for managing the storage resource through limits and quotas.
  • Replication and fault tolerance.
  • Multiuser support.

Applicability to Association Pipeline

Which of these are applicable to the Association Pipeline in LSST nightly Base Camp processing?


Query

The query patterns required by alert generation are likely to be simple. For performance reasons, all data will probably be preloaded rather than fetched as needed in any case. QA and operations may require more complex query access to data, but those requirements are currently unspecified. As a result, it is not clear that the database is needed for anything besides the Association Pipeline (AP). The computations and queries done in the AP are relatively simple and are not expected to change over time.
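To make the "preload, then run simple fixed queries" idea concrete, here is a minimal application-side sketch. It is not the AP algorithm: the grid cell size, the match radius, the flat-sky distance (no cos(dec) correction, no RA wraparound), and the function names `preload` and `match` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import floor, hypot

CELL = 0.01  # grid cell size in degrees (hypothetical value)


def cell_of(ra, dec):
    # Map a position to an integer grid cell.
    return (floor(ra / CELL), floor(dec / CELL))


def preload(objects):
    # Load every known object into an in-memory spatial index up front.
    # `objects` is an iterable of (object_id, ra, dec) tuples.
    index = defaultdict(list)
    for oid, ra, dec in objects:
        index[cell_of(ra, dec)].append((oid, ra, dec))
    return index


def match(index, ra, dec, radius=0.001):
    # Return the id of the nearest preloaded object within `radius`, or None.
    # Searching the 3x3 block of neighboring cells is sufficient only
    # while radius <= CELL.
    cx, cy = cell_of(ra, dec)
    best, best_d = None, radius
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for oid, ora, odec in index.get((cx + dx, cy + dy), ()):
                d = hypot(ra - ora, dec - odec)
                if d <= best_d:
                    best, best_d = oid, d
    return best
```

With the data held in memory like this, each lookup is a fixed, simple operation, which is the access pattern the paragraph above describes. A flexible query language adds little for this case.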


Maintainability

Since the AP code is not expected to change much over time, maintainability is not an overriding consideration.


Storage management

A filesystem may provide adequate storage management for the datasets required. Quotas and limits are not likely to be needed.

Replication and fault tolerance

Since the entire pipeline is to be replicated for fault tolerance, database-internal methods are not strictly needed.

Multiuser support

The real-time nature of the system and the resulting performance constraints generally argue against allowing multiple users of the data in any case.