wiki:db/DC2/PartitioningTests

Version 1 (modified by smm, 12 years ago)

--

Description

These tests aim to determine how to partition the Object and DIASource tables so that the Association Pipeline (AP) can operate efficiently. The task of the AP is to take new DIASources produced by the Detection Pipeline (DP) and compare them with everything LSST knows about the sky at that point. This comparison is used to generate Alerts that LSST and other observatories can use for follow-up observations, and to bring LSST's knowledge of the sky up to date.

The current AP design splits processing of a Field-of-View (FOV) into 3 phases. For context, here is a brief summary:

prepare
The prepare phase of the AP loads into memory the information about the sky that falls within (or lies close to) a FOV. We will know the location of a FOV roughly 30 seconds before the actual observation, and this phase of the AP starts as soon as that information becomes available. The Object, DIASource, and Alert tables hold the information that new DIASources are compared against. Of these, Object is the largest; DIASource starts out small but becomes increasingly significant towards the end of a release cycle; and Alert is relatively trivial in size.
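Loading "everything within or close to a FOV" is cheapest when the tables are split into spatial partitions, so the prepare phase only has to read the partitions that overlap the FOV. This page does not yet describe the partitioning approaches under test, so the following is only a minimal sketch under an assumed scheme: the sky is cut into declination stripes of fixed height, and each stripe into a fixed number of equal-width RA chunks. The function names, stripe height, and chunk count are all hypothetical.

```python
import math


def ra_half_width(dec_deg, radius_deg):
    """Maximum RA half-extent (degrees) of a circle of radius_deg centred at dec_deg."""
    if abs(dec_deg) + radius_deg >= 90.0:
        return 180.0  # the circle touches or contains a pole: every RA is covered
    s = math.sin(math.radians(radius_deg)) / math.cos(math.radians(dec_deg))
    return math.degrees(math.asin(min(1.0, s)))


def chunks_for_fov(ra_deg, dec_deg, radius_deg, stripe_height=1.0, chunks_per_stripe=360):
    """Return sorted (stripe, chunk) ids of a HYPOTHETICAL stripe/chunk partitioning
    whose bounding boxes overlap a circular FOV (radius should include any margin)."""
    n_stripes = math.ceil(180.0 / stripe_height)
    chunk_width = 360.0 / chunks_per_stripe

    # Stripes spanned by the FOV in declination, clamped to the poles.
    dec_lo = max(-90.0, dec_deg - radius_deg)
    dec_hi = min(90.0, dec_deg + radius_deg)
    s_lo = math.floor((dec_lo + 90.0) / stripe_height)
    s_hi = min(n_stripes - 1, math.floor((dec_hi + 90.0) / stripe_height))

    # Chunks spanned in RA, with wraparound at RA = 0/360.
    half = ra_half_width(dec_deg, radius_deg)
    if half >= 180.0:
        chunk_ids = set(range(chunks_per_stripe))
    else:
        c_lo = math.floor((ra_deg - half) / chunk_width)
        c_hi = math.floor((ra_deg + half) / chunk_width)
        if c_hi - c_lo + 1 >= chunks_per_stripe:
            chunk_ids = set(range(chunks_per_stripe))
        else:
            chunk_ids = {c % chunks_per_stripe for c in range(c_lo, c_hi + 1)}

    return sorted((s, c) for s in range(s_lo, s_hi + 1) for c in chunk_ids)
```

For example, `chunks_for_fov(10.0, 0.0, 1.75)` selects a 4 × 4 block of one-degree partitions, while a FOV that reaches a pole selects every chunk in the polar stripes. Stripe height and chunk count are exactly the kind of parameters these tests would need to tune.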
compare-and-update
The compare-and-update phase of the AP takes new DIASources and performs a distance-based match against the contents of Object and DIASource. The results of the match are then used to retrieve historical Alerts for any matched Objects. The results of all these matches and joins are sent to compute nodes for processing. These compute nodes decide which Objects must be changed (or possibly removed), which DIASources correspond to previously unknown Objects, and which of them warrant sending out an Alert.
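A distance-based match means finding, for each new DIASource, every Object whose angular separation from it is within some match radius. The sketch below is one common way to do this without comparing all pairs: bucket Objects into declination zones at least one match radius tall, so only the source's own zone and its two neighbours need checking. It is an illustration only, not the AP's actual algorithm; the function names, the `(id, ra, dec)` tuple layout, and the match radius are assumptions.

```python
import math
from collections import defaultdict


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula, all inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(sources, objects, radius_arcsec):
    """Return (source_id, object_id, sep_arcsec) for every pair within radius_arcsec.

    sources and objects are iterables of HYPOTHETICAL (id, ra_deg, dec_deg) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    zone_height = radius_deg  # zones >= radius tall, so neighbours suffice

    # Bucket objects by declination zone.
    zones = defaultdict(list)
    for obj_id, ra, dec in objects:
        zones[math.floor((dec + 90.0) / zone_height)].append((obj_id, ra, dec))

    matches = []
    for src_id, ra, dec in sources:
        z = math.floor((dec + 90.0) / zone_height)
        # A match is at most radius_deg away in dec, hence in zone z-1, z, or z+1.
        for dz in (-1, 0, 1):
            for obj_id, o_ra, o_dec in zones.get(z + dz, ()):
                sep = angular_sep_deg(ra, dec, o_ra, o_dec)
                if sep <= radius_deg:
                    matches.append((src_id, obj_id, sep * 3600.0))
    return matches
```

Because RA is not used to prune candidates, wraparound at RA = 0/360 needs no special handling. The trade-off is that every Object in the three zones is tested, which a real implementation would likely narrow further.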
post-processing
The post-processing phase is responsible for making sure that changes to Object (inserts, updates, possibly deletes), DIASource (inserts only), and Alert (inserts only) are present on disk.

The database tests currently focus on how to partition Object and DIASource so that the prepare phase is as fast as possible, and on how to perform the distance-based cross-match of the compare-and-update phase. Tests of database updates and inserts, and of how quickly these can be written to disk, will follow.

The tests currently use the USNO-B catalog as a stand-in for the Object table. USNO-B contains 1,045,175,763 objects, satisfying the DC2 requirement of simulating LSST operations at 10% scale.

Partitioning Approaches

Performance Results

Code

Attachments