Last modified on 05/29/2009 02:18:39 PM

Query Results

Given that we can build a database system that will support analysis queries, data production queries, and others, at LSST's petabyte scale, where do the results of these queries go? How will they be used?

This page collects the most current ideas regarding query results.


Topics

Usage

  • What sort of processing happens on results?

Will they be packed away to an individual's workstation to be massaged and analyzed? Will they be directly used and injected into future queries? Will they be merged with other LSST databases or external, non-LSST databases?

Format and Layout

  • Are results partitioned?

Do they come in one big glob, or are they provided in pieces? We expect query processing to be parallelized, so the results will naturally be produced in pieces, one per partition (partitioned arbitrarily?).
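To make the "pieces versus one big glob" question concrete, here is a minimal sketch of a client consuming results from parallel workers as each piece becomes available. It is an illustration only: `run_partition_query`, `iter_result_pieces`, and `collect_results` are hypothetical names, and the fake per-partition data stands in for whatever the real query workers would return.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_partition_query(partition_id):
    """Hypothetical stand-in for running one query fragment on one partition."""
    # Pretend each partition holds three consecutive values; keep the even ones.
    start = partition_id * 3
    rows = [(partition_id, value) for value in range(start, start + 3)]
    return [row for row in rows if row[1] % 2 == 0]


def iter_result_pieces(partition_ids, max_workers=4):
    """Yield (partition_id, rows) pieces as workers finish, in no fixed order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_partition_query, pid): pid for pid in partition_ids}
        for future in as_completed(futures):
            yield futures[future], future.result()


def collect_results(partition_ids):
    """Merge every piece into a single result, sorted so the order is deterministic."""
    merged = []
    for _partition_id, rows in iter_result_pieces(partition_ids):
        merged.extend(rows)
    return sorted(merged)
```

A consumer that can work piece by piece would iterate over `iter_result_pieces` directly; one that needs the whole result at once would call `collect_results` and pay for the merge.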

Placement

  • Where do results get placed?

This affects the I/O and CPU load on the database infrastructure, as well as the storage footprint.

There are three basic strategies:

  1. Stream back to the user immediately
  2. Store in system space
  3. Store in user space (per-user)

We want to allow the injection of non-results data. For example, Alice has some data she produced on her own that she wants to compare and analyze with LSST data, so she wants to upload it and run queries that reference both LSST's data and her own. What if Alice wants to share her data with Bob? Do we provide a per-user "public" space, or do we provide some interface for manipulating an ACL?
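The two sharing options can be compared with a small sketch. This is not a proposed design: `UserTable` and its fields are hypothetical, and the sketch only shows how a "public" flag and an explicit ACL would each answer the question "may Bob read Alice's table?".

```python
from dataclasses import dataclass, field


@dataclass
class UserTable:
    """Hypothetical record for a table a user uploaded into per-user space."""
    owner: str
    name: str
    public: bool = False                       # option 1: a per-user "public" space
    readers: set = field(default_factory=set)  # option 2: an explicit ACL

    def grant_read(self, user):
        """Add a user to the ACL."""
        self.readers.add(user)

    def revoke_read(self, user):
        """Remove a user from the ACL; a no-op if the user is not listed."""
        self.readers.discard(user)

    def can_read(self, user):
        """The owner can always read; others need the public flag or an ACL entry."""
        return user == self.owner or self.public or user in self.readers
```

With the ACL, Alice can share with Bob alone by calling `grant_read("bob")`; with the public flag, setting `public = True` exposes the table to everyone, which is simpler to manage but coarser.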

User data will be stored temporarily, up to some pre-defined or pre-negotiated time limit, after which it will be purged unless there is some intervention or the data is proposed for promotion to L3. (ref: docushare 5438, 7396)
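The retention rule can be sketched as follows. This is a minimal illustration under stated assumptions: `StoredResult`, `extend`, and `purge` are hypothetical names, "intervention" is modeled simply as extending the time limit, and the time limits used are placeholders rather than values from the referenced documents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredResult:
    """Hypothetical bookkeeping entry for one piece of stored user data."""
    name: str
    created: datetime
    time_limit: timedelta          # pre-defined or pre-negotiated limit
    proposed_for_l3: bool = False  # proposed for promotion to L3

    def extend(self, extra):
        """Model an intervention as an extension of the time limit."""
        self.time_limit += extra

    def is_expired(self, now):
        return now > self.created + self.time_limit


def purge(results, now):
    """Return the entries that survive: unexpired ones and L3 candidates."""
    return [r for r in results if r.proposed_for_l3 or not r.is_expired(now)]
```

For example, an entry created 40 days ago with a 30-day limit would be dropped by `purge` unless it had been extended or flagged with `proposed_for_l3`.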