Last modified on 02/01/2013 12:46:40 AM

Questions about Data Analysis Queries

These questions are primarily for LSST Science Collaborations.

SUGGEST BRAND NEW QUERIES and assign a priority to each (1 = lowest, 5 = highest)

  • (added March 17, 2009). Explain the access patterns you expect for the FaintSource table and provide example queries.
  • (added July 17, 2008). How frequently do you think you will need to work with full color images or postage stamps of full color images? We are planning to store coadded images and generate full color images from them on the fly as needed. If we learn that color images will be needed frequently, we might be better off persisting them rather than regenerating them on the fly.
  • (added July 17, 2008). How frequently will you use postage stamps? We are currently planning to store the postage stamps associated with alerts and to generate other types of postage stamps on the fly, including postage stamps for objects, difference image sources, and sources (detections). Depending on the demand, it may be more efficient to preserve postage stamps for some or all objects (and sources?).
  • hard queries on the Source table - will you ever need to run the queries listed below, or queries of a similar type? If so, please prioritize them.
    • "select time series for every object"
    • "select time series for every object classified as star (or something else)"
    • "select time series for every variable object classified as star (or something else)"
    • "give me light curves for all variable objects which were not classified as variable objects in production because they fell just under the threshold used during the classification process"
    • "find me all objects that are varying with the same patterns as a given object, possibly at a different time"
    • "find me all pairs of objects that have similar time series"
  • for the above queries, will you want photometry from the deep detection (Source table) or from the difference image (DIASource table)?
  • "objects becoming two objects" - will you ever need to run the queries listed below, or queries of a similar type?
    • "select objects that ever split"
    • "select objects that never split"
    • "select objects that split and then later become one object again"
    • please prioritize each query
  • Joining LSST main catalogs with other catalogs
    • which catalogs will you want to join with?
    • how big are these catalogs expected to be (1 GB? 1 TB? 1 PB?)
    • how often?
    • where do you think this should happen (LSST servers? VO?)?
    • will you need "anti-cross matches" against other catalogs (find things that are in one but not in the other)?
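
To make two of the query types above more concrete, here is a minimal sketch of "select time series for every object classified as star" and of a positional "anti-cross match", run against a toy in-memory SQLite database. Everything in it is an assumption made for illustration: the column names (objectId, ra, decl, classification, mjd, flux), the ExtCatalog table, the helper function names, and the brute-force angular-separation test. It is not the LSST schema and says nothing about how such queries would be executed at scale.

```python
import math
import sqlite3


def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def build_toy_db():
    """Create a tiny in-memory database; the schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python helper to SQL so it can be used inside queries.
    conn.create_function("ang_sep_deg", 4, ang_sep_deg)
    conn.executescript("""
        CREATE TABLE Object (objectId INTEGER PRIMARY KEY, ra REAL, decl REAL,
                             classification TEXT);
        CREATE TABLE Source (sourceId INTEGER PRIMARY KEY, objectId INTEGER,
                             mjd REAL, flux REAL);
        CREATE TABLE ExtCatalog (extId INTEGER PRIMARY KEY, ra REAL, decl REAL);
    """)
    conn.executemany("INSERT INTO Object VALUES (?, ?, ?, ?)", [
        (1, 10.0, -5.0, "star"),
        (2, 10.5, -5.2, "galaxy"),
        (3, 11.0, -4.8, "star"),
    ])
    conn.executemany("INSERT INTO Source VALUES (?, ?, ?, ?)", [
        (101, 1, 55000.1, 12.0),
        (102, 1, 55001.1, 12.4),
        (103, 2, 55000.2, 3.1),
        (104, 3, 55002.3, 7.7),
        (105, 3, 55000.3, 7.5),
    ])
    conn.executemany("INSERT INTO ExtCatalog VALUES (?, ?, ?)", [
        (9001, 10.0001, -5.0001),  # about half an arcsecond from object 1
        (9002, 10.5, -5.2),        # same position as object 2
    ])
    return conn


def time_series_for_class(conn, classification):
    """Return {objectId: [(mjd, flux), ...]} for objects with the given class."""
    rows = conn.execute("""
        SELECT o.objectId, s.mjd, s.flux
        FROM Object o JOIN Source s ON s.objectId = o.objectId
        WHERE o.classification = ?
        ORDER BY o.objectId, s.mjd
    """, (classification,))
    series = {}
    for object_id, mjd, flux in rows:
        series.setdefault(object_id, []).append((mjd, flux))
    return series


def anti_cross_match(conn, radius_deg):
    """Return objectIds with no ExtCatalog entry within radius_deg (brute force)."""
    rows = conn.execute("""
        SELECT o.objectId
        FROM Object o
        WHERE NOT EXISTS (
            SELECT 1 FROM ExtCatalog e
            WHERE ang_sep_deg(o.ra, o.decl, e.ra, e.decl) < ?
        )
        ORDER BY o.objectId
    """, (radius_deg,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_toy_db()
    print(time_series_for_class(conn, "star"))
    print(anti_cross_match(conn, 1.0 / 3600.0))  # 1 arcsecond match radius
```

The anti-cross match here compares every object against every external catalog entry, which is only workable for toy data. With tables of the sizes listed in the quick reference below, some form of spatial indexing or partitioning would presumably be needed.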

Quick reference - number of rows in billions for the largest tables

Table Name   First Data Release   Last Data Release
Source       150                  2,800
Object       14                   49
DIASource    5                    10
VarObject    0.5                  2
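
As a rough aid for judging how expensive the time series queries above might be, the short sketch below derives two numbers from the table: the growth factor of each table between the first and last data release, and the ratio of Source rows to Object rows. Reading that ratio as an average number of detections per object is an assumption made here, not something stated in the table.

```python
# Row counts in billions, copied from the quick reference table above.
ROWS_BILLIONS = {
    "Source": (150, 2800),
    "Object": (14, 49),
    "DIASource": (5, 10),
    "VarObject": (0.5, 2),
}

# Growth factor of each table between the first and last data release.
growth = {name: last / first for name, (first, last) in ROWS_BILLIONS.items()}

# Ratio of Source rows to Object rows in each release
# (interpreted, as an assumption, as mean detections per object).
sources_per_object_first = ROWS_BILLIONS["Source"][0] / ROWS_BILLIONS["Object"][0]
sources_per_object_last = ROWS_BILLIONS["Source"][1] / ROWS_BILLIONS["Object"][1]

if __name__ == "__main__":
    for name, factor in growth.items():
        print(f"{name}: grows by a factor of {factor:.1f}")
    print(f"Source rows per Object row, first release: {sources_per_object_first:.1f}")
    print(f"Source rows per Object row, last release: {sources_per_object_last:.1f}")
```

Under that reading, a query such as "select time series for every object" touches on the order of ten Source rows per object in the first release and several tens in the last.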