Last modified on 03/01/2010 11:25:09 AM

DC3b User Access

This page defines and describes the different ways in which collaborators can access DC3b data and information.

Special Note: This effort is limited to the functionality required for DC3b. Resource constraints, in both labor and hardware, do not permit us to go beyond that at this time. Any discussion of operational system requirements is out of scope.

Science Use Cases

This section gives a few potential science-driven examples to motivate the discussion; it is not a comprehensive list of science drivers.

  • If a moving object is present in the DC3b MO catalog based on detections in multiple images, one might ask if it was detected in all images where it should have been found, based upon the inferred orbit. This would require being able to select DC3b images based on exposure epoch, the intersection of the object coordinates with image footprints, and (possibly) filter.
    • I think that even if the science collaboration users don't do this, the MOPS team will.

However, instead of selecting images, I think what we'd want is to take the orbit, find the exposures in which the object should have been visible, and then look in the DIAsource catalog to see whether the object was detected. Presumably we would then also want to see whether that DIAsource was linked to a tracklet, and whether it was then also linked into a track.

This requires generating ephemerides from the orbit and cross-matching them with exposures; then, for each exposure in which the object should have been visible, predicting its x/y and/or RA/Dec position and searching for a DIAsource at that location.

Getting from orbits to ephemerides for all exposures could be a little tricky. Does delivering this piece of software belong to the MOPS team, the database group, or SDQA? We have work building towards this in both nightMOPS and the simsCatalog generation, but no complete solution for this particular application in hand.
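The last step described above (searching for a DIAsource near each predicted position) can be sketched as follows. This is only an illustration under stated assumptions: the ephemeris step is taken as already done, so `predicted` is a hypothetical mapping from exposure id to predicted RA/Dec, and `dia_sources` is a hypothetical list of tuples rather than the real DIAsource table. The 2 arcsecond match radius is likewise an arbitrary placeholder.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def find_detections(predicted, dia_sources, radius_arcsec=2.0):
    """Match predicted positions against difference-image sources.

    predicted:   dict of exposure_id -> (ra, dec), in degrees (hypothetical input)
    dia_sources: iterable of (source_id, exposure_id, ra, dec) (hypothetical input)
    Returns a dict of exposure_id -> list of matching source ids. An empty
    list marks an exposure where the object was expected but not found.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = {exposure_id: [] for exposure_id in predicted}
    for source_id, exposure_id, ra, dec in dia_sources:
        if exposure_id not in predicted:
            continue  # object was not expected in this exposure
        pred_ra, pred_dec = predicted[exposure_id]
        if angular_separation_deg(pred_ra, pred_dec, ra, dec) <= radius_deg:
            matches[exposure_id].append(source_id)
    return matches
```

Exposures that come back with an empty list are the ones worth inspecting by eye; checking tracklet and track linkage would be a further lookup keyed on the returned source ids.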

  • A user might want to compare either a single, calibrated image or a deep stack of CFHT-LS data as produced by the DC3b production to that produced by the Terapix pipeline (or from their own private observations), and perhaps run their own photometry code on the images for comparison. This would require selecting DC3b images by area on the sky and by filter.
  • A user with a strong interest in a particular collection of objects (galaxy clusters, star-forming galaxies, galaxies selected on X-ray properties, etc.) wants to know if their objects are contained within any DC3b image. Presumably, if an object is identified in the DC3b source catalog, the answer is known. But if it is not present, is that because the object was not detected, or because it did not fall within the footprint of the survey? This may not require access to images per se, but would require the ability to compare source coordinates with image footprints.
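Comparing a source coordinate with an image footprint could look roughly like the sketch below. It assumes each footprint is available as a list of corner coordinates (RA, Dec in degrees) joined by great-circle edges; how DC3b actually stores footprints is not specified here, so the input format and function names are hypothetical. The corners are projected onto the tangent plane at the test position (gnomonic projection, which maps great circles to straight lines), and a standard ray-casting test then checks whether the origin lies inside the projected polygon.

```python
import math


def _gnomonic(ra, dec, ra0, dec0):
    """Project (ra, dec) onto the tangent plane at (ra0, dec0); degrees in."""
    ra, dec, ra0, dec0 = map(math.radians, (ra, dec, ra0, dec0))
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    if cos_c <= 0.0:
        return None  # 90 degrees or more away; cannot be projected
    xi = math.cos(dec) * math.sin(ra - ra0) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    return xi, eta


def footprint_contains(corners, ra, dec):
    """Return True if (ra, dec) falls inside the footprint polygon.

    corners: list of (ra, dec) tuples in degrees, in order around the
             footprint (hypothetical representation).
    """
    projected = [_gnomonic(c_ra, c_dec, ra, dec) for c_ra, c_dec in corners]
    if any(p is None for p in projected):
        return False
    # Ray casting from the origin along +x: count edge crossings.
    inside = False
    for i, (x1, y1) in enumerate(projected):
        x2, y2 = projected[(i + 1) % len(projected)]
        if (y1 > 0.0) != (y2 > 0.0):
            x_cross = x1 - y1 * (x2 - x1) / (y2 - y1)
            if x_cross > 0.0:
                inside = not inside
    return inside
```

Because the projection works through sines and cosines of the RA difference, footprints that straddle RA = 0 need no special handling.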

Use Cases

Use Case: Search the Catalog

Actors: Small number of named users

Description: Run a database query.

Main Scenario:

  1. Using any standard MySQL client software
  2. Connect to MySQL server
  3. Write SQL / Run Query
  4. Review results

Variations:

  • Run on the primary database server
  • Run on a secondary (REDDnet) database server
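As an illustration of the "Write SQL / Run Query" step, the sketch below builds a parameterized bounding-box query that could be handed to any Python DB-API MySQL driver. The table name `Source` and the column names `ra` and `decl` are assumptions made for the example; consult the schema browser for the real names. The `%s` placeholder style is the one commonly accepted by MySQL drivers for Python, but check the driver you actually use.

```python
def box_query(ra_min, ra_max, decl_min, decl_max, table="Source", limit=1000):
    """Build a parameterized bounding-box query.

    The table and column names are hypothetical. Values are passed as
    parameters rather than formatted into the SQL text, so the driver
    handles quoting.
    """
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % (table,))
    sql = (
        "SELECT * FROM {table} "
        "WHERE ra BETWEEN %s AND %s "
        "AND decl BETWEEN %s AND %s "
        "LIMIT %s"
    ).format(table=table)
    params = (ra_min, ra_max, decl_min, decl_max, int(limit))
    return sql, params


# Typical use with a DB-API connection (not executed here):
#   cursor = connection.cursor()
#   cursor.execute(*box_query(149.5, 150.5, 1.5, 2.5))
#   rows = cursor.fetchall()
```

The same query text works against either the primary or the secondary server; only the connection parameters change.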

Assumptions:

Use Case: Download An Image File

Actors: Anyone

Description: Download a specific image file to a local computer.

Scenario #1 (HTTP):

  1. Open a browser or any type of HTTP client (e.g. wget)
  2. Send HTTP GET request with the full pathname of the desired image file
  3. Receive HTTP response (the image file)
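A scripted version of the HTTP scenario, using only the Python standard library, might look like the sketch below. The base URL of the HTTP Data Server is not given on this page, so it is left as a parameter; the function names are hypothetical.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def image_url(base_url, pathname):
    """Join the server base URL and the image's full pathname.

    The pathname is percent-encoded so that spaces and similar
    characters survive the request.
    """
    return base_url.rstrip("/") + "/" + urllib.parse.quote(pathname.lstrip("/"))


def download_image(base_url, pathname, dest_dir="."):
    """Fetch one image file with a single HTTP GET; return the local path."""
    url = image_url(base_url, pathname)
    dest = os.path.join(dest_dir, posixpath.basename(pathname))
    urllib.request.urlretrieve(url, dest)
    return dest
```

Each call retrieves exactly one file, so fetching several images is a loop over `download_image`.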

Scenario #2 (REDDnet):

  1. (details TBD)

Scenario #3 (Directly from MSS):

  1. scp or any grid tool (uberftp, etc.) for moving files to/from mss.ncsa.uiuc.edu

Assumptions:

  • The user already knows the fully qualified name of the image file
  • Only one image file can be downloaded per HTTP request
  • Anyone (public anonymous) can download from the HTTP Data Server
  • Only a small number of named users will be granted direct access to mss

Use Case: Lookup Catalog Reference Information

Actors: Anyone

Description: Find technical information about the tables and columns in the database catalog.

Main Scenario:

  1. Use standard browser to access Database Schema web application at http://dev.lsstcorp.org/schema/

Assumptions:

  • The existing web app and infrastructure are sufficient for DC3b

Use Case: Image Cutout Service

Actors: Anyone

Description: Retrieve a cutout from a raw, science, or difference image, or from a deep or template coadd.

Main Scenario:

  1. Specify geometry through RA/decl plus radius, RA/decl bounding box, object id plus radius, object id plus RA/decl deltas, or sky pixel identifier set.
  2. Specify image to retrieve cutout from: explicit pathname, unique identifier, criteria specific enough to locate image. Only one image at a time is retrieved; no mosaic reassembly is performed.
  3. Image is retrieved, selected portion is extracted, result returned as image.
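The extraction in step 3 can be illustrated with a minimal sketch. It assumes the requested geometry has already been converted to a pixel position and a half-width in pixels (the sky-to-pixel conversion is omitted), and it represents the image as a plain list of rows rather than a real image object. The cutout is clipped at the image edges, consistent with retrieving from a single image with no mosaic reassembly.

```python
def cutout(image, x_center, y_center, half_width):
    """Extract a square cutout, clipped to the image bounds.

    image: list of rows, each a list of pixel values (illustrative
           stand-in for a real image array).
    Returns rows covering x_center +/- half_width, y_center +/- half_width.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    def clamp(value, upper):
        return max(0, min(upper, value))

    x0 = clamp(x_center - half_width, width)
    x1 = clamp(x_center + half_width + 1, width)
    y0 = clamp(y_center - half_width, height)
    y1 = clamp(y_center + half_width + 1, height)
    return [row[x0:x1] for row in image[y0:y1]]
```

A request centered near an edge returns a smaller rectangle rather than padding, and a request entirely off the image returns no pixels.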

Assumptions:

Use Case: Bulk Upload Into Catalog

Actors: Anyone

Description: Uploading of external data sources by scientists for matching/querying against DC3b-processed CFHT-LS data.

Main Scenario:

Assumptions:

  • Standard MySQL utilities will be sufficient
  • Storage requirements/restrictions?

Use Case: Invariant Identifiers for Accessing Sets of Related Data

Actors: Developers, Testers

Description:

The idea is that a fixed, opaque string will uniquely and persistently identify a specific set of related data. The data is usually file-based data, but does not have to be. The identifier could consist of visitId, ccdDetectorId and/or raft number (for example).

This identifier is especially valuable when documenting bugs. This identifier can be specified on the trac ticket so others can reproduce the problem using exactly the same data.

This identifier can also be valuable when writing and running unit tests.

Main Scenario:

  1. A problem is found and a ticket is created indicating the unique identifier for the associated data.
  2. Later, someone else at another site, using a different copy of the DC3b data, wants to reproduce the problem and fix the bug.
  3. This person plugs the data identifier into the python script used to run the code.
  4. The python program invokes the appropriate middleware-layer objects to access the exact same data. Note that this data could have been read from a different repository than prior runs.

Assumptions:

  • Implemented as a set of python interfaces and classes
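One possible shape for such an interface is sketched below. Everything in it is hypothetical: the class names, the string encoding of the identifier, and the directory layout are placeholders, not the DC3b middleware API. The point is only that the same identifier string, pasted from a ticket, resolves to the same logical data regardless of which repository copy is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataId:
    """Immutable identifier for one set of related data (hypothetical)."""
    visit_id: int
    raft: str
    ccd_detector_id: int

    def to_string(self):
        """Encode as a fixed string suitable for pasting into a ticket."""
        return "visit={};raft={};ccd={}".format(
            self.visit_id, self.raft, self.ccd_detector_id)

    @classmethod
    def from_string(cls, text):
        """Recover the identifier from its string form."""
        fields = dict(item.split("=", 1) for item in text.split(";"))
        return cls(int(fields["visit"]), fields["raft"], int(fields["ccd"]))


class Repository:
    """Maps identifiers to files under one site's root (layout is made up)."""

    def __init__(self, root):
        self.root = root.rstrip("/")

    def path_for(self, data_id):
        return "{}/v{}/R{}/S{}.fits".format(
            self.root, data_id.visit_id, data_id.raft, data_id.ccd_detector_id)
```

A unit test or bug report would then carry only the string form, and each site would construct its own `Repository` to resolve it.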

Requirements

Technical Requirements

Design and Implementation

DC3b Computing Environment

Catalog (Primary Database Server)

  • mysql clients connect directly to server on lsst10
  • port 3306
  • user accounts must be created/managed on lsst10

Catalog (Secondary Database Server at REDDnet Depot)

  • primary database exports to mss on a periodic basis
  • REDDnet replication of exported file(s)
  • periodic (cron) import into depot-based secondary database
  • secondary database is read-only
  • mysql clients connect to secondary database at depot server
  • port 3306
  • additional details TBD
    • server management?
    • user account management?

Database Schema Browser

  • running on modo/lsstbook (via proxy from ds33)

Image Files (Direct Access to Mass Storage)

  • user account must be created/granted for HPC allocation
  • scp transfer to/from mss is slow (~35 MB/s)

Image Files (HTTP Data Server)

  • Modelled after existing service currently running at NCSA (for a non-LSST repository)

Image Files (Local REDDnet Depot)

  • details TBD
