
LSST SDQA: IPAC group plus Tim Axelrod and Dick Shaw.

Held at IPAC, 20 July 2009

Vince's Notes

Ideally, we want to be able to craft/support a query about something that was not anticipated about as easily as one about something that was.

For DC3b, tool design is secondary to architecture design & development.

We want not only hardwired metrics but also the ability for a user to combine such QA information with quantities computed by user-generated code; this could support creation of a new DB/sandbox; the display tool would have access to the sandbox DB, Science DB, Engineering DB, etc.; the system could also support loading information into the tool from a file; another useful feature would be the ability to cut & paste from a file to the sandbox (Russ). The question of what we would want to persist remains open.

See the Mirage tool -- we probably will be unable to use it, but we should examine it for ideas on the features we'd like to see in ours.

Ask Dave Wittman for suggestions regarding metrics he will need that he doesn't already expect will be supplied. Along these lines, Deborah waxed painterly: get members of the user community to each supply their own small set of dots; we then paint them onto a canvas and stand back to see how the bigger picture (the design) might come into focus and evolve. Cool analogy.

Example use case: a coadd. If we want information on all of the images contributing to the determination of some property of some object, can we get that information?

Tim described a recent use case involving statistics per footprint on a diff image and a science image; none of the stats he needed are currently persisted. Becker wrote code to get/compute the required numbers, write them to a file, and plot them. It was deemed a useful thing to do (i.e., it unearthed a problem), and it is the kind of thing we might want to do in operations. This raises a question: is computation of such quantities (calculated on a per-footprint basis, here) the kind of thing we want to routinely persist, or do we instead just want the capability to do it when we need it? Deborah suggests we facilitate it in such a way that we are able to do this kind of thing, but not that we do it routinely for everything we can think of. Tim is less concerned by how we compute something and more concerned with how and when information like this gets persisted. Basically, do we want to try to do everything during processing, or do we want to be able to do as Becker did, after the fact? Presumably we will have a combination of the two. We would then want to support (after judging case by case) the ability to migrate from the sandbox DB to the Sci DB, and to add user code (or subsume the algorithm) into a pipeline so the calculations are done long-term and the results persisted.
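As a concrete (and heavily hedged) illustration of this kind of after-the-fact computation, the sketch below computes per-footprint statistics such as the mean of the variance image and the variance of the difference image. The function name, the inputs, and the use of plain NumPy arrays with boolean masks are assumptions for illustration, not the actual DC3a code.

{{{#!python
import numpy as np

def footprint_stats(sci_variance, diff_image, footprints):
    """Per-footprint statistics: mean of the science variance image and
    variance of the difference image within each footprint.

    sci_variance, diff_image -- 2-D NumPy arrays of the same shape
    footprints -- iterable of boolean masks (same shape) marking footprint pixels
    """
    stats = []
    for i, mask in enumerate(footprints):
        stats.append({
            "footprint": i,
            "mean_sci_variance": float(sci_variance[mask].mean()),
            "var_diff_image": float(diff_image[mask].var()),
            "npix": int(mask.sum()),
        })
    return stats

if __name__ == "__main__":
    # Toy data standing in for real image planes and detected footprints.
    sci_var = np.random.rand(64, 64)
    diff = np.random.randn(64, 64)
    mask = np.zeros((64, 64), dtype=bool)
    mask[10:20, 10:20] = True
    for row in footprint_stats(sci_var, diff, [mask]):
        print(row)
}}}

The output is a plain list of records that could be dumped to a file (as Becker did) or, case by case, migrated into a sandbox DB.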

IMO, belaboring the point to exhaustion: one could imagine a QA subsystem design that captures everything and every possible use case in advance, designs the schema, establishes the Sci DB, and that's the full QA system; or one could imagine none of the above, and instead have users grabbing metadata, writing code, and producing results, with those results making their way to the Sci DB and the private code making its way into the pipelines, and that's our QA system. In the real world, anticipate a blend of the two to the extent that resources permit. See diagram below.

For DC3b, Tim has the feeling we're currently much better off with user-written programs and a user DB than we are with a display tool, i.e., we're better off in the sense that that's what we have right now. Action to Tim, before next week's videocon, to put out a strawman list, on a per-pipeline basis, of the quality information that we need for DC3b (for alert production and for data release production).

Another example use case (Dick): compare results in DC3b from two or more production runs, i.e., from two or more databases. This is especially non-trivial if we can't immediately associate objects in the query sense across different runs (they might not have the same objectID).
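One way to make such a cross-run comparison tractable when objectIDs differ is to associate objects by sky position. The sketch below is a naive, brute-force positional match; the coordinate units (degrees), the 1-arcsecond radius, and the function name are assumptions for illustration, and a real implementation would use a spatial index rather than the O(N²) loop.

{{{#!python
import numpy as np

def match_by_position(ra1, dec1, ra2, dec2, radius_arcsec=1.0):
    """Match each run-1 object to its nearest run-2 object on the sky.

    ra*, dec* -- 1-D NumPy arrays of coordinates in degrees.
    Returns a list of (index1, index2, separation_arcsec) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for i in range(len(ra1)):
        # Small-angle, flat-sky approximation; adequate for arcsecond radii.
        dra = (ra2 - ra1[i]) * np.cos(np.radians(dec1[i]))
        ddec = dec2 - dec1[i]
        sep = np.hypot(dra, ddec)
        j = int(np.argmin(sep))
        if sep[j] <= radius_deg:
            matches.append((i, j, sep[j] * 3600.0))
    return matches
}}}

Once objects are paired by position, the persisted quantities from the two runs can be compared row by row.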

SDQA ratings: whether or not to have them is an open question, but we are leaning towards not having them. Instead, use metadata thresholded via policy files (K-T's preference?). Metrics already provide the ability to extend tables without changing the schema structure; since this was the main driver for having ratings, we don't need them in addition to metrics.
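A minimal sketch of what "metadata thresholded via policy files" could look like in practice: metric values are checked against limits read from a policy-like configuration. The metric names and limits below are invented for illustration, and the dictionary merely stands in for whatever policy-file format is actually adopted.

{{{#!python
# Invented thresholds standing in for a policy file; names and limits are illustrative.
THRESHOLDS = {
    "nBadCalibPixels": {"max": 5000},
    "imageClipMean":   {"min": 100.0, "max": 40000.0},
}

def check_metrics(metrics, thresholds=THRESHOLDS):
    """Return (metric, value, reason) tuples for every metric outside its limits."""
    failures = []
    for name, limits in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not computed for this product
        if "min" in limits and value < limits["min"]:
            failures.append((name, value, "below min %s" % limits["min"]))
        if "max" in limits and value > limits["max"]:
            failures.append((name, value, "above max %s" % limits["max"]))
    return failures

print(check_metrics({"nBadCalibPixels": 6200, "imageClipMean": 1500.0}))
}}}

The point is that adding a new metric only adds a row (or key) plus a policy entry, not a schema change, and the pass/fail judgment lives in the policy rather than in a separate ratings table.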

Deborah's Notes

Dave Wittman’s email (IMO) argues for fluid combinations of “metrics” and easy “drill down”.

A Sandbox, possibly with a database to store calculated results.

3 geometries needed – celestial, focal plane, terrestrial/local.

Lynne’s MOPS use case: General mechanism for distances. General statistics/trending capabilities. Surface plots… how many occurrences of N at X,Y location.
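In the spirit of the surface-plot request, a small sketch that counts occurrences on an (X, Y) grid with NumPy; the data here are random placeholders and the bin count is arbitrary.

{{{#!python
import numpy as np

# Placeholder detections; in practice these would come from MOPS output.
x = np.random.uniform(0.0, 10.0, size=1000)
y = np.random.uniform(0.0, 10.0, size=1000)

# Count occurrences per cell on a 20x20 grid; 'counts' is the surface to plot.
counts, xedges, yedges = np.histogram2d(x, y, bins=20)
print("densest cell has", int(counts.max()), "occurrences")
}}}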

Tim mentions the language “R”, which has very powerful capabilities for tight integration. Tim wants Python integration, i.e., a “decent Python interface”.

DC3a analysis as a use case: Andy Becker, for the science image and the diff image, looked at statistics per footprint: the mean of the variance image and the variance of the difference image. IMO, a general capability to facilitate this, but not necessarily to persist it.

Facilitate the collaborations' ability to look at the data without huge overhead. For DC3b.

Ability to compare across pipeline runs (different DBs).

Dick's Architecture cartoon
