Ticket #415 (closed new functionality: fixed)

Opened 10 years ago

Last modified 7 years ago

Persisting classification information

Reported by: jbecla
Owned by: Tim Axelrod
Priority: normal
Milestone:
Component: database
Keywords:
Cc: RobertLupton, SergeMonkewitz
Blocked By:
Blocking:
Project: LSST
Version Number:
How to repeat:

not applicable

Description

The information from the source classification table (http://dev.lsstcorp.org/htmldocs/SourceClassificationTable.pdf) should probably be persisted. We need to better understand this use case and determine whether and how to persist it.

Attachments

SourceClassificationTable.pdf (24.7 KB) - added by smm 10 years ago.
attaching PDF which got lost during recent server woes

Change History

comment:1 Changed 10 years ago by jbecla

  • Milestone set to DC3a Apps DB Schema

comment:2 Changed 10 years ago by Tim Axelrod

  • Cc RobertLupton, SergeMonkewitz added
  • Owner changed from TimAxelrod to jbecla
  • Status changed from new to assigned

Here's the current use case text for "Process Exposure Pair (Visit)":


BASIC COURSE:

Register and add the two Difference Exposures E1, E2, producing a new Exposure E+.

Invoke Detect DIA Sources on E+ producing a DIA Source Collection, D+.

Iterate through D+:

  1. measure each DIASource in E1 and E2, producing DIA Sources d1 and d2.
  2. add the SourceIds of d1 and d2 to the Source Visit Table
  3. add d1 to DIA Source Collection D1 and d2 to DIA Source Collection D2

Persist D1, D2, and the Source Visit Table

Invoke "Associate DIA Sources to Astro Objects"


I propose that each row of the Source Visit Table simply link together the SourceIds of corresponding sources from the two Exposures that make up a Visit: (diaSourceId1, diaSourceId2).
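The basic course and the proposed row layout can be sketched as follows. This is only an illustration under stated assumptions: `DiaSource`, `process_visit`, and `stub_measure` are hypothetical names, and the real measurement step is replaced by a stub that just assigns ids.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List, Tuple


@dataclass
class DiaSource:
    """Hypothetical stand-in for a measured DIA Source."""
    source_id: int
    exposure: str
    x: float
    y: float


Detection = Tuple[float, float]  # position detected on the summed exposure E+


def process_visit(
    detections: List[Detection],
    measure: Callable[[str, Detection], DiaSource],
) -> Tuple[List[DiaSource], List[DiaSource], List[Tuple[int, int]]]:
    """Iterate through D+, measuring each detection on E1 and E2."""
    d1_coll: List[DiaSource] = []
    d2_coll: List[DiaSource] = []
    visit_table: List[Tuple[int, int]] = []
    for det in detections:
        d1 = measure("E1", det)
        d2 = measure("E2", det)
        # One row per detection: (diaSourceId1, diaSourceId2).
        visit_table.append((d1.source_id, d2.source_id))
        d1_coll.append(d1)
        d2_coll.append(d2)
    return d1_coll, d2_coll, visit_table


# Stub measurement (assumption): copies the detection position, assigns a new id.
_ids = count(1)


def stub_measure(exposure: str, det: Detection) -> DiaSource:
    x, y = det
    return DiaSource(next(_ids), exposure, x, y)


d1s, d2s, table = process_visit([(10.0, 20.0), (30.5, 40.5)], stub_measure)
```

The three returned collections correspond to D1, D2, and the Source Visit Table that the use case says should be persisted.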

I also propose the logic of the Source Classification Table (deciding whether a Source pair comes from a cosmic ray, a fast mover, etc) be placed in the Association Pipeline, which will now be passed the Source Visit Table.

comment:3 Changed 10 years ago by jbecla

  • Owner changed from jbecla to Tim Axelrod
  • Status changed from assigned to needinfo

Source Visit Table --> DIASource Visit Table

Introducing this new table raises the question of what to call the DIASource table. It is not obvious from its name that it will contain DIASources corresponding to individual exposures. Perhaps we should call these tables "DIASourcePerVisit" and "DIASourcePerExposure"?

This also raises the question of what the Data Release pipeline will see: individual exposures or visits? Specifically, from the database point of view, do we need to introduce a similar concept for Sources?

comment:4 Changed 10 years ago by jbecla

  • Owner changed from Tim Axelrod to TimAxelrod
  • Status changed from needinfo to assigned

comment:5 Changed 10 years ago by jbecla

  • Owner changed from TimAxelrod to jbecla
  • Priority changed from minor to normal
  • Summary changed from Persisting source classification table to Persisting classification information
  • Milestone changed from DC3a Apps DB Schema to DC3a MW DB

This was discussed in detail at the DataAccWG telecon on 12/2/08. The current plan is to

  • push the logic into the association pipeline (the classification logic should not be hardcoded - pluggable classifiers are probably for DC3b)
  • persist classification information per DIASource (as flags in the DIASource table)
  • persist information that allows finding the pairs of DIASources, measured on individual exposures from the same visit, that correspond to each other (as a new column in the DIASource table)
  • define a rule that avoids processing the same pair of DIASources from the same visit twice (only one of the DIASources from such a pair should be processed)

comment:6 Changed 10 years ago by jbecla

  • Owner changed from jbecla to smm

I added the needed structures to the schema (diaSource2Id and flagClassification). I guess it makes sense to reassign this to Serge now.
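A rough sketch of how the two new columns might be used, including one possible rule for processing each pair only once. Only the column names `diaSource2Id` and `flagClassification` come from the schema change; the row class, the use of `None` for an unpaired source, and the "smaller id wins" rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiaSourceRow:
    """Hypothetical in-memory view of a DIASource table row."""
    diaSourceId: int
    # Id of the sibling measured on the other exposure of the same visit.
    # None for an unpaired source is an assumption, not the schema's rule.
    diaSource2Id: Optional[int]
    flagClassification: int = 0


def sources_to_process(rows: List[DiaSourceRow]) -> List[DiaSourceRow]:
    """Pick one representative per pair (assumed rule: the smaller id)."""
    return [
        r for r in rows
        if r.diaSource2Id is None or r.diaSourceId < r.diaSource2Id
    ]


rows = [
    DiaSourceRow(1, 2),
    DiaSourceRow(2, 1),
    DiaSourceRow(3, None),
]
selected_ids = [r.diaSourceId for r in sources_to_process(rows)]
```

Any rule that is deterministic and symmetric across the two rows of a pair would do; comparing ids is simply the cheapest one to state.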

comment:7 follow-up: ↓ 8 Changed 10 years ago by smm

  • Owner changed from smm to TimAxelrod
  • Status changed from assigned to needinfo

Tim, will both DiaSources d1 and d2 in a pair have exactly the same positions?

If not, should I match against objects/mops predictions using the average of both positions? Alternatively, a pair of DiaSources could be considered to spatially match an object if at least one member in the pair matches...

comment:8 in reply to: ↑ 7 Changed 10 years ago by Tim Axelrod

  • Owner changed from TimAxelrod to smm
  • Status changed from needinfo to assigned

Replying to smm:

Tim, will both DiaSources d1 and d2 in a pair have exactly the same positions?

If not, should I match against objects/mops predictions using the average of both positions? Alternatively, a pair of DiaSources could be considered to spatially match an object if at least one member in the pair matches...

No, they will in general be slightly different, if only due to noise. I don't see why you have to consider matching a pair to the Object table, as opposed to the two members independently. The source classification logic mainly requires comparing the properties of d1 to d2, rather than referencing the matching Object. One exception is in deciding whether the matched object is variable.

Clearly it is possible to come up with pathological cases: a rapidly moving object passes in front of a variable star, so that D1 would not get passed to MOPS, but D2 would - etc. I suspect we can punt on those, at least for DC3.
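For illustration, matching the two members of a pair independently could look like the sketch below. The function names, the brute-force search, and the 1 arcsecond match radius are all assumptions; the actual association pipeline would presumably use a spatial index rather than a linear scan.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # (ra, dec) in degrees


def angular_sep_deg(p1: Position, p2: Position) -> float:
    """Angular separation in degrees, computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*p1, *p2))
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def match_objects(
    source: Position, objects: Dict[int, Position], radius_deg: float
) -> List[int]:
    """Return ids of all objects within radius_deg of the source."""
    return [
        oid for oid, pos in objects.items()
        if angular_sep_deg(source, pos) <= radius_deg
    ]


objects = {101: (10.0, -5.0), 102: (10.1, -5.0)}
radius = 1.0 / 3600.0  # 1 arcsecond (assumed match radius)

# d1 and d2 differ slightly, as expected from noise.
d1 = (10.00001, -5.00001)
d2 = (10.00002, -4.99999)
matches_d1 = match_objects(d1, objects, radius)
matches_d2 = match_objects(d2, objects, radius)
```

Each member gets its own match list, and the classification logic can then compare d1 against d2 without needing a combined pair position.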

Changed 10 years ago by smm

attaching PDF which got lost during recent server woes

comment:9 follow-up: ↓ 10 Changed 10 years ago by smm

  • Owner changed from smm to TimAxelrod
  • Status changed from assigned to needinfo

I was considering them as pairs for efficiency reasons and because the classification table seems to operate on pairs of difference sources: e.g. the "Present in both visits" and "Shape differs in two visits" rows. (Aside: by "two visits", I assume you mean the two exposures in a single visit). So I guess I was assuming that association pipeline actions would apply to pairs of difference sources - for example, aren't there some cases in which you'd want to send out a single alert for a pair of difference sources?

Also, can you clarify under what circumstances AP should create new objects?

Based on discussion so far, let me sum up my understanding of what needs to be produced:

We want some way to specify a set of functions that map to true/false. These functions may either operate on the attributes of both difference sources in a pair, or consider only a single difference source at a time. Evaluating the set of functions for a particular difference source and its sibling yields a classification. This classification is then used to look up some way of identifying which action(s) the association pipeline should take (e.g. a decision tree). The decision on actions to take could involve looking at the matches of a difference source.

For DC3a, hardcoding the classification functions and decision trees in C++ is acceptable. However, we should think about ways to allow the function and decision tree definitions to be easily configurable. One way to accomplish this goal is to allow the AP policy to specify a list of Python functors that correspond to classification functions, decision trees, and actions. Another more complicated alternative would be to dynamically generate Python code (or even C++ code, if speed is an issue) from a DSL.
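To make the idea concrete, here is a minimal Python sketch of the predicate / classification / action lookup described above. The predicate names echo two rows of the classification table mentioned earlier, but the attribute name `ixx`, the tolerance, the action names, and the contents of the decision table are placeholders and are not taken from the Source Classification Table.

```python
from typing import Callable, Dict, List, Optional, Tuple

Source = Dict[str, float]  # hypothetical: attribute name -> value
Predicate = Callable[[Source, Optional[Source]], bool]


def present_in_both(src: Source, sibling: Optional[Source]) -> bool:
    """True when the source has a sibling on the other exposure."""
    return sibling is not None


def shape_differs(
    src: Source, sibling: Optional[Source], tol: float = 0.2
) -> bool:
    """True when a shape moment differs by more than tol (assumed test)."""
    return sibling is not None and abs(src["ixx"] - sibling["ixx"]) > tol


# The ordered list of predicates would come from the AP policy.
PREDICATES: List[Predicate] = [present_in_both, shape_differs]


def classify(src: Source, sibling: Optional[Source]) -> Tuple[bool, ...]:
    """Evaluate every predicate; the tuple of results is the classification."""
    return tuple(p(src, sibling) for p in PREDICATES)


# Placeholder decision table: classification -> action names.
DECISIONS: Dict[Tuple[bool, ...], List[str]] = {
    (False, False): ["flag_only"],
    (True, False): ["match_objects", "send_alert"],
    (True, True): ["flag_only"],
}


def actions_for(src: Source, sibling: Optional[Source]) -> List[str]:
    """Look up the actions for a source and its sibling (empty if unknown)."""
    return DECISIONS.get(classify(src, sibling), [])
```

With this layout, `actions_for({"ixx": 1.0}, {"ixx": 1.05})` looks up the entry for a source present in both exposures with a consistent shape. A flat lookup table is used here in place of a real decision tree purely to keep the sketch short.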

Does this sound about right?

comment:10 in reply to: ↑ 9 Changed 10 years ago by Tim Axelrod

  • Owner changed from TimAxelrod to Tim Axelrod
  • Status changed from needinfo to assigned

Replying to smm:

I was considering them as pairs for efficiency reasons and because the classification table seems to operate on pairs of difference sources: e.g. the "Present in both visits" and "Shape differs in two visits" rows. (Aside: by "two visits", I assume you mean the two exposures in a single visit).

Yes, certainly right.

So I guess I was assuming that association pipeline actions would apply to pairs of difference sources - for example, aren't there some cases in which you'd want to send out a single alert for a pair of difference sources?

Yes, alerts are based on pairs (=Visit), not single exposures.

Also, can you clarify under what circumstances AP should create new objects?

AP should not create new objects for Cosmic Rays or Fast Movers. Other cases should result in a new object if no existing object matches.
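Expressed as code, this rule might look like the following sketch. The classification labels and the function name are hypothetical; only the rule itself comes from the reply above.

```python
from typing import Sequence

# Classifications that must never produce a new object (labels are assumed).
NO_NEW_OBJECT = frozenset({"cosmic_ray", "fast_mover"})


def should_create_object(
    classification: str, matched_object_ids: Sequence[int]
) -> bool:
    """Create a new object only for eligible sources with no existing match."""
    if classification in NO_NEW_OBJECT:
        return False
    return len(matched_object_ids) == 0
```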

Based on discussion so far, let me sum up my understanding of what needs to be produced:

We want some way to specify a set of functions that map to true/false. These functions may either operate on the attributes of both difference sources in a pair, or consider only a single difference source at a time. Evaluating the set of functions for a particular difference source and its sibling yields a classification. This classification is then used to look up some way of identifying which action(s) the association pipeline should take (e.g. a decision tree). The decision on actions to take could involve looking at the matches of a difference source.

For DC3a, hardcoding the classification functions and decision trees in C++ is acceptable. However, we should think about ways to allow the function and decision tree definitions to be easily configurable. One way to accomplish this goal is to allow the AP policy to specify a list of Python functors that correspond to classification functions, decision trees, and actions. Another more complicated alternative would be to dynamically generate Python code (or even C++ code, if speed is an issue) from a DSL.

Does this sound about right?

comment:11 Changed 10 years ago by jbecla

  • Status changed from assigned to closed
  • Resolution set to fixed

comment:12 Changed 7 years ago by robyn

  • Milestone DC3a MW DB deleted

