ORGANIZATION:
First, the basic structure of what has to be run where. Different parts can be run at different times, but the overall structure must be preserved, in this order, for every exposure:
| Code | CPU time | Output size | Owner |
| For each Opsim field: | | | |
| 1. Full-field instance catalog gen. | hours/field? | ~1 Gbyte/field | Andy/Rob/Jim/Simon |
| 2. Atmospheric parameter generator | seconds/field | small | Mallory |
| 3. Atmospheric screen generator | minutes/field | ~1 Gbyte/field | Garrett |
| 4. Cloud screen generator | minutes/field | ~30 Mbytes/field | Garrett |
| 5. Optics parameters | seconds/field | small | Nathan |
| Loop over the two exposures: | | | |
| Loop over either chips or amplifiers for the following: | | | |
| 6. Trim program | seconds/chip | ~30 Mbytes | Justin |
| 7. Raytrace | <1 hr/chip | 16 Mbyte/chip | John |
| 8. Cosmic ray adder | few seconds/chip | 16 Mbyte/chip | Mallory |
| 9. Background adder | 30 seconds/chip | 16 Mbyte/chip | Justin |
| Loop over amplifiers (if you are looping over chips above): | | | |
| 10. Electron -> ADC code | few seconds/amp | 1 Mbyte/amp | John |
The current master script runs these steps in the order described above. It leaves open the question of whether the instance catalogs are generated ahead of time or on the fly.
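As a rough illustration of the ordering in the table, the sketch below builds the sequence of calls for one field. It is not the actual master script: the step names, the `plan_field` function, and its arguments are all hypothetical, and it assumes the "loop over chips" variant, with the amplifier loop nested inside.

```python
# Hypothetical step names; the real program names are not given on this page.
FIELD_STEPS = [
    "instance_catalog",        # 1
    "atmospheric_parameters",  # 2
    "atmospheric_screens",     # 3
    "cloud_screens",           # 4
    "optics_parameters",       # 5
]
CHIP_STEPS = ["trim", "raytrace", "cosmic_rays", "background"]  # 6-9
AMP_STEPS = ["electron_to_adc"]                                 # 10


def plan_field(field_id, chip_ids, amp_ids, n_exposures=2):
    """Return the ordered list of (step, field, exposure, chip, amp) calls."""
    # Per-field steps run once, before any exposure is processed.
    plan = [(step, field_id, None, None, None) for step in FIELD_STEPS]
    for exposure in range(n_exposures):
        for chip in chip_ids:
            # Chip-level steps, in table order.
            for step in CHIP_STEPS:
                plan.append((step, field_id, exposure, chip, None))
            # Amplifier-level steps run after the chip image exists.
            for amp in amp_ids:
                for step in AMP_STEPS:
                    plan.append((step, field_id, exposure, chip, amp))
    return plan
```

Calling `plan_field("field0", ["chip0", "chip1"], ["amp0", "amp1"])` gives the five per-field steps first, followed by the chip and amplifier steps for each exposure in turn.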
GRANULARITY:
Next is the question of which part of the above is assigned to each processor. Since the raytrace still dominates the CPU time, it makes sense to run it on different processors in parallel. But at what granularity? With the way everything runs now, the natural granularity is the chip level. You could consider granularities at the following levels:
1. PHOTON: every photon is separately generated and then collected
2. OBJECT: every object is separately generated and then images are put together
3. AMPLIFIER: every amplifier is separately generated
4. CHIP: every chip is separately generated and then split into amplifiers
5. FOCAL-PLANE: the whole focal plane is generated together and then split into amplifiers
Levels 1-3 have large overheads, whereas a single job at level 5 is still in the hundreds of CPU hours. Consequently, level 4 is ideal for now.
IMPLEMENTATION:
So we will have a master controlling script that does the loops described in the organization section, and an individual script that runs codes 6-10 at the chip level and can be run in parallel across chips.
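A minimal sketch of this split, under stated assumptions: `run_chip` stands in for the individual per-chip script and `run_field` for the chip-dispatching part of the master script. Both names, the step labels, and the use of a local thread pool are illustrative only. A real setup would more likely submit each chip job to a batch queue or launch it as a separate process.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical labels for codes 6-10 from the organization table.
CHIP_LEVEL_CODES = [
    "trim", "raytrace", "cosmic_rays", "background", "electron_to_adc",
]


def run_chip(field_id, exposure, chip_id):
    """Stand-in for the per-chip script: run codes 6-10 in order for one chip."""
    done = []
    for code in CHIP_LEVEL_CODES:
        # A real script would launch the corresponding program here.
        done.append(code)
    return (field_id, exposure, chip_id, done)


def run_field(field_id, chip_ids, n_exposures=2, max_workers=4):
    """Stand-in for the master script: dispatch one job per (exposure, chip)."""
    jobs = [
        (field_id, exposure, chip)
        for exposure in range(n_exposures)
        for chip in chip_ids
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the submitted jobs.
        return list(pool.map(lambda job: run_chip(*job), jobs))
```

Each chip job is independent of the others, which is what makes the chip level a convenient unit to hand to separate processors.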
