Changes between Version 3 and Version 4 of DC3bDbIngest


Timestamp:
03/26/2010 12:34:09 PM
Author:
jbecla
Comment:

change naming from cache to scratch

  • DC3bDbIngest

    v3  v4
    11  11  If we determine it is necessary, in DC4 we will introduce post-processing, e.g., to fine-tune partitioning or remap objectIds or sourceIds.
    12  12
    13      In some cases pipelines expect to (a) read data from database, and (b) update some tables. For these reasons, there will be a temporary cache in between pipelines and the final database. This cache will consist of 2 parts:
        13  In some cases pipelines expect to (a) read data from database, and (b) update some tables. For these reasons, there will be a temporary ''scratch space'' in between pipelines and the final database. This scratch space will consist of 2 parts:
    14  14
    15  15   1. a database, likely centrally located but potentially per-node, which will contain:

    18  18   2. TSV files, which will contain all non-updatable data. These TSV files will be processed by the !DbIngest pipeline, which will use the partitioner to re-partition the TSV files and the loader to load them into appropriate tables.  This pipeline may run at the same time as other pipelines in the Data Release Production or after the others.  The partitioner and loader will be configurable by their respective stage policies.
    19  19
    20      We expect the temporary database to be relatively small, e.g., we expect we won't need to cache the entire Object or Source or !ForcedSource catalog - we assume pipelines will work with individual tiles, and caching data for "current" tile or a small set of "current" tiles will be sufficient.
        20  We expect the scratch database to be relatively small, e.g., we expect we won't need to keep the entire Object or Source or !ForcedSource catalog - we assume pipelines will work with individual tiles, and keeping data for "current" tile or a small set of "current" tiles will be sufficient.
    21  21
    22      Note that we do not plan to partition moving objects in DC3b. This is mostly because of small size  (6 million rows expected in production). We might need to partition !ForcedSourceForMovingObject (which will trigger partitioning for the !MovingObject table) after DC3b due to its potentially large size.
        22  Note that we do not plan to partition moving objects in DC3b. This is mostly because of small size (6 million rows expected in production). We might need to partition !ForcedSourceForMovingObject (which will trigger partitioning for the !MovingObject table) after DC3b due to its potentially large size.
    23  23
    24  24
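
The re-partitioning step for the TSV files might look roughly like the sketch below. It is only an illustration, not the actual !DbIngest partitioner: the function names `partition_tsv` and `write_partitions` are hypothetical, and the modulo-on-an-integer-column rule is a stand-in for whatever partitioning scheme the real stage policy configures.

```python
import csv
import io
from collections import defaultdict


def partition_tsv(tsv_text, key_column, num_partitions):
    """Group TSV rows into buckets by an integer column modulo num_partitions.

    Hypothetical stand-in for the real partitioner, which is configured
    by its stage policy and is not described in this page.
    """
    buckets = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        bucket = int(row[key_column]) % num_partitions
        buckets[bucket].append(row)
    return dict(buckets)


def write_partitions(buckets):
    """Serialize each bucket back to TSV text, one string per partition."""
    out = {}
    for bucket, rows in sorted(buckets.items()):
        buf = io.StringIO()
        csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
        out[bucket] = buf.getvalue()
    return out


if __name__ == "__main__":
    # Made-up rows: id, ra, decl
    sample = "1\t10.0\t-5.0\n2\t11.0\t-5.5\n3\t12.0\t-6.0\n4\t13.0\t-6.5\n"
    for bucket, text in write_partitions(partition_tsv(sample, 0, 2)).items():
        print(bucket, repr(text))
```

Each resulting per-partition TSV string could then be handed to a loader for the corresponding table; that loading step is left out here because this page does not describe it.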