Changes between Version 6 and Version 7 of db/SchemaEvolution


Timestamp:
03/05/2013 07:26:34 PM
Author:
jbecla

  • db/SchemaEvolution

    v6 v7  
    1616'''Deletes'''. If we have to delete a column, then instead of dropping it (which is expensive because it changes the shape of the table), we will rename it and add it to the pool of "extra, unused columns". 
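
The bookkeeping behind this rename-instead-of-drop approach can be sketched as a toy model. This is only an illustration: the class, the `_extra_N` naming scheme, and the column names are hypothetical, and no real database is touched.

```python
class ColumnPool:
    """Toy model of a table whose physical width never changes."""

    def __init__(self, live_columns, spare_count):
        # Spare columns are created up front; "_extra_N" is a hypothetical naming scheme.
        self.live = list(live_columns)
        self.spare = [f"_extra_{i}" for i in range(spare_count)]
        self._next_id = spare_count
        self.renames = []  # log of (old_name, new_name) pairs

    def width(self):
        """Total number of physical columns, live plus spare."""
        return len(self.live) + len(self.spare)

    def delete_column(self, name):
        """Retire a live column by renaming it into the spare pool."""
        self.live.remove(name)
        new_name = f"_extra_{self._next_id}"
        self._next_id += 1
        self.spare.append(new_name)
        self.renames.append((name, new_name))

    def add_column(self, name):
        """Expose a new column by renaming a spare one."""
        if not self.spare:
            raise RuntimeError("reserved pool is empty; it must be refilled first")
        old_name = self.spare.pop(0)
        self.live.append(name)
        self.renames.append((old_name, name))


table = ColumnPool(["objectId", "ra", "decl", "oldFlag"], spare_count=3)
width_before = table.width()
table.delete_column("oldFlag")  # renamed into the pool, not dropped
table.add_column("newMag")      # taken from the pool, not added
width_after = table.width()
```

Both operations are pure renames, so `width_before` and `width_after` are equal: the table never changes shape.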
    1717 
     18The unused/hidden columns will be hidden through non-materialized views. We will likely end up providing multiple views, e.g. each time we make a schema change, we would expose the change through a new view.  
     19 
     20We strongly encourage users not to rely on the order or the number of columns returned from "SELECT *". Obviously, a query that expects x columns from "SELECT *" will fail if we add a new column. Introducing a new view for each change will alleviate this problem. 
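
A minimal sketch of the versioned-view idea, using an in-memory SQLite database purely for illustration. The view names (`Object_v1`, `Object_v2`) and the column names are hypothetical, and the schema change is done here with a plain ADD COLUMN for brevity rather than by renaming a column from the reserved pool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base table with one hidden spare column (hypothetical names throughout).
conn.execute(
    "CREATE TABLE Object (objectId INTEGER, ra REAL, decl REAL, _extra_0 REAL)"
)
# Ordinary (non-materialized) view that hides the spare column.
conn.execute("CREATE VIEW Object_v1 AS SELECT objectId, ra, decl FROM Object")
conn.execute("INSERT INTO Object VALUES (1, 10.5, -3.2, NULL)")


def view_columns(view):
    """Return the column names that SELECT * yields for the given view."""
    cur = conn.execute(f"SELECT * FROM {view}")
    return [d[0] for d in cur.description]


cols_before = view_columns("Object_v1")

# Schema change on the base table, exposed only through a new view.
conn.execute("ALTER TABLE Object ADD COLUMN psfFlux REAL")
conn.execute(
    "CREATE VIEW Object_v2 AS SELECT objectId, ra, decl, psfFlux FROM Object"
)

cols_after = view_columns("Object_v1")
cols_v2 = view_columns("Object_v2")
```

A query written against `Object_v1` sees the same columns before and after the change; only users who move to `Object_v2` see the new column.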
    1821 
    1922== Up-to-date Catalog == 
    2023 
    21 We are planning to maintain two copies: one live copy used by the alert production pipeline, and one for user queries. The two will be synchronized daily during downtime (daytime). The up-to-date catalog will not contain the largest tables (Source and !ForcedSource); its largest table will be the Object table. Because it is significantly smaller, even more complex changes can be done quickly and will complete within the ~12h downtime window. 
     24We are planning to maintain two copies: one live copy used by the alert production pipeline, and one for user queries. The two will be synchronized daily during downtime (daytime). The up-to-date catalog will not contain the largest tables (Source and !ForcedSource); its largest table will be the Object table. Because it is significantly smaller, even more complex changes can be done quickly and will complete within the ~12h downtime window. Typical scenario: 
     25 * night "X": db "A" used by alert production, db "B" used for user queries 
     26 * day "X": db "A" made available for user queries, db "B" taken offline and brought up to date (the observations from night "X" entered). Schema evolution is applied to db "B" as needed (e.g. adding a new column by renaming a dummy extra column from the reserved pool). 
     27 * night "X+1": db "B" used by alert production, db "A" continues to be used by user queries 
     28 * day "X+1": db "B" made available for user queries, db "A" taken offline and brought up to date (the same schema evolution as applied to db "B" on day "X", then the observations from night "X+1" entered) 
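
The alternation in the scenario above can be sketched as a small schedule generator. The function name and the role labels are hypothetical; the sketch only models which copy plays which role, not the synchronization itself.

```python
def schedule(num_nights):
    """List (period, roles) pairs for two catalog copies, "A" and "B"."""
    alert, other = "A", "B"
    steps = []
    for n in range(num_nights):
        # Night: one copy feeds alert production, the other serves users.
        steps.append((f"night {n}", {
            "alert_production": alert,
            "user_queries": other,
            "offline_update": None,
        }))
        # Day: the freshly updated copy serves users while the other one is
        # taken offline for new observations and any schema evolution.
        steps.append((f"day {n}", {
            "alert_production": None,
            "user_queries": alert,
            "offline_update": other,
        }))
        # The copy updated during the day takes over alert production next night.
        alert, other = other, alert
    return steps


steps = schedule(2)
for period, roles in steps:
    print(period, roles)
```

Each copy is offline only every other day, so a schema change is applied to one copy per day and reaches both copies after two days.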
     29 
     30If we start running low on extra columns in the reserved pool, we can refill it gradually during daytime, a subset of chunks at a time. Since these columns are hidden, it is ok if some chunks have them and others don't. This is the worst case, assuming we don't have enough time during one day to add columns to all chunks. 
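
A minimal sketch of such an incremental refill, again using in-memory SQLite for illustration only. The per-chunk table names (`Object_0`, ...), the spare column name, and the per-day budget of two chunks are all assumptions made for the example.

```python
import sqlite3


def chunks_missing(conn, chunk_tables, column):
    """Return the chunk tables that do not yet have the given column."""
    missing = []
    for table in chunk_tables:
        names = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        if column not in names:
            missing.append(table)
    return missing


def refill_some(conn, chunk_tables, column, max_chunks):
    """Add the spare column to at most max_chunks chunks (one day's work)."""
    todo = chunks_missing(conn, chunk_tables, column)[:max_chunks]
    for table in todo:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} REAL")
    return todo


conn = sqlite3.connect(":memory:")
chunks = [f"Object_{i}" for i in range(5)]
for table in chunks:
    conn.execute(f"CREATE TABLE {table} (objectId INTEGER, ra REAL, decl REAL)")

# Refill over several days; chunks are temporarily inconsistent, which is
# harmless as long as the new column is hidden from users.
days = 0
while chunks_missing(conn, chunks, "_extra_0"):
    refill_some(conn, chunks, "_extra_0", max_chunks=2)
    days += 1
```

Because progress is derived from the tables themselves, the refill can be interrupted at the end of a downtime window and resumed the next day.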
    2231 
    2332== Notes on Scalability ==