Version 3 (modified by jbecla, 6 years ago) (diff)


Inevitable database schema will change over time. This document discusses how we are planning to address it. In essence:

  • avoid heavy changes that alter shape of large tables (eg by preallocating space)
  • rely on partitioning to scale and run necessary schema changes rapidly in parallel

Data release catalog

Each data release is independent, therefore database schema can easily change between data releases; we do not anticipate any non-trivial challenges here.

Each published data release will be frozen, e.g. to guarantee result reproducibility changes that might impact query results will not be allowed. Allowed changes include changes such as adding new columns, changing precision (without loosing information, eg float --> double), or adding/removing/resorting indexes.

New columns: we are planning to keep few extra unused columns of each common type (for example, 3 FLOATs, 3 DOUBLEs, 3 INTs, 3 BIGINTs) for each large table, and use one of them when a new column is needed. (renaming a column is a trivial and instantaneous operation). Speed of filling new columns should be comparable to speed of a full table.

Updating columns. Speed of updating values should be comparable to speed of a single shared scan.

Deletes. If we have to delete a column, instead of deleting it, which is expensive as it changes shape of a table, we will add it to the pool of "extra, unused columns" by renaming it.

Up-to-date catalog

We are planning to maintain two copies: one live used by alert production pipeline, and one for user queries. The two will be synchronized during daily. The up-to-date catalog will not contain the largest tables (Source and ForcedSource), the largest table will be the Object table, which opens more options, because most changes, even more complex ones can be done on that table in under 24 hours.

Notes about scalability

Non trivial schema changes on multi-billion row table tend to take extremely long time (measured in days). In our case, each large table will be partitioned into several thousands chunks, which dramatically improves speed and running the updates. Not only the updates can be done in parallel on multiple machines, but each partition will be small enough to be handled efficiently without running into any bottlenecks.

Notes about administration

All queries accessing the database will go through our interface, which intercepts and interprets the queries. This will allow us to put the entire system in special mode (user queries will be queued, administrative queries will continue to work). The administrative tools we are building will allow us to push the necessary updates to all nodes, and we could use the existing scheduling mechanisms to run the upgrade in the most efficient fashion. The admin tools will also allow to run necessary verifications (eg. whether the schema for all table chunks matches).