
Database Backup and Replication in DC3b

Issues with standard mysql replication:

  • typically we produce a run and only later decide it is a "production" run. This is hard to replicate unless we delay replication, or replicate everything
  • binary logs roughly double the space requirements; with multiple slaves it gets harder to purge them frequently, since they can't be purged immediately

Open questions:

  • policy: lsst a "development" server and slac a "production" server?
    • [MikeF] Those terms are too overloaded with other connotations and probably don't capture the concept cleanly anyway. There are actually two independent concepts here -- the roles of the servers, and the status of the run data. For the former, I might suggest that we think of lsst10 as the "primary" database server, and the slac ones as "secondary" servers. See the diagram at InfrastructureWG. There is only one primary, but there can be as many secondaries as needed. For the run data, I proposed the terminology described at DC3bDataLocations, but I'm open to suggestions if there are better labels than "candidate" and "released".
  • which runs to transfer? All? Production only?
  • are "completed/done" runs ever updated?
  • are we ever renaming databases to make them "production"?
  • mysqldump/load vs tar/gzip directory vs standard replication
  • how do we deal with synchronizing authorization?

In DC3b the database will grow to ~15 TB. The database server's (lsst10) disk speed will be ~100-300 MB/sec. Assuming 200 MB/sec, we could read:

  • 1 TB in ~1 1/2 hours
  • 15 TB in ~21 hours
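
For reference, the arithmetic behind these estimates (decimal units, so the figures are approximate):

  1 TB  =  1,000,000 MB;  1,000,000 MB / 200 MB/s =  5,000 s ≈ 1.4 hours
  15 TB = 15,000,000 MB; 15,000,000 MB / 200 MB/s = 75,000 s ≈ 21 hours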

Writing to tapes will likely be even slower (see comment 1). Note that it is sufficient to back up only the tables that have changed since the last backup, which significantly reduces the amount of data that needs to be backed up (a query sketch for finding recently modified tables follows the list below):

  • during DC3b Data Release production we will likely process data in spatial order, and the largest tables will be partitioned spatially, so once a given patch of sky is done, we won't come back to it
  • all the largest tables (ForcedSource, Source, Object, etc.) will be updated only during Data Release production. Once done, there is no need to back them up again.
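
A minimal sketch for finding the tables that changed since the previous full backup; the timestamp is a placeholder for the end time of that backup. Note that UPDATE_TIME in information_schema is populated reliably for MyISAM tables but may be NULL for InnoDB, so tables with an unknown update time are included conservatively:

  -- Tables in <databaseName> modified since the previous full backup
  SELECT TABLE_NAME, UPDATE_TIME, DATA_LENGTH + INDEX_LENGTH AS bytes
  FROM information_schema.TABLES
  WHERE TABLE_SCHEMA = '<databaseName>'
    AND (UPDATE_TIME IS NULL                       -- unknown: back up to be safe
         OR UPDATE_TIME > '2013-01-25 00:00:00');  -- placeholder timestamp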

In any case, a daily full backup is too expensive and time-consuming (see comment 2).

We expect to have at least one off-site replica of the database (e.g., at SLAC), synchronized in real time (see comment 3).

We will use statement-based replication. Pros and cons of statement-based vs row-based replication are described here. The main reasons to use statement-based replication include (a minimal configuration sketch follows this list):

  • we don't run statements that are non-deterministic
  • it is easier to see what commands were executed (and remove accidental offending commands such as DROP TABLE in case of admin mistake)
  • logs for statement-based replication are smaller
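
A minimal my.cnf sketch for the master (lsst10), assuming statement-based replication and the binary log kept on a volume separate from the data files; the /binlog path and server-id are placeholders, and the log name matches the note at the bottom of this page:

  # /etc/my.cnf on lsst10 (sketch; paths and server-id are placeholders)
  [mysqld]
  server-id     = 1
  log-bin       = /binlog/lsst10_binlog    # binary log on a separate volume
  binlog_format = STATEMENT                # statement-based replication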

Plan:

  • full backup once per week
  • incremental backup in real time, through the binary log. Keep the binary log files on a volume different from the database files, and ensure at least one replica is receiving data in real time (see comment 3 and the replica setup sketch after this list)
  • Full backups of the entire database take place only during production. When production is complete and we are only serving data (with possibly some "tweaks"), full backups will cover only the modified tables.
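
As a sketch of attaching a secondary (the host, replication account, and log coordinates below are placeholders; the coordinates come from "SHOW MASTER STATUS" on lsst10), the standard MySQL procedure is roughly:

  -- Run on the secondary (e.g., at SLAC)
  CHANGE MASTER TO
    MASTER_HOST     = '<masterHost>',
    MASTER_USER     = '<replUser>',
    MASTER_PASSWORD = '<replPwd>',
    MASTER_LOG_FILE = 'lsst10_binlog.000001',   -- from SHOW MASTER STATUS
    MASTER_LOG_POS  = 4;
  START SLAVE;
  SHOW SLAVE STATUS\G   -- Slave_IO_Running and Slave_SQL_Running should be 'Yes'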

Full backup procedure (a consolidated shell sketch of these steps follows the list):

  1. Connect to the mysql master server (lsst10) and issue "USE <databaseName>", then "FLUSH TABLES WITH READ LOCK". Do not exit - keep this session window open. Note that there is no need to stop the mysqld server; this method allows read queries to continue uninterrupted, only updates will be blocked.
  2. In a separate window, copy the table files (corresponding to just the locked database) to mass storage. It is sufficient to back up only the tables that have changed since the last backup.
  3. If the full backup succeeded:
    a. Go back to the opened session and issue "FLUSH LOGS".
    b. Check the status of all slaves by running: mysql -h <theSlaveHost> -u <user> -p<pwd> -e "SHOW SLAVE STATUS"
    c. Obtain a listing of the binary log files by running "SHOW BINARY LOGS" in the opened session.
    d. Determine the earliest log file among all the slaves (from their SHOW SLAVE STATUS output). This is the target file.
    e. Purge all log files up to but not including the target file, by running: "PURGE BINARY LOGS TO 'mysql-bin.<number>'", where <number> is the number of the target file. (If we have space, we should consider backing up these files for some period of time, like a month, prior to purging them, just for extra safety.)
  4. Release the locks by running "UNLOCK TABLES" in the opened session.
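
A consolidated shell sketch of these steps, assuming the database files live under /var/lib/mysql and the backup target is mounted at /mass_storage (both paths are placeholders); the session holding the read lock must stay open while the copy runs:

  # Terminal 1: open an interactive session, take the read lock, keep the session open
  mysql -h lsst10 -u <user> -p
  mysql> USE <databaseName>;
  mysql> FLUSH TABLES WITH READ LOCK;

  # Terminal 2: copy the table files of the locked database to mass storage
  rsync -a /var/lib/mysql/<databaseName>/ /mass_storage/backup/<databaseName>-$(date +%Y%m%d)/

  # Terminal 1: after the copy succeeds, rotate the logs
  mysql> FLUSH LOGS;

  # Terminal 2: find the earliest binary log any slave still lists
  mysql -h <theSlaveHost> -u <user> -p<pwd> -e "SHOW SLAVE STATUS\G"

  # Terminal 1: list the logs, purge those no slave needs, and release the lock
  mysql> SHOW BINARY LOGS;
  mysql> PURGE BINARY LOGS TO 'mysql-bin.<number>';   -- <number> = number of the target file
  mysql> UNLOCK TABLES;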

Relevant links:

Notes:

  • Binary logging must be enabled (start the server with --log-bin=lsst10_binlog). Feel free to pick a better name for the log files.
  • Binary logs need to be protected (through ACLs), as they may contain mysql passwords

Comments:

  1. [MikeF] Yes, writing to mass storage will be much slower, as that traffic would be going over ethernet, not FC.
  2. [MikeF] Yes, and it (a full backup) might be too much to do, even weekly. We need to do some benchmarks.
  3. [MikeF] Just checking that we are maintaining a clear distinction between fault tolerance and recoverability. The lsst10 storage is RAIDed, so we already have fault tolerance. The replica at SLAC provides additional fault tolerance. Neither RAID nor the SLAC replica provides recoverability (think: user issues sql delete from table).
    • [Jacek] Well, yes, we can recover. We will do statement-based replication. If a user accidentally drops a table or deletes from it, we can start from the most recent full backup, remove the offending "drop table" or "delete" type statements, and replay the remaining commands from the binary log (see the recovery sketch below).
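
A sketch of that recovery path using mysqlbinlog (the file name and positions below are placeholders, found by inspecting the log, e.g. "mysqlbinlog lsst10_binlog.000042 | less"):

  # 1. Restore the table files from the most recent full backup (not shown).
  # 2. Replay the binary log up to just before the offending statement,
  #    then resume just after it.
  mysqlbinlog --stop-position=<posBeforeOffendingStmt> lsst10_binlog.000042 | mysql -h lsst10 -u root -p
  mysqlbinlog --start-position=<posAfterOffendingStmt> lsst10_binlog.000042 | mysql -h lsst10 -u root -p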