wiki:mysqlLargeTables
Last modified 6 years ago Last modified on 05/13/2013 04:58:25 AM

Loading (myisam)

alter table disable keys;
-- <load>, eg 
insert into X select * from x1;
insert into X select * from x2;
insert into X select * from x3;
alter table enable keys;

Sample timing (on lsst10, while it was competing with other non-trivial queries):

  • Stripe82_40deg (1.2 billion rows total). Loaded from 6 ~200-million-row tables:
    • 6h total time to insert data: 89, 62, 59, 56, 47, ? Krows/sec
    • 7h to enable keys
  • Stripe82 (total of 3.6 billion rows) 12 ~200-million-row tables loaded into 1.2-billion-row table:
    • 12h total time to insert data: ?, 70, 67, 72, 66, 64, 77, 40, 31, 45, 57, 56 Krows/sec
    • ~21h to enable keys

For comparison:

  • a similar insert (200m rows into 1.2-billion-rows) with indexes enabled (on a bigger, faster machine) took almost 4.5 hours (mysql will silently disable index, then load, then enable index -- enabling index takes so much time)
  • "insert into select * from" into 3.4 billion row table was going with a speed of ~2 million rows per hour (extrapolating: inserting 300 m rows would take 6+ days - for comparison with index disabled it takes ~2h, and enabling indexes takes <24h)

Most important settings:

  set myisam_repair_threads=6;                               -- # keys, higher number is ok
  set myisam_max_sort_file_size = 2G                         -- too large will fool the optimizer
  set global myisam_max_sort_file_size = 3072*1024*1024*1024 -- ~2-3x MYD
  -- plus, enough space in tmpdir to hold myisam_max_sort_file_size of data

Enable indexes

alter table <T> enable index

It goes through all rows, then it finalizes building indexes. This is more obviously visible via myisamchk (the row counter = first stage, "fixing index 1" etc = 2nd stage. Unfortunately often the failure occurs after the first stage, which often takes hours.

The first stage can be multithreaded by setting:

set myisam_repair_threads=<value>;

Note that if you don't set #threads, the query plan will be "sort" or "keycache", depending on the amount of tmp disk space, myisam_sort_buffer_size and myisam_max_sort_file_size:

  • you must have enough tmp disk space (e.g., ~2x MYD file or more) to use "sort"
  • myisam_sort_buffer_size should not be too large, e.g. anything above 2GB on a 3.4B row table on lsst10 with 64G ram triggered "keycache" method
  • myisam_max_sort_file_size must be set to something larger than size of the largest individual key for that table.

Suggestion: start "alter table" with myisam_repair_threads set to 1, experiment with myisam_sort_buffer_size and check query plan, choose largest value that still allows "sort" method. Then set myisam_repair_threads to the number of indexes that are in that table, and run for real

---

myisamchk has problems with large tables:

  • it is not obvious what plan it is using
  • it tends to run for a long time (going through all rows) and then after many hours (eg 5-6 on lsst10 for 3.4B rows), it fails as described below. Note that it complains about size of myisam_sort_buffer_size, and there is not way to set it.
  • pay attention what tmp directory it is using, it will need lots of temp space in the last stage.

Command we tried when enabling index for 3.4 billion row table:

/usr/local/mysql/bin/myisamchk 
   -v -t /usr/data/mysql/mysql_tmpdir/tmp4myisamcheck 
   --force 
   -o 
   --update-state 
   -p 
   --key_buffer_size=8G 
   --sort_buffer_size=8G 
   --read_buffer_size=8G 
   --write_buffer_size=8G 
   /usr/data/mysql/mysql_data/DC_W13_Stripe82/RunDeepForcedSource.MYI

(Per discussion with Monty, key_buffer_size is not used, set to something very small, read_buffer_size and write_buffer_size can be set to something smaller, say 1 MB, and the sort_buffer_size should be as large as possible (eg, 64GB on 128GB machine would help). Except, we found a bug in myisamchk which limits sort_buffer_size to 4GB.)

This runs for ~6 hours on lsst10 and fails in the last stage with

Key 1  - Last merge and dumping keys
/usr/local/mysql/bin/myisamchk: Unexpected eof found when reading file 
'/usr/data/mysql/mysql_tmpdir/tmp4myisamcheck/STP2PLC1' (Errcode: 175)
myisamchk: error: 175 when fixing table
MyISAM-table '/usr/data/mysql/mysql_data/DC_W13_Stripe82/RunDeepForcedSource.MYI' is not fixed because of errors
Try fixing it by using the --safe-recover (-o), the --force (-f) option or by not using the --quick (-q) flag
- recovering (with sort) MyISAM-table '/usr/data/mysql/mysql_data/DC_W13_Stripe82/RunDeepForcedSource.MYI'
Data records: 3591450365
- Fixing index 1
myisamchk: error: myisam_sort_buffer_size is too small
MyISAM-table '/usr/data/mysql/mysql_data/DC_W13_Stripe82/RunDeepForcedSource.MYI' is not fixed because of errors
Try fixing it by using the --safe-recover (-o), the --force (-f) option or by not using the --quick (-q) flag

The speed is ~250K rows/sec (260K initially, down to ~237K towards the end)

With --quick switch it runs 4.5h, but still fails the same way.

BTW, it looks like it is a "sort" method, as it said at the beginning "- parallel recovering (with sort)"

Adding index

Adding 2 indexes for 3.4 billion row tables took ~5 days. The problem is that mysql makes a full copy of the entire table (copy to tmp file), which takes ~2 days (it is very slow, straight cp would be much faster), then it is doing a single-thread sort, we have also seen plans that rely on the slow keycache method to build an index.

Much faster way to add an index is to create brand new table, alter the empty newly created table and add the indexes as needed, then "insert into <new table> select * from <original table>".

Sorting

Sorting a 1.2 billion row table (DC_W13_Stripe82_40Deg) with 6 indexes took 6 hours (on lsst10). Command used:

/usr/local/mysql/bin/myisamchk RunDeepForcedSource.MYI --sort-record=1 --sort_buffer_size=2G

Method used: recovering (with sort).

IN2P3 notes

mysqlLargeTablesAtIn2p3