Last modified on 07/17/2007 04:42:12 PM

DC2 Association Pipeline in Application Tests

These tests compare the performance and complexity of an all-in-C++ association pipeline (AP) implementation with that of a partially in-database (or all-in-database) approach. See the DC2 Database Partitioning Tests page for a summary of the AP design as well as details and numbers on the in-database approach.

Code

See this tar file for the code. Files of interest are Match.h, ZoneTypes.*, and ChunkedEntityStore.* for the cross-match and spatial indexing implementation. FileDataUtil.cc contains the disk IO implementation.
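
For orientation, the general idea behind a zone-based spatial index and cross-match can be sketched as below. This is a minimal illustration and not the code from the tar file: the names Position, ZoneIndex, separation and the zoneHeight parameter are hypothetical, and the sketch assumes that zones are declination stripes whose members are sorted by right ascension, which the file name ZoneTypes suggests but this page does not spell out. RA wrap-around at 0/360 degrees and the poles are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from Match.h or ZoneTypes.*.
struct Position { double ra; double dec; };  // both in degrees

const double kPi = 3.14159265358979323846;
inline double rad(double deg) { return deg * kPi / 180.0; }

// Angular separation in degrees (haversine formula).
double separation(const Position& a, const Position& b) {
    double sd = std::sin(rad(b.dec - a.dec) / 2.0);
    double sr = std::sin(rad(b.ra - a.ra) / 2.0);
    double h = sd * sd + std::cos(rad(a.dec)) * std::cos(rad(b.dec)) * sr * sr;
    return 2.0 * std::asin(std::sqrt(std::min(1.0, h))) * 180.0 / kPi;
}

// Objects bucketed into declination stripes ("zones"), each sorted by ra.
class ZoneIndex {
public:
    ZoneIndex(const std::vector<Position>& objects, double zoneHeight)
        : objects_(objects), height_(zoneHeight),
          zones_(static_cast<std::size_t>(std::ceil(180.0 / zoneHeight)) + 1) {
        assert(zoneHeight > 0.0);
        for (std::size_t i = 0; i < objects_.size(); ++i)
            zones_[zoneOf(objects_[i].dec)].push_back(i);
        for (auto& zone : zones_)
            std::sort(zone.begin(), zone.end(),
                      [this](std::size_t a, std::size_t b) {
                          return objects_[a].ra < objects_[b].ra;
                      });
    }

    // Returns the indexes of all objects within radius (degrees) of p.
    // Simplification: no ra wrap-around and no special handling of the poles.
    std::vector<std::size_t> match(const Position& p, double radius) const {
        std::vector<std::size_t> out;
        // Widen the ra search window to account for converging meridians.
        double maxAbsDec = std::min(89.0, std::fabs(p.dec) + radius);
        double dra = radius / std::cos(rad(maxAbsDec));
        std::size_t zLo = zoneOf(p.dec - radius);
        std::size_t zHi = zoneOf(p.dec + radius);
        for (std::size_t z = zLo; z <= zHi; ++z) {
            const std::vector<std::size_t>& zone = zones_[z];
            // Binary search for the first object at or beyond the window start.
            auto it = std::lower_bound(
                zone.begin(), zone.end(), p.ra - dra,
                [this](std::size_t i, double ra) { return objects_[i].ra < ra; });
            for (; it != zone.end() && objects_[*it].ra <= p.ra + dra; ++it)
                if (separation(p, objects_[*it]) <= radius) out.push_back(*it);
        }
        return out;
    }

private:
    // Maps a declination to its zone number; monotonic in dec.
    std::size_t zoneOf(double dec) const {
        double clamped = std::max(-90.0, std::min(90.0, dec));
        return static_cast<std::size_t>(std::floor((clamped + 90.0) / height_));
    }

    std::vector<Position> objects_;
    double height_;
    std::vector<std::vector<std::size_t> > zones_;
};
```

With Objects held in a std::vector<Position>, constructing ZoneIndex index(objects, 0.01) and calling index.match(source, 1.0 / 3600.0) returns the indexes of all Objects within one arcsecond of source. The zone height of 0.01 degrees is an arbitrary choice for the example.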

Testing

Hardware

Both the database and client test machines have the following specs:

  • SunFire V240
  • 2 UltraSPARC IIIi CPUs, 1503 MHz
  • 16 GB RAM
  • 2 Sun StorEdge T3 arrays, 470GB each, configured in RAID 5; sustained sequential speed (256KB blocks): write 150 MB/sec, read 146 MB/sec
  • OS: Sun Solaris sun4x_510
  • MySQL: version 5.0.27

Overview

For the all-in-C++ approach, we consider only the high density test region from the DC2 Database Partitioning Tests. The partitioning strategy chosen is the fine chunks strategy (see also the cross match performance page).

The first two tests obtain all their data from in-memory tables in the database. Writing data back to the in-memory database tables is not yet implemented, but is not expected to be performance critical, since the amount of data that changes per FOV is estimated to be ~100 times smaller than the amount that must be read. Note that this test setup still requires an in-database prepare and post-processing phase (to read and write partitions from/to disk). In these tests, 3596852 Objects (split across 125 chunk tables) and 30276 DIASources are read, indexed, and cross-matched.

The remaining tests read and write 3596852 Objects (split across 125 chunk files) directly from disk. DIASources are still obtained from MySQL, and in all cases these tests are run on a different machine than the MySQL server.

Issues

The MySQL prepared statement C API appears to transfer data between client and server using a binary protocol rather than ASCII. Using it may therefore give significantly faster reads from the database than the current test code implementation achieves.

Test 1

AP test code is run on a different machine than the MySQL server, with all data obtained from MySQL in-memory tables.

Skinny Object rows (80 bytes per Object in C++)

Single threaded:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match
1st run  54.7432sec        0.870557sec      2.78196sec     0.02249sec        0.140941sec
2nd run  54.5345sec        0.881551sec      2.77728sec     0.022784sec       0.143067sec
3rd run  54.6414sec        0.874059sec      2.77313sec     0.02179sec        0.137808sec

Reading and indexing Objects (but not DIASources) and cross-match are parallelized (2 threads, OpenMP):

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match
1st run  30.9232sec        0.872581sec      1.48222sec     0.020529sec       0.075589sec
2nd run  30.8489sec        0.873579sec      1.46863sec     0.020623sec       0.07889sec
3rd run  30.9746sec        0.897659sec      1.4669sec      0.019053sec       0.074364sec
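
Because Objects are split across independent chunks, the per-chunk work parallelizes naturally. The following is a minimal sketch of that pattern with an OpenMP parallel loop, not the test code itself: Chunk, loadChunk and loadAllChunks are hypothetical names, and loadChunk only fabricates data in place of the real per-chunk read. If the compiler is run without OpenMP support the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: a chunk is just a vector of object ids here.
typedef std::vector<long> Chunk;

// Fakes the per-chunk read that a real implementation would perform.
Chunk loadChunk(int chunkId, std::size_t objectsPerChunk) {
    Chunk c(objectsPerChunk);
    for (std::size_t i = 0; i < objectsPerChunk; ++i)
        c[i] = static_cast<long>(chunkId) * 1000000L + static_cast<long>(i);
    return c;
}

// Loads numChunks chunks, distributing them over two OpenMP threads.
// Each iteration writes only to its own slot, so no locking is needed.
std::vector<Chunk> loadAllChunks(int numChunks, std::size_t objectsPerChunk) {
    assert(numChunks >= 0);
    std::vector<Chunk> chunks(static_cast<std::size_t>(numChunks));
    #pragma omp parallel for schedule(dynamic) num_threads(2)
    for (int i = 0; i < numChunks; ++i)
        chunks[static_cast<std::size_t>(i)] = loadChunk(i, objectsPerChunk);
    return chunks;
}
```

Calling loadAllChunks(125, n) mirrors the 125 chunks used in these tests. Dynamic scheduling is chosen here on the assumption that chunks differ in size; whether the test code schedules its loops this way is not stated on this page.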

Fat Object rows (1680 bytes per Object in C++)

Single threaded:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match
1st run  29min 38.2689sec  0.890446sec      2.76659sec     0.021464sec       0.138365sec

Reading and indexing Objects (but not DIASources) and cross-match are parallelized (2 threads, OpenMP):

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match
1st run  15min 30.7199sec  0.881847sec      1.48534sec     0.02033sec        0.073867sec

Test 2

AP test code is run on the same machine as the MySQL server, all data is obtained from MySQL in-memory tables.

Skinny Object rows (80 bytes per Object in C++)

Single threaded:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match
1st run  50.4229sec        0.845357sec      2.77505sec     0.020945sec       0.139122sec
2nd run  50.4676sec        0.856322sec      2.75798sec     0.020944sec       0.139537sec
3rd run  50.4699sec        0.846505sec      2.77049sec     0.022603sec       0.137359sec

Reading and indexing Objects (but not DIASources) and cross-match are parallelized (2 threads, OpenMP):

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match
1st run  41.7191sec        1.46744sec       1.49635sec     0.018844sec       0.077193sec
2nd run  41.812sec         1.46544sec       1.47142sec     0.020047sec       0.075654sec
3rd run  41.8242sec        1.46609sec       1.4761sec      0.018559sec       0.075653sec

Fat Object tests were not run, because the limiting factor in these tests does not appear to be network bandwidth. Perhaps more importantly, holding everything in memory for both the MySQL server and the AP test code would require more than the physical memory (16GB) of the test machine.

Test 3

Object data is stored uncompressed, in the filesystem.

Skinny Object rows (80 bytes per Object in C++)

Single threaded:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  6.59568sec        0.869447sec      2.75874sec     0.022005sec       0.138378sec  3.6789sec
2nd run  6.8393sec         0.870958sec      2.80132sec     0.022678sec       0.144812sec  3.6687sec
3rd run  6.48693sec        0.877598sec      2.8053sec      0.022716sec       0.142251sec  3.68056sec

iostat reports reads at 30-40MB/s and writes at 72-77MB/s. There is clearly room for optimizing the read phase. One thing that may be slowing things down is that the current C++ implementation calls fstat() on each file before reading it.
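
One way to avoid the per-file fstat() would be to read each file in fixed-size blocks until read() reports end of file, so the file size never has to be queried up front. The following is a minimal sketch of that idea, not the implementation in FileDataUtil.cc: readWholeFile is a hypothetical helper, the 256KB default block size is borrowed from the hardware notes above, and whether this actually improves read throughput is untested here.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Reads an entire file in fixed-size blocks until read() returns 0 (EOF).
// No stat()/fstat() call is made. Returns false on open or read failure.
bool readWholeFile(const std::string& path, std::vector<char>& data,
                   std::size_t blockSize = 256 * 1024) {
    assert(blockSize > 0);
    data.clear();
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    std::vector<char> block(blockSize);
    bool ok = true;
    for (;;) {
        ssize_t n = ::read(fd, block.data(), block.size());
        if (n > 0) {
            // Append only the bytes actually read; short reads are normal.
            data.insert(data.end(), block.begin(), block.begin() + n);
        } else if (n == 0) {
            break;  // end of file
        } else if (errno != EINTR) {
            ok = false;  // real error; EINTR simply retries the read
            break;
        }
    }
    ::close(fd);
    return ok;
}
```

The trade-off is that the destination buffer grows incrementally instead of being sized once from st_size, so a caller that knows the chunk size from elsewhere may want to reserve capacity before calling.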

Reading and indexing Objects (but not DIASources) and cross-match are parallelized (2 threads, OpenMP):

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  6.8077sec         0.87103sec       1.48831sec     0.020384sec       0.075332sec  3.53021sec
2nd run  5.32758sec        0.892812sec      1.48591sec     0.019264sec       0.075376sec  3.46469sec
3rd run  5.18746sec        0.884729sec      1.50963sec     0.018953sec       0.074582sec  3.49589sec

Fat Object rows (1680 bytes per Object in C++)

Single threaded:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  2min 14.86sec     0.895992sec      2.77534sec     0.022267sec       0.141189sec  1min 24.701sec
2nd run  2min 12.4167sec   0.88814sec       2.73945sec     0.023066sec       0.134532sec  1min 24.5721sec
3rd run  2min 12.4439sec   0.894007sec      2.79665sec     0.021129sec       0.141632sec  1min 24.6554sec

iostat reports reads at 40-50MB/s and writes at 62-73MB/s (averaging about 70MB/s).
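
As a sanity check, the wall-clock times can be converted into effective throughput and compared with the iostat figures. The sketch below does that arithmetic; throughputMBps is a hypothetical helper, and it assumes that each fat Object occupies 1680 bytes on disk (the in-memory C++ size quoted above), which this page does not confirm, and that 1 MB is 10^6 bytes.

```cpp
#include <cassert>

// Effective throughput in MB/s (1 MB = 10^6 bytes) for moving numObjects
// records of bytesPerObject bytes each in the given number of seconds.
double throughputMBps(long long numObjects, long long bytesPerObject,
                      double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(numObjects * bytesPerObject) / 1.0e6 / seconds;
}
```

For the 2nd single threaded run, throughputMBps(3596852, 1680, 132.4167) gives roughly 46 MB/s for reads and throughputMBps(3596852, 1680, 84.5721) gives roughly 71 MB/s for writes. Under the stated assumptions, both agree with the ranges iostat reports.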

Reading and indexing Objects (but not DIASources) and cross-match are parallelized (2 threads, OpenMP):

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  1min 37.5319sec   0.877174sec      1.48006sec     0.018783sec       0.081669sec  1min 21.5611sec
2nd run  1min 36.9173sec   0.884186sec      1.47356sec     0.020365sec       0.07998sec   1min 21.3471sec
3rd run  1min 37.6092sec   0.886824sec      1.48126sec     0.0192sec         0.073745sec  1min 21.6332sec

iostat reports reads at 50-60MB/s and writes at 60-90MB/s (averaging about 70MB/s).

Test 4

Object data is stored in compressed form, in the filesystem. The zlib 1.2.3 library is used for compression, with the compression level set to 1 (fastest compression speed). Comparing file sizes between this test and test 3 (for skinny Objects) shows a compression ratio of 2.4 to 2.5. Skinny Objects are USNO-B records, so this ratio is for real data.

Skinny Object rows (80 bytes per Object in C++)

Single threaded; blocking IO:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  9.8604sec         0.866069sec      2.75899sec     0.022834sec       0.141944sec  27.6198sec
2nd run  9.07415sec        0.874381sec      2.78356sec     0.021637sec       0.137272sec  27.6712sec
3rd run  9.09052sec        0.890448sec      2.77036sec     0.022836sec       0.1385sec    27.6058sec

Single threaded; POSIX asynchronous IO with 256KB block-size to overlap (de)compression with IO:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  8.22706sec        0.886795sec      2.77809sec     0.023505sec       0.136026sec  22.0411sec
2nd run  8.24921sec        0.88799sec       2.7737sec      0.022554sec       0.141322sec  22.0876sec
3rd run  8.26437sec        0.883083sec      2.8054sec      0.023103sec       0.136612sec  22.0764sec

Reading, writing, and indexing Objects (but not DIASources) as well as cross-match are parallelized (2 threads, OpenMP); blocking IO:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  5.77275sec        0.877491sec      1.46401sec     0.020587sec       0.074475sec  14.7219sec
2nd run  5.10683sec        0.874143sec      1.47419sec     0.019sec          0.075545sec  14.2595sec
3rd run  5.14575sec        0.88141sec       1.486sec       0.020847sec       0.076886sec  14.3449sec

Reading, writing, and indexing Objects (but not DIASources) as well as cross-match are parallelized (2 threads, OpenMP); POSIX asynchronous IO with 256KB block-size to overlap (de)compression with IO:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  5.35527sec        0.885627sec      1.4765sec      0.020217sec       0.074277sec  11.4534sec
2nd run  4.85055sec        0.871508sec      1.48226sec     0.019137sec       0.075864sec  11.4594sec
3rd run  4.80289sec        0.878306sec      1.49828sec     0.020087sec       0.076382sec  11.3513sec

Fat Object rows (1680 bytes per Object in C++)

Note that since fat object rows are generated by adding 200 double precision columns initialized with random values to the USNO-B table, the fat Objects files have very poor compression ratios (compression ratios are about 1.07).

Single threaded; POSIX asynchronous IO with 256KB block-size to overlap (de)compression with IO:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  4min 8.29653sec   0.880347sec      2.76842sec     0.022529sec       0.142161sec  14min 40.1986sec

Reading, writing, and indexing Objects (but not DIASources) as well as cross-match are parallelized (2 threads, OpenMP); POSIX asynchronous IO with 256KB block-size to overlap (de)compression with IO:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  2min 30.0935sec   0.880941sec      1.48516sec     0.019295sec       0.07449sec   7min 36.2109sec

Test 5

Object data is stored in compressed form. The zlib 1.2.3 library is used for compression, with compression level set to 9 (best compression, slowest compression speed). Comparison of file size between this test and test 3 (using skinny rows) shows a compression ratio of 2.8 to 3.0 is achieved. For this test only one data point is provided since compression speed is untenably slow. Also, compression ratios aren't significantly better than in test 4.

Skinny Object rows (80 bytes per Object in C++)

Single threaded; blocking IO:

Run      Read Objects      Read DIASources  Index Objects  Index DIASources  Cross-match  Write Objects
1st run  7.89607sec        0.880087sec      2.78697sec     0.021342sec       0.140914sec  7min 50.2722sec
