Compression of SDSS at FNAL

(from Eric Neilsen at FNAL, July 31, 2008)

We Rice-compressed the images at first, and gave up on it. If you put three astronomers in a room, they will want to use at least four different FITS readers, and nobody wants to install a separate decompression tool. A few FITS readers can handle some form of compression, but the only compression format that is really convenient across FITS reader implementations is gzip, and even that is not a sure thing (although one can reasonably assume that astronomers can gunzip files themselves without difficulty).
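For readers unfamiliar with Rice coding: it exploits the fact that neighboring pixels in an image usually differ by small amounts. The first differences are mapped to non-negative integers, and each value is written as a unary quotient followed by its k low-order bits. The Python below is a simplified illustration only. The real coders (cfitsio's, for example) work on blocks of pixels with their own parameter selection and bit-packing format; the helper names, the string-of-bits output, and the heuristic for choosing k are all assumptions of this sketch.

```python
def zigzag(n: int) -> int:
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def choose_k(values: list[int]) -> int:
    """Heuristic Rice parameter (an assumption here): about floor(log2(mean)), at least 0."""
    mean = sum(values) / max(len(values), 1)
    k = 0
    while (1 << (k + 1)) <= mean:
        k += 1
    return k


def rice_encode(pixels: list[int]) -> tuple[int, int, str]:
    """Encode a non-empty row as (first_pixel, k, bits); bits is a '0'/'1' string."""
    diffs = [b - a for a, b in zip(pixels, pixels[1:])]
    mapped = [zigzag(d) for d in diffs]
    k = choose_k(mapped)
    out = []
    for v in mapped:
        q, r = v >> k, v & ((1 << k) - 1)
        out.append("1" * q + "0")            # quotient in unary, terminated by a 0
        if k:
            out.append(format(r, f"0{k}b"))  # remainder in exactly k bits
    return pixels[0], k, "".join(out)


def rice_decode(first: int, k: int, bits: str, n: int) -> list[int]:
    """Rebuild the n-pixel row produced by rice_encode()."""
    pixels = [first]
    pos = 0
    for _ in range(n - 1):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                              # skip the terminating "0"
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        pixels.append(pixels[-1] + unzigzag((q << k) | r))
    return pixels


if __name__ == "__main__":
    row = [1000, 1003, 998, 1001, 1001, 1010, 995]
    first, k, bits = rice_encode(row)
    print(k, len(bits))
    assert rice_decode(first, k, bits, len(row)) == row
```

For the sample row the sketch chooses k = 3 and spends 30 bits on the six differences, compared with 96 bits for six raw 16-bit pixels.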

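I experimented with several compression approaches on SDSS data. In the results below, "real" is elapsed wall-clock time, "user" and "sys" are CPU time spent in user and kernel mode, and sizes are in kilobytes: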

Compression

program           algorithm        real time   user time   sys time   size (KB)
cp                none             2m10.549s   0m0.264s    0m10.053s  2990036
gzip --fast       DEFLATE          2m17.139s   1m25.675s   0m6.846s   1211812
gzip              DEFLATE          7m54.716s   7m44.088s   0m6.201s   1168668
gzip --best       DEFLATE          29m33.090s  29m23.242s  0m6.785s   1136732
zip               DEFLATE          7m15.210s   6m52.578s   0m6.501s   1168744
compress          LZW              1m55.178s   0m55.193s   0m11.993s  1069800
sdsscompress      Rice             1m38.810s   0m30.802s   0m5.856s   1031272
cfitsio           Rice             2m1.402s    0m45.727s   0m9.056s   1015524
cfitsio           HCompress        3m40.910s   2m11.203s   0m10.491s  958956
bzip2             Burrows-Wheeler  11m21.656s  10m43.308s  0m9.686s   934764
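To put the programs on a single scale, the sizes above can be turned into compression ratios (uncompressed size divided by compressed size). The short Python sketch below does only that arithmetic; the sizes are copied from the table, and the dictionary keys are labels chosen here for illustration.

```python
# Sizes in KB copied from the compression table; "cp" is the uncompressed copy.
SIZES_KB = {
    "cp": 2990036,
    "gzip --fast": 1211812,
    "gzip": 1168668,
    "gzip --best": 1136732,
    "zip": 1168744,
    "compress": 1069800,
    "sdsscompress": 1031272,
    "cfitsio Rice": 1015524,
    "cfitsio HCompress": 958956,
    "bzip2": 934764,
}


def compression_ratios(sizes_kb: dict[str, int]) -> dict[str, float]:
    """Return original_size / compressed_size for every program."""
    original = sizes_kb["cp"]
    return {prog: original / size for prog, size in sizes_kb.items()}


if __name__ == "__main__":
    ratios = compression_ratios(SIZES_KB)
    for prog, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{prog:18s} {ratio:5.2f}x")
```

By this measure bzip2 reaches about 3.2:1, cfitsio HCompress about 3.1:1, and default gzip about 2.6:1.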

Decompression

program           algorithm        real time   user time   sys time
cp                none             2m10.549s   0m0.264s    0m10.053s
gzip --fast       DEFLATE          2m0.092s    0m33.015s   0m8.468s
gzip              DEFLATE          1m45.470s   0m27.898s   0m8.634s
gzip --best       DEFLATE          1m37.792s   0m25.721s   0m9.425s
compress          LZW              1m49.712s   0m22.916s   0m16.848s
sdsscompress      Rice             2m23.790s   1m16.547s   0m9.335s
cfitsio           Rice             2m13.678s   0m36.721s   0m11.812s
cfitsio           HCompress        3m46.396s   2m15.104s   0m18.529s
bzip2             Burrows-Wheeler  5m27.773s   4m56.550s   0m13.061s
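Anyone wanting to repeat part of this comparison without the command-line tools can get DEFLATE and Burrows-Wheeler numbers from Python's standard library. This is a minimal sketch, not the procedure used for the tables above: zlib levels 1, 6 and 9 are the DEFLATE levels that gzip --fast, plain gzip and gzip --best use, `time.process_time()` reports process CPU time (user plus sys), and the synthetic 16-bit "image" is a hypothetical stand-in for real SDSS frames. Rice, HCompress and LZW have no standard-library implementation and are left out.

```python
import bz2
import random
import struct
import time
import zlib


def benchmark(data: bytes) -> list[tuple[str, int, float, float]]:
    """Compress and decompress `data` with several stdlib codecs.

    Returns (name, compressed_bytes, compress_cpu_s, decompress_cpu_s) per codec.
    """
    codecs = [
        ("deflate-1", lambda d: zlib.compress(d, 1), zlib.decompress),  # ~ gzip --fast
        ("deflate-6", lambda d: zlib.compress(d, 6), zlib.decompress),  # ~ gzip
        ("deflate-9", lambda d: zlib.compress(d, 9), zlib.decompress),  # ~ gzip --best
        ("bzip2-9", lambda d: bz2.compress(d, 9), bz2.decompress),
    ]
    results = []
    for name, compress, decompress in codecs:
        t0 = time.process_time()
        packed = compress(data)
        t1 = time.process_time()
        unpacked = decompress(packed)
        t2 = time.process_time()
        assert unpacked == data, f"{name} round trip failed"
        results.append((name, len(packed), t1 - t0, t2 - t1))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for an image: big-endian 16-bit pixels (as in FITS)
    # around a sky level of 1000 counts with Gaussian noise.
    pixels = [min(65535, max(0, round(rng.gauss(1000, 10)))) for _ in range(200_000)]
    data = struct.pack(f">{len(pixels)}H", *pixels)
    for name, size, t_comp, t_decomp in benchmark(data):
        print(f"{name:10s} {size:9d} bytes  comp {t_comp:6.3f}s  decomp {t_decomp:6.3f}s")
```

Real frames should be used for any conclusion; synthetic Gaussian noise only exercises the code path.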

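The difference between the custom FITS compression schemes and generic, widely available ones is modest: the best custom scheme, cfitsio HCompress, yields files about 18% smaller than default gzip and about 10% smaller than UNIX compress. One pays a high price in irritated users for that gain.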

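(Note that bzip2 beats even the custom schemes on size, producing files about 2.5% smaller than cfitsio HCompress, but it is very, very slow: over 11 minutes to compress, versus under 4 for HCompress. I am told that bzip2 is very good at taking advantage of multiple cores (via parallel implementations such as pbzip2), which tools like cfitsio do not do. It is fairly common on Linux boxes, but nowhere near as ubiquitous as gzip.)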
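The multi-core point follows from bzip2's design: it compresses data in independent blocks (at most 900 kB at the default level 9), and concatenated bzip2 streams decompress as one file. A parallel compressor in the spirit of pbzip2 can therefore split the input, compress the pieces concurrently, and join the results. This Python sketch rests on assumptions: the chunk size and worker count are arbitrary defaults chosen here, and it relies on CPython's bz2 module releasing the GIL while compressing, so that the threads can actually run on separate cores (a `ProcessPoolExecutor` would be the substitute if that did not hold).

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bz2_compress(data: bytes, chunk_size: int = 900_000, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the bzip2 streams.

    chunk_size defaults to roughly one level-9 bzip2 block, so splitting the
    input there should cost little compression ratio (an assumption, not measured).
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams concatenate correctly.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = bytes(range(256)) * 5000
    packed = parallel_bz2_compress(payload)
    # bz2.decompress accepts concatenated streams, as does the bzip2 command.
    assert bz2.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

The output is an ordinary multi-stream .bz2 file, so users who decompress it need nothing beyond the standard bzip2 tools.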

The DAS distributes corrected frames gzipped, because gzip is the most widely supported format and compresses only slightly worse than the alternatives. In a few places we use UNIX compress internally: it is common enough that nobody here is annoyed by it, and on our images it is both faster and better than gzip (about 1m55s versus 7m55s of wall-clock time to compress, and files about 8% smaller).
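Part of gzip's appeal is that nothing extra needs to be installed: many languages include a gzip decoder in their standard library. As an illustration, this Python sketch reads the primary header of either a plain or a gzipped FITS file using only the standard library, detecting gzip by its two-byte magic number. It is deliberately minimal and the function names are hypothetical: it inflates the whole file in memory (simple but wasteful for large frames), returns raw value fields without parsing types or comments, and ignores extensions.

```python
import gzip

CARD = 80                # every FITS header card is 80 ASCII characters
BLOCK = 2880             # headers are padded to multiples of 2880 bytes (36 cards)
GZIP_MAGIC = b"\x1f\x8b" # first two bytes of any gzip stream


def parse_primary_header(data: bytes) -> dict[str, str]:
    """Return keyword -> raw value field for the primary HDU of a FITS byte string."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)  # transparently handle .fits.gz
    header: dict[str, str] = {}
    for start in range(0, len(data) - CARD + 1, CARD):
        card = data[start:start + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            return header
        if card[8:10] == "= ":
            # Raw value field, comment included; no type conversion is attempted.
            header[key] = card[10:].strip()
    raise ValueError("no END card found")


def read_primary_header(path: str) -> dict[str, str]:
    """Read a .fits or .fits.gz file from disk and parse its primary header."""
    with open(path, "rb") as f:
        return parse_primary_header(f.read())
```

The same call works on a frame whether or not it has been gzipped, which is the property that makes gzip a low-friction distribution format.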