Compression of SDSS at FNAL
(from Eric Neilsen at FNAL, July 31, 2008)
We Rice-compressed the images at first, and gave up on it. If you put three astronomers in a room, they will want to use at least four different FITS readers, and nobody wants to install a separate decompression tool. While a few FITS readers can handle some form of compression, the only compression format that is really convenient across FITS reader implementations is gzip, and even that is not a sure thing (although one can reasonably assume that astronomers can decompress gzipped files themselves without difficulty).
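To illustrate why gzip is the safe common denominator, here is a minimal Python sketch showing that a program can read a gzipped FITS stream with nothing but the standard library's `gzip` module, with no separate decompression tool. It is not how any particular FITS reader works. The helper names (`card`, `make_minimal_header`, `read_cards`, `roundtrip_demo`) and the tiny data-less header are hypothetical test data. The sketch relies only on FITS basics: 2880-byte blocks and 80-character header cards terminated by an `END` card.

```python
import gzip
import io

FITS_BLOCK = 2880  # FITS files are organized in 2880-byte blocks
CARD = 80          # each header card is 80 ASCII characters


def card(key, value=None):
    """Format one fixed-format header card (keyword padded to 8 columns)."""
    if value is None:
        return f"{key:<8}".ljust(CARD)
    # "= " in columns 9-10, value right-justified so it ends in column 30
    return f"{key:<8}= {value:>20}".ljust(CARD)


def make_minimal_header() -> bytes:
    """Hypothetical test data: a data-less primary header padded to one block."""
    cards = [card("SIMPLE", "T"), card("BITPIX", "16"), card("NAXIS", "0"), card("END")]
    return "".join(cards).encode("ascii").ljust(FITS_BLOCK, b" ")


def read_cards(fileobj):
    """Return the cards of the first header in a FITS stream, up to END."""
    cards = []
    while True:
        block = fileobj.read(FITS_BLOCK)
        if len(block) < FITS_BLOCK:
            raise ValueError("truncated FITS header")
        for i in range(0, FITS_BLOCK, CARD):
            text = block[i:i + CARD].decode("ascii").rstrip()
            cards.append(text)
            if text[:8].rstrip() == "END":
                return cards


def roundtrip_demo():
    """Gzip a header in memory, then read it back through gzip.open."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(make_minimal_header())
    buf.seek(0)
    with gzip.open(buf, "rb") as gz:  # decompression happens in-process
        return read_cards(gz)


if __name__ == "__main__":
    for c in roundtrip_demo():
        print(c)
```

With a real file, the in-memory buffer would be replaced by something like `gzip.open("frame.fits.gz", "rb")`, where the file name is hypothetical.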
I experimented with a variety of compression approaches on SDSS data. Here are the results:
Compression
| program | algorithm | real time | user time | sys time | size (KB) |
|---|---|---|---|---|---|
| cp | none | 2m10.549s | 0m0.264s | 0m10.053s | 2990036 |
| gzip --fast | DEFLATE | 2m17.139s | 1m25.675s | 0m6.846s | 1211812 |
| gzip | DEFLATE | 7m54.716s | 7m44.088s | 0m6.201s | 1168668 |
| gzip --best | DEFLATE | 29m33.090s | 29m23.242s | 0m6.785s | 1136732 |
| zip | DEFLATE | 7m15.210s | 6m52.578s | 0m6.501s | 1168744 |
| compress | LZW | 1m55.178s | 0m55.193s | 0m11.993s | 1069800 |
| sdsscompress | Rice | 1m38.810s | 0m30.802s | 0m5.856s | 1031272 |
| cfitsio | Rice | 2m1.402s | 0m45.727s | 0m9.056s | 1015524 |
| cfitsio | HCompress | 3m40.910s | 2m11.203s | 0m10.491s | 958956 |
| bzip2 | Burrows-Wheeler | 11m21.656s | 10m43.308s | 0m9.686s | 934764 |
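The sizes in the compression table translate into compression ratios (uncompressed size divided by compressed size) with a few lines of Python. This sketch simply copies the table's numbers; the dictionary keys and the function name are illustrative, not part of any tool.

```python
# Compressed sizes, in the table's KB units, copied from the compression table.
SIZES_K = {
    "cp (none)": 2990036,
    "gzip --fast": 1211812,
    "gzip": 1168668,
    "gzip --best": 1136732,
    "zip": 1168744,
    "compress": 1069800,
    "sdsscompress (Rice)": 1031272,
    "cfitsio (Rice)": 1015524,
    "cfitsio (HCompress)": 958956,
    "bzip2": 934764,
}


def compression_ratios(sizes, baseline="cp (none)"):
    """Return {program: uncompressed size / compressed size}."""
    base = sizes[baseline]
    return {name: base / size for name, size in sizes.items()}


if __name__ == "__main__":
    ratios = compression_ratios(SIZES_K)
    # Print from the weakest to the strongest compression.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:<20} {ratio:.2f}x")
```

Run as a script, it lists every method between roughly 2.5x (gzip --fast) and about 3.2x (bzip2).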
Decompression
| program | algorithm | real time | user time | sys time |
|---|---|---|---|---|
| cp | none | 2m10.549s | 0m0.264s | 0m10.053s |
| gzip --fast | DEFLATE | 2m0.092s | 0m33.015s | 0m8.468s |
| gzip | DEFLATE | 1m45.470s | 0m27.898s | 0m8.634s |
| gzip --best | DEFLATE | 1m37.792s | 0m25.721s | 0m9.425s |
| compress | LZW | 1m49.712s | 0m22.916s | 0m16.848s |
| sdsscompress | Rice | 2m23.790s | 1m16.547s | 0m9.335s |
| cfitsio | Rice | 2m13.678s | 0m36.721s | 0m11.812s |
| cfitsio | HCompress | 3m46.396s | 2m15.104s | 0m18.529s |
| bzip2 | Burrows-Wheeler | 5m27.773s | 4m56.550s | 0m13.061s |
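To compare decompression cost against a plain copy, the `XmY.YYYs` strings printed by the shell's `time` command can be converted to seconds and divided by the `cp` time. This is a minimal sketch using the wall-clock (real) column only; the names are illustrative.

```python
import re

# Matches the "XmY.YYYs" format printed by the shell's `time` command.
_TIME_RE = re.compile(r"(\d+)m(\d+(?:\.\d+)?)s")


def to_seconds(t: str) -> float:
    """Convert a string such as '2m10.549s' to seconds."""
    m = _TIME_RE.fullmatch(t.strip())
    if m is None:
        raise ValueError(f"unrecognized time string: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Wall-clock ("real") decompression times copied from the table.
DECOMPRESS_REAL = {
    "cp": "2m10.549s",
    "gzip --fast": "2m0.092s",
    "gzip": "1m45.470s",
    "gzip --best": "1m37.792s",
    "compress": "1m49.712s",
    "sdsscompress": "2m23.790s",
    "cfitsio Rice": "2m13.678s",
    "cfitsio HCompress": "3m46.396s",
    "bzip2": "5m27.773s",
}


def relative_to_copy(times, baseline="cp"):
    """Return each wall-clock time as a multiple of the plain-copy time."""
    base = to_seconds(times[baseline])
    return {name: to_seconds(t) / base for name, t in times.items()}


if __name__ == "__main__":
    rel = relative_to_copy(DECOMPRESS_REAL)
    for name, r in sorted(rel.items(), key=lambda kv: kv[1]):
        print(f"{name:<18} {r:.2f}x cp")
```

By these numbers, all three gzip levels and compress finish in less wall-clock time than the plain copy (possibly because less data has to be read from disk), while bzip2 takes about 2.5 times as long.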
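The difference in size between custom FITS compression schemes and generic, widely available ones is pretty modest, and one pays a high price for irritating users. For example, cfitsio's Rice output is about 13% smaller than default gzip and about 5% smaller than compress, and even HCompress, the most compact FITS-specific scheme here, is only about 18% smaller than gzip.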
(Note that bzip2 beats even the custom schemes, but is very, very slow. I am told that it can take good advantage of multiple cores (through parallel implementations such as pbzip2; the standard bzip2 program itself is single-threaded), which things like cfitsio do not. It is pretty common on Linux boxes, but nothing like as ubiquitous as gzip.)
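The block-oriented idea behind parallel bzip2 tools such as pbzip2 can be sketched in Python: split the input into independent chunks, compress each as its own bzip2 stream on a worker thread, and concatenate the streams, which standard bzip2 decompressors accept. This is a minimal sketch, not how pbzip2 itself is implemented. The function name, the 1 MiB chunk size, and the four workers are arbitrary assumptions, and the sketch assumes CPython's `bz2` module releases the GIL while compressing so that the threads can actually run on several cores.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB per independent stream (an arbitrary choice)


def parallel_bz2(data: bytes, chunk: int = CHUNK, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the bzip2 streams."""
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, pieces))
    return b"".join(streams)


if __name__ == "__main__":
    sample = bytes(range(256)) * 8_000  # ~2 MB of synthetic data
    packed = parallel_bz2(sample)
    # bz2.decompress (Python 3.3+) reads concatenated multi-stream input.
    assert bz2.decompress(packed) == sample
    print(f"{len(sample)} -> {len(packed)} bytes")
```

Splitting into independent streams costs a little compression at each chunk boundary in exchange for parallelism.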
The DAS (Data Archive Server) distributes corrected frames gzipped, because gzip is the most common format and compresses only slightly worse than the others. In a few places we use UNIX compress internally, because it is common enough that nobody here is annoyed by it, and on our images it both compresses faster than gzip and produces smaller files.
