Table of Contents
Spatial Join Performance
[LSST Database] [Scalable Query Access Architecture]
This page describes performance of spatial join queries.
In practice, we expect spatial joins on Object table only. The object table will be partitioned (chunked) as described in Scalable Query Access Architecture. The queries discussed here correspond to a query executed on a single chunk; table partitioning and distribution is not considered here. It is expected many such chunk-based queries will execute in parallel.
Used schema:
CREATE TABLE X ( objectId BIGINT NOT NULL PRIMARY KEY, ra FLOAT NOT NULL, decl FLOAT NOT NULL, muRA FLOAT NOT NULL, muRAErr FLOAT NOT NULL, muDecl FLOAT NOT NULL, muDeclErr FLOAT NOT NULL, epoch FLOAT NOT NULL, bMag FLOAT NOT NULL, bFlag FLOAT NOT NULL, rMag FLOAT NOT NULL, rFlag FLOAT NOT NULL, b2Mag FLOAT NOT NULL, b2Flag FLOAT NOT NULL, r2Mag FLOAT NOT NULL, r2Flag FLOAT NOT NULL )
Used data: USNO catalog.
A first version of the query:
SELECT count(*) FROM X o1, X o2 WHERE o1.objectId <> o2.objectId AND ABS(o1.ra - o2.ra) < 0.00083 / COS(RADIANS(o2.decl)) AND ABS(o1.decl - o2.decl) < 0.00083
Precalculating COS(RADIANS(decl):
ALTER TABLE X ADD COLUMN cosRadDecl FLOAT UPDATE X set cosRadDecl = COS(RADIANS(decl))
and changing the order of predicates as follows:
SELECT count(*) FROM X o1, X o2 WHERE ABS(o1.ra - o2.ra) < 0.00083 / o2.cosRadDecl AND ABS(o1.decl - o2.decl) < 0.00083 AND o1.objectId <> o2.objectId
improves the execution time by 36% for mysql, and 38% for postgres.
Here is the timing for this (optimized) query.
| nRows | mysql | postgres |
| [K] | [sec] | [sec] |
| 1 | 1 | 5 |
| 2 | 5 | 19 |
| 3 | 11 | 40 |
| 4 | 18 | 69 |
| 5 | 28 | 103 |
| 10 | 101 | 371 |
| 15 | 215 | 797 |
| 20 | 368 | |
| 25 | 566 |
Postgres is ~3.7x slower than mysql in this case. It is probably possible to tune postgreSQL a little, but is it unlikely it will match MySQL performance.
Each of these queries for both mysql and postgres were completely CPU-dominated. The test was executed on an old-ish Sun 1503 MHz sparcv9 processor.
Also, see: SSDtests - the test rerun on a fast machine (dash-io).
Using index
The near neighbor query can be further optimized by introducing an index. Based on the tests we run with mysql, in order to force mysql to use an index for this particular query, we have to build a composite index on all used columns, or build a composite index on (ra, decl, cosRadDecl) and remove o1.objectId <> o2.objectId predicate (this predicate would have to be applied in a separate step). The timing for the query with index and without the objectId comparison:
| nRows | was | now | faster |
| [K] | [sec] | [sec] | [%] |
| 5 | 28 | 16.7 | 40 |
| 10 | 101 | 67.2 | 33 |
| 15 | 215 | 150.8 | 30 |
| 20 | 368 | 269.5 | 27 |
| 25 | 566 | 420.3 | 26 |
The speedup from using an index will likely be bigger for wider tables.
CPU utilization
How many CPUs do we need to do full correlation on 1 billion row table (DC3b)?
| # chunks | rows/chunk | seconds per | total | total core-hours |
| self-join | core-hours | if 16 cores used | ||
| in 1 chunk | needed | twice faster | ||
| 40,000 | 25,000 | 566 | 6,289 | 196 |
| 50,000 | 20,000 | 368 | 5,111 | 160 |
| 66,666 | 15,000 | 215 | 3,981 | 124 |
| 100,000 | 10,000 | 101 | 2,806 | 88 |
| 200,000 | 5,000 | 28 | 1,556 | 49 |
| 250,000 | 4,000 | 18 | 1,250 | 39 |
| 333,333 | 3,000 | 11 | 1,019 | 31 |
| 500,000 | 2,000 | 5 | 694 | 22 |
| 1,000,000 | 1,000 | 1 | 278 | 9 |
Realistically, we can count on ~2 8-core servers, ~twice faster than the CPUs used in these tests. That means 1 million chunk version would finish in 9 hours, 66K chunk version would need 5 days to finish.
Near neighbor without building sub-chunking
We tested the performance of running nn query without explicitly building sub-chunking. In all these tests describe here we tried to run nn on a 1 million row table. A starting point is 1000 sec (1 sec per 1K rows), which is ~16 min 40 sec.
First test: running near neighbor query by selecting rows with given subChunkId into in memory table and running nn query there. This test is here. It took 7 min 43 sec.
Second test: running near neighbor query by running neighbor once for each subChunkId, without building sub-chunks. This test is here. It took 39 min29 sec.
Third test: runnig near neighbor query by mini-near neighbor once for each subChunkId, without building sub-chunks, using in-memory table. This test is here. It took 13 min 13 sec.
Near neighbor with predicates
Note that full n2 correlation without any cuts is the worst possible spatial join, rarely needed in practice. A most useful form of near neighbour search is a correlation with some predicates. Here is an example hypothetical (non-scientific) query, written for the data set used in our tests:
SELECT count(*) FROM X o1, X o2 WHERE o1.bMag BETWEEN 20 AND 20.13 AND o2.rMag BETWEEN 19.97 AND 20.15 AND ABS(o1.ra - o2.ra) < 0.00083 / o2.cosRadDecl AND ABS(o1.decl - o2.decl) < 0.00083 AND o1.objectId <> o2.objectId
For the data used, the applied o1.bMag cut selects ~2% of the table, so does the o2.rMag cut (if applied independently).
With these cuts, the near neighbour query on 25K-row table takes 0.3 sec in mysql and 1 sec in postgres (it'd probably run even faster with indexes on bMag and rMag). So if we used mysql we would need only 3 (1503 MHz) CPUs do run query over 1 billion rows in one hour.
For selectivity 20% it takes mysql 12 sec to finish and postgres needs 35 sec. In this case mysql would need 133 (1503 MHz) CPUs to run this query over 1 billion rows in one hour.
This clearly shows predicate selectivity is one of the most important factors determining how slow/fast the spatial queries will run. In practice, if the selectivity is <10%, chunk size = ~25K or 50K rows should work well.
[perhaps we need to do more detailed study regarding predicate selectivity]
Numbers for lsst10
Lsst10 has 8 cores. Elapsed time for a single job is comparable to elapsed time of 8 jobs run in parallel (difference is within 2-3%, except for very small partitions, where it reaches 10%).
Testing involved running 8 "jobs", where each job was a set of queries executed sequentially. Each query was a near neighbor query:
SELECT count(*) AS neighbors
FROM XX o1 FORCE INDEX (idxRDC),
XX o2 FORCE INDEX (idxRDC)
WHERE ABS(o1.decl - o2.decl) < 0.00083
AND ABS(o1.ra - o2.ra) < 0.00083 / o2.cosRadDecl
-- idxRDC was defined as "ADD INDEX idxRDC(ra, decl, cosRadDecl)"
Each query run on a single partition. Results:
| rows per partition | seconds to self-join one partition | rows processed per elapsed sec | time to process 150m rows (DC3b) |
| 0.5K | 0.05 | 80K | 31min |
| 1K | 0.16 | 50K | 50min |
| 2K | 0.59 | 27K | 1h 32min |
| 5K | 3.63 | 11K | 3h 47min |
| 10K | 14.68 | 5.5K | 7h 39min |
| 15K | 40.09 | 3K | 14h |
See the attachment (200909nearNeigh-lsst10.xls) for details.
Attachments
-
200909nearNeigh-lsst10.xls
(12.0 KB) - added by jbecla
12 months ago.
