Changes between Version 4 and Version 5 of db/MySQL/Optimizations


Ignore:
Timestamp:
04/10/2013 03:22:41 AM (6 years ago)
Author:
jbecla
Comment:

moved some examples from mysqlhints page

Legend:

Unmodified
Added
Removed
Modified
  • db/MySQL/Optimizations

    v4 v5  
     1[[TOC]] 
     2 
    13= Query Optimizations = 
    24 
    35These are some examples of the huge speed-ups available by optimizing MySQL queries and indexes. 
     6 
     7== Optimizing queries the select objects in a region == 
     8 
     9The qserv_XXX UDFs are deprecated in favor of the scisql UDFs, which are documented here [http://lsst-web.ncsa.illinois.edu/schema/sciSQL_0.3/] including an introductory section on spatial geometry [http://lsst-web.ncsa.illinois.edu/schema/sciSQL_0.3/#s2]. 
     10 
     11Example from Serge explaining how to find objects within a rectangular area on the sky: 
     12{{{ 
     13-- Create a binary representation of the search polygon 
     14SET @poly = scisql_s2CPolyToBin(359.9601, 2.5952, 
     15                               0.0398, 2.5952, 
     16                               0.0398, 2.6748, 
     17                               359.9601, 2.6748); 
     18 
     19-- Compute HTM ID ranges for the level 20 triangles overlapping 
     20-- @poly. They will be stored in a temp table called scisql.Region 
     21-- with two columns, htmMin and htmMax 
     22CALL scisql.scisql_s2CPolyRegion(@poly, 20); 
     23 
     24-- Select reference objects inside the polygon. The join against 
     25-- the HTM ID range table populated above cuts down on the number of 
     26-- SimRefObject rows that need to be tested against the polygon 
     27SELECT refObjectId, isStar, ra, decl, rMag 
     28FROM SimRefObject AS sro INNER JOIN 
     29    scisql.Region AS r ON (sro.htmId20 BETWEEN r.htmMin AND r.htmMax) 
     30WHERE scisql_s2PtInCPoly(ra, decl, @poly) = 1; 
     31}}} 
     32 
     33If you are selecting a rectangular box (with constant-RA and constant-decl sides), you may wish to use the box function instead: {{{ scisql_s2PtInBox(ra,decl, raMin, declMin, raMax, declMax) = 1 }}} (raMin, declMin, raMax, declMax are constants as used in s2CPolyToBin). 
     34 
     35== Optimizing queries that join large tables == 
     36If your query joins with a large table (e.g., Object, Source, !ForcedSource, etc.), your query is likely to run slow. In order to speed things up, you may try pre-filtering the tables.  Depending on the filtering conditions (WHERE clause), you may be able to significantly cut the number of joined rows.  If only 10% (or less) of the large table will be involved, it is often faster to create a table consisting only of involved rows (slightly larger is okay if the rows can be later filtered out by column or join expressions).  Here is an example. 
     37 
     38Original query: 
     39 
     40{{{ 
     41SELECT 
     42      fsrc.ra, fsrc.decl, 
     43      deepSourceId, 
     44      deepForcedSourceId, 
     45      exp.run, 
     46      fsrc.timeMid, 
     47      scisql_dnToAbMag(fsrc.psfFlux, exp.fluxMag0) as g, 
     48      scisql_dnToAbMagSigma(fsrc.psfFlux, fsrc.psfFluxSigma,  
     49                            exp.fluxMag0, exp.fluxMag0Sigma) as gErr 
     50FROM 
     51      DeepForcedSource AS fsrc, 
     52      Science_Ccd_Exposure AS exp 
     53WHERE 
     54      exp.scienceCcdExposureId = fsrc.scienceCcdExposureId 
     55  AND fsrc.psfFlux is not null 
     56  AND fsrc.psfFluxSigma is not null 
     57  AND fsrc.ra >= 30.0 
     58  AND fsrc.ra < 60.0 
     59}}} 
     60 
     61It is going through all !ForcedSources, and for each of them joins with exposure table. A faster way to do that: 
     62 
     63{{{ 
     64 
     65-- Create your own database (replace xx with your mysql user name) 
     66CREATE DATABASE xx_tmp1; 
     67 
     68 
     69-- Select all rows of interest (bright stars in this case) without  
     70-- doing any joins, and save the results in your table. It will  
     71-- still take a long time, but at least it will be a nice sequential  
     72-- scan, which is the best you can do. To give you some idea about  
     73-- timing: our DC_W13_Stripe82.ForcedSource is 1.8 TB, going through  
     74-- the entire table at 100 MB/sec would take 6 hours, so don't expect  
     75-- magic. :) 
     76CREATE TABLE xx_tmp1.BrightStars 
     77SELECT 
     78     ra, decl, 
     79     deepSourceId, 
     80     deepForcedSourceId, 
     81     timeMid, 
     82     scienceCcdExposureId, 
     83     psfFlux, 
     84     psfFluxSigma 
     85FROM 
     86     DeepForcedSource 
     87WHERE 
     88     psfFlux is not null 
     89 AND psfFluxSigma is not null 
     90 AND ra >= 30.0 
     91 AND ra < 60.0; 
     92 
     93 
     94-- Add an index on BrightStars.scienceExposureId 
     95ALTER TABLE xx_tmp1.BrightStars ADD INDEX(scienceCcdExposureId); 
     96 
     97 
     98-- And then do the joins you need to do. Hopefully by now your  
     99-- data set will be much smaller than the initial one, so the 
     100-- join will be fast. 
     101CREATE TABLE xx_tmp1.MyResults 
     102SELECT 
     103     bs.ra, bs.decl, 
     104     deepSourceId, 
     105     deepForcedSourceId, 
     106     exp.run, 
     107     bs.timeMid, 
     108     scisql_dnToAbMag(bs.psfFlux, exp.fluxMag0) as g, 
     109     scisql_dnToAbMagSigma(bs.psfFlux, bs.psfFluxSigma, 
     110                           exp.fluxMag0, exp.fluxMag0Sigma) as gErr 
     111FROM 
     112     xx_tmp1.BrightStars AS bs, 
     113     Science_Ccd_Exposure AS exp 
     114WHERE 
     115     exp.scienceCcdExposureId = bs.scienceCcdExposureId; 
     116 
     117 
     118-- your results can now be retrieved using: 
     119SELECT * from xx_tmp1.MyResults;  
     120}}} 
     121 
     122Of course you don't have to save your results in !MyResults table, 
     123but saving results in a table has some benefits: you won't be 
     124flooded with output, you can reuse the results, etc...  
     125 
    4126 
    5127== Computation in DB instead of in Python ==