
Middleware: Condor Jobs

Setting up to run Condor jobs on Abe

In your $HOME/.profile you need:

    export LSST_HOME=/u/ncsa/stack_location/stack
    source $LSST_HOME/loadLSST.sh

    setup ctrl_dc3pipe
    export PATH=${LSST_HOME}/Linux64/external/mpich2/1.0.5p4/bin:${PATH}

This sets up the LSST software stack and puts the LSST mpich2 binaries on the PATH. The LSST mpich2 binaries must appear in $PATH before the system MPICH binaries.
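To confirm the ordering after logging in, a minimal POSIX shell sketch can check which mpiexec the shell actually finds first. It assumes the mpich2/1.0.5p4 layout from the PATH line above; the function name check_lsst_mpich is hypothetical.

```shell
# Hypothetical helper: verify that the first mpiexec on $PATH is the one
# shipped with the LSST stack (assumes the mpich2/1.0.5p4 layout above).
check_lsst_mpich() {
    expected_dir="${LSST_HOME}/Linux64/external/mpich2/1.0.5p4/bin"
    # command -v prints the path the shell would actually execute
    found=$(command -v mpiexec) || {
        echo "mpiexec not found on PATH" >&2
        return 1
    }
    case "$found" in
        "$expected_dir"/*)
            echo "OK: using $found"
            return 0 ;;
        *)
            echo "WARNING: $found is not under $expected_dir" >&2
            return 1 ;;
    esac
}
```

Calling check_lsst_mpich prints the path in use, or a warning on stderr (with a non-zero status) when another MPICH directory comes first.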

You also need a $HOME/.mpd.conf file which looks like:

    MPD_SECRETWORD=<randomphrase>

This file must have permissions 600 (readable and writable by you only):

    chmod 600 $HOME/.mpd.conf
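One way to create the file with a random secret and the right permissions in one step is sketched below. The function name make_mpd_conf and the secret recipe (16 bytes from /dev/urandom, hex-encoded with od) are assumptions; any hard-to-guess phrase will do.

```shell
# Hypothetical helper: create an .mpd.conf with a random secret word and
# mode 600. The secret recipe (16 random bytes, hex-encoded) is an
# assumption. An existing file is left untouched so a working secret is
# not replaced by accident.
make_mpd_conf() {
    conf="${1:-$HOME/.mpd.conf}"
    if [ -e "$conf" ]; then
        echo "$conf already exists; not overwriting" >&2
        return 1
    fi
    secret=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # umask 077 inside a subshell: the file is never readable by others,
    # and the caller's umask is unchanged
    ( umask 077 && printf 'MPD_SECRETWORD=%s\n' "$secret" > "$conf" ) || return 1
    chmod 600 "$conf"
}
```

Running make_mpd_conf with no argument writes $HOME/.mpd.conf; pass a path to write somewhere else.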

Abe/condor_submit

In Orca, jobs are submitted to Abe by configuring the policy files to use the AbePipelineConfigurator.

This object is responsible for creating the job directories, copying the policy files over, creating a launch script, writing a Condor submit file, and running condor_submit to submit the job.

The Condor submit file it writes looks like this:

    universe=globus
    executable=$PEX_HARNESS_DIR/bin/launchPipelineAbe.sh
    globusrsl = (jobtype=single)(hostcount=1)(maxWallTime=30)
    arguments=IPSD.paf runid001 -L trace -S orca_launch.sh
    transfer_executable=false
    globusscheduler=grid-abe.ncsa.teragrid.org:2119/jobmanager-pbs
    output=IPSDCondor.out
    error=IPSDCondor.err
    log=IPSDCondor.log
    remote_initialdir=/u/ac/joeuser/orca_scratch/runid001/IPSD/work
    queue

This hands the job to the Globus jobmanager on Abe, which passes it to the PBS batch scheduler; the pipeline runs at some later point, whenever PBS schedules it.
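The submit-file step of the configurator can be sketched as a shell function that fills in the per-pipeline values from the example above. This is not Orca's code: the function name write_abe_submit, the NAME.condor output file and the argument order are hypothetical, and it assumes PEX_HARNESS_DIR is set locally to a path that is also valid on Abe (with transfer_executable=false, Condor does not copy the script).

```shell
# Hypothetical sketch of writing one pipeline's Globus-universe submit file.
# Arguments: pipeline short name, run id, remote work directory, setup script.
# Assumes PEX_HARNESS_DIR names a path that also exists on Abe.
write_abe_submit() {
    name=$1 runid=$2 workdir=$3 setup=$4
    if [ -z "${PEX_HARNESS_DIR:-}" ]; then
        echo "PEX_HARNESS_DIR is not set" >&2
        return 1
    fi
    # writes NAME.condor in the current directory
    cat > "${name}.condor" <<EOF
universe=globus
executable=${PEX_HARNESS_DIR}/bin/launchPipelineAbe.sh
globusrsl = (jobtype=single)(hostcount=1)(maxWallTime=30)
arguments=${name}.paf ${runid} -L trace -S ${setup}
transfer_executable=false
globusscheduler=grid-abe.ncsa.teragrid.org:2119/jobmanager-pbs
output=${name}Condor.out
error=${name}Condor.err
log=${name}Condor.log
remote_initialdir=${workdir}
queue
EOF
}
```

For the example above this would be write_abe_submit IPSD runid001 /u/ac/joeuser/orca_scratch/runid001/IPSD/work orca_launch.sh, followed by condor_submit IPSD.condor.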

Abe/glide_in

An alternative is to run the condor_glidein command first to reserve a set of nodes on Abe. A local Condor master is set up, and a request is sent to Abe to reserve a certain number of nodes. Then, instead of submitting directly to Abe, jobs are submitted with condor_submit to the local master.

When the glidein job starts on Abe, it sets up the Condor daemons, which communicate back to the local master. At that point, running condor_status shows those nodes as part of the local Condor pool, and any jobs scheduled to this local pool run on these resources.
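Because glidein nodes only become usable once their daemons have reported in, a script may want to wait for them before submitting. A minimal polling sketch, assuming condor_status is on PATH; the function name wait_for_slots and its default timeout and interval (in seconds) are hypothetical. It counts every slot ad in the local pool, not only glidein slots, and a multi-core node may advertise several slots.

```shell
# Hypothetical helper: poll the local pool until at least N slots have
# joined (e.g. after a glidein), or give up after a timeout in seconds.
# condor_status -format prints one line per slot ad.
wait_for_slots() {
    want=$1 timeout=${2:-1800} interval=${3:-30}
    waited=0
    while :; do
        have=$(condor_status -format '%s\n' Name 2>/dev/null | wc -l | tr -d ' ')
        if [ "$have" -ge "$want" ]; then
            echo "$have slot(s) in the pool"
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out with $have of $want slot(s)" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```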

The condor_glidein command looks like:

    condor_glidein -count 1 -setup_jobmanager=jobmanager-fork -arch=7.4.0-i686-pc-Linux-2.4 -idletime 5 grid-abe.ncsa.teragrid.org/jobmanager-pbs
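When the node count changes from run to run, the command can be wrapped in a small function. This is a sketch only: glidein_abe is a hypothetical name, and every flag except -count and -idletime is copied unchanged from the command above.

```shell
# Hypothetical wrapper around the condor_glidein command shown above.
# Only the node count and idle time are parameters; the other flags are
# copied unchanged from that command.
glidein_abe() {
    count=${1:-1} idle=${2:-5}
    # reject non-numeric or zero counts before contacting the grid
    case "$count" in
        ''|*[!0-9]*)
            echo "node count must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$count" -lt 1 ]; then
        echo "node count must be a positive integer" >&2
        return 1
    fi
    condor_glidein -count "$count" \
        -setup_jobmanager=jobmanager-fork \
        -arch=7.4.0-i686-pc-Linux-2.4 \
        -idletime "$idle" \
        grid-abe.ncsa.teragrid.org/jobmanager-pbs
}
```

For example, glidein_abe 16 requests 16 nodes with the same idle time of 5 used above.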

The local submit file looks similar to this:

    universe=vanilla
    executable=orca_launch.sh
    arguments=//stack/Linux64/pex_harness/3.3.5/bin/launchPipeline.sh IPSD.paf srp122810 -L trace -S /scratch/srp122810/IPSD/work/abesetup.sh
    transfer_executable=false
    output=IPSD_Condor.out
    error=IPSD_Condor.err
    log=IPSD_Condor.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    remote_initialdir=/scratch/srp122810/IPSD/work
    Requirements = (FileSystemDomain != "dummy") && (Arch != "dummy") && (OpSys != "dummy") && (Disk != -1) && (Memory != -1)
    queue

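Some of the options in this file are likely not required, depending on exactly what you're running on the remote machine. The Requirements expression is presumably there to stop condor_submit from adding its default requirements: when an expression does not mention attributes such as Arch, OpSys, Disk and Memory, Condor appends clauses matching the submit machine, which the glidein nodes on Abe may not satisfy. Comparing each attribute against a dummy value mentions it while being effectively always true.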

Orca will use condor_glidein to reserve nodes and then submit a DAGMan job. The DAGMan .dag file will describe how all of the jobs are set up and which jobs depend on others. This functionality will be built into a modified Orca module.

Currently, each pipeline is handled independently and launched separately. With a glidein and a DAG, this has to change: we request the glidein once and submit a single DAG, no matter how many pipelines are involved. DAGMan, started by condor_submit_dag, then submits the individual pipeline jobs on our behalf.
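As a first step toward a single submission, the DAG for a run can simply list one JOB per pipeline. This sketch assumes each pipeline NAME already has a submit file NAME.condor; write_dag is a hypothetical name. It writes no PARENT/CHILD lines, so the pipelines stay as independent as they are today.

```shell
# Hypothetical helper: write one DAG covering every pipeline, so a single
# condor_submit_dag replaces one condor_submit per pipeline.
# Usage: write_dag OUT.dag NAME...   (expects NAME.condor for each NAME)
write_dag() {
    dag=$1
    shift
    : > "$dag"    # truncate or create the output file
    for name in "$@"; do
        printf 'JOB  %s  %s.condor\n' "$name" "$name" >> "$dag"
    done
}
```

Running write_dag run.dag IPSD PIPE2 and then condor_submit_dag run.dag submits both pipelines through one DAGMan job.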

dag templates

  • Users should be able to specify how DAGs are set up. DAG files currently have the form:
    # File name: diamond.dag
    #
    JOB  A  A.condor 
    JOB  B  B.condor 
    JOB  C  C.condor	
    JOB  D  D.condor
    PARENT A CHILD B C
    PARENT B C CHILD D

What we'd like is a DAG that describes pipeline relationships in LSST terms, without having to write the .condor files ahead of time. The current thinking is to do something like this:

In the policy file, have a "dagtemplate" entry that points to a DAG template of the form:

    # DAGtemplate: diamond
    #
    JOB  A  lsstpipe://PIPE1 
    JOB  B  lsstpipe://PIPE2
    JOB  C  lsstpipe://PIPE3
    JOB  D  lsstpipe://PIPE4
    PARENT A CHILD B C
    PARENT B C CHILD D

Orca would take this template and match its "lsstpipe://shortName" references against the pipeline short names in the policy files. When Orca writes out the .condor file for each of these pipelines, it would substitute those files into the template:

    # DAGTemplate: diamond
    #
    JOB  A  PIPE1.condor 
    JOB  B  PIPE2.condor
    JOB  C  PIPE3.condor
    JOB  D  PIPE4.condor
    PARENT A CHILD B C
    PARENT B C CHILD D

and then run condor_submit_dag with this new DAG file.
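The substitution step can be sketched with sed: replace each lsstpipe://NAME with NAME.condor, then check that every submit file the DAG names exists. The function name render_dag_template is hypothetical; it assumes short names contain only letters, digits and underscores, and that it runs in the directory holding the .condor files.

```shell
# Hypothetical sketch: turn a DAG template into a runnable DAG by replacing
# every lsstpipe://NAME with NAME.condor, then verify the submit files exist.
render_dag_template() {
    template=$1 dag=$2
    sed 's#lsstpipe://\([A-Za-z0-9_]*\)#\1.condor#g' "$template" > "$dag" || return 1
    # the third field of each JOB line is the submit file DAGMan will use
    missing=0
    for f in $(awk '$1 == "JOB" { print $3 }' "$dag"); do
        if [ ! -f "$f" ]; then
            echo "missing submit file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A run would then be render_dag_template diamond.tmpl diamond.dag && condor_submit_dag diamond.dag, so the DAG is only submitted when every pipeline's submit file is in place.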

open questions

  • How will monitoring work?
    • What condor capabilities can we take advantage of?
    • Where do we use event-based monitoring?
    • Do we use a combination of the two?
  • How do we control data flow?
    • What condor capabilities can we leverage?
    • How do we distribute input data to the pipelines?
    • How should we pass the data between pipelines?
  • Glideins are associated with a user, so glideins submitted by two different users can't "steal" each other's nodes. Could it be a problem for one user to submit multiple glideins at the same time? Consider two jobs, one needing 128 nodes and another needing 3: 128 nodes might become available, but some of them get scheduled for the 3-node job, and by the time 3 more nodes become available for the 128-node job, the other 125 nodes might not have much time left to live. This may be resolvable by adding extra Condor requirements to each job that only its own glidein nodes can satisfy.

notes

If you see a message like:

    Running/verifying Glidein installation and setup...
    Error: condor_glidein needs a valid grid proxy. (Re)run grid-proxy-init.

it means you have to run myproxy-logon before submitting the job.
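To catch this before submitting, a pre-flight check can ask whether a proxy exists with some lifetime left. This assumes the Globus grid-proxy-info tool is installed alongside myproxy-logon; ensure_proxy and its default of one hour (1:00, in H:M) are hypothetical choices.

```shell
# Hypothetical pre-flight check before condor_glidein or condor_submit.
# grid-proxy-info -exists returns 0 only if a proxy is present; -valid H:M
# additionally requires that much remaining lifetime.
ensure_proxy() {
    need=${1:-1:00}
    if grid-proxy-info -exists -valid "$need" >/dev/null 2>&1; then
        return 0
    fi
    echo "No grid proxy valid for $need; run myproxy-logon first." >&2
    return 1
}
```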