Middleware: Condor Jobs
Setting up to run Condor jobs on Abe
In your $HOME/.profile you need:
export LSST_HOME=/u/ncsa/stack_location/stack
source $LSST_HOME/loadLSST.sh
setup ctrl_dc3pipe
export PATH=${LSST_HOME}/Linux64/external/mpich2/1.0.5p4/bin:${PATH}
This sets up the LSST software stack and puts the LSST mpich2 binaries on your $PATH. The LSST mpich2 binaries must come before the system MPICH binaries in $PATH.
You also need a $HOME/.mpd.conf file which looks like:
MPD_SECRETWORD=<randomphrase>
This file must have permissions 600:
chmod 600 $HOME/.mpd.conf
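The two steps above (writing the secret word and restricting the permissions) can also be done in one go. The following is a minimal sketch, not part of the LSST stack: the function name write_mpd_conf is hypothetical, and generating the secret word with a random hex token is an assumption, since any random phrase works.

```python
import os
import secrets


def write_mpd_conf(path, secretword=None):
    """Write an mpd.conf file that only its owner can read or write.

    Hypothetical helper; if no secret word is given, a random one is generated.
    """
    if secretword is None:
        # Assumption: any hard-to-guess string is acceptable as the secret word.
        secretword = secrets.token_hex(16)
    # Create the file with mode 600 from the start so the secret is never
    # briefly readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as conf:
        conf.write("MPD_SECRETWORD=%s\n" % secretword)
    # A pre-existing file keeps its old mode, so set it explicitly as well.
    os.chmod(path, 0o600)
    return secretword
```

For example, write_mpd_conf(os.path.expanduser("~/.mpd.conf")) would create the file in your home directory.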
Abe/condor_submit
In Orca, jobs are submitted to Abe by configuring the policy files to use the AbePipelineConfigurator.
This object is responsible for creating the proper job directories, copying policy files over, creating a launch script, creating a submission file for condor, and using condor_submit to send the job to be run.
The format of this condor submit file looks like:
universe=globus
executable=$PEX_HARNESS_DIR/bin/launchPipelineAbe.sh
globusrsl = (jobtype=single)(hostcount=1)(maxWallTime=30)
arguments=IPSD.paf runid001 -L trace -S orca_launch.sh
transfer_executable=false
globusscheduler=grid-abe.ncsa.teragrid.org:2119/jobmanager-pbs
output=IPSDCondor.out
error=IPSDCondor.err
log=IPSDCondor.log
remote_initialdir=/u/ac/joeuser/orca_scratch/runid001/IPSD/work
queue
This sends the job to the job scheduler, which runs the pipeline at some later point.
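To illustrate how a submit file of this shape can be produced from a handful of per-pipeline values, here is a minimal sketch. It is not the actual AbePipelineConfigurator code: the function name render_globus_submit and its parameters are hypothetical, and the fixed entries are simply copied from the example above.

```python
def render_globus_submit(executable, arguments, scheduler, work_dir, name,
                         max_wall_time=30):
    """Return the text of a globus-universe submit file for one pipeline.

    Hypothetical helper: `name` is the pipeline name used to build the
    output, error and log file names (for example "IPSD").
    """
    lines = [
        "universe=globus",
        "executable=%s" % executable,
        "globusrsl = (jobtype=single)(hostcount=1)(maxWallTime=%d)" % max_wall_time,
        "arguments=%s" % " ".join(arguments),
        "transfer_executable=false",
        "globusscheduler=%s" % scheduler,
        "output=%sCondor.out" % name,
        "error=%sCondor.err" % name,
        "log=%sCondor.log" % name,
        "remote_initialdir=%s" % work_dir,
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The returned text would be written to a file in the job directory and handed to condor_submit.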
Abe/glide_in
An alternative is to first run the condor_glidein command to reserve a set of nodes on Abe. A local Condor master is set up, and a request is sent to Abe to reserve a certain number of nodes. Then, instead of submitting directly to Abe, a condor_submit is done to the local master.
When the condor_glidein command runs on Abe, it sets up the Condor daemons, which communicate back to the local master. At this point, running "condor_status" will show that those nodes are part of the local Condor pool. Any jobs scheduled for this local pool are run using these resources.
The condor_glidein command looks like:
condor_glidein -count 1 -setup_jobmanager=jobmanager-fork -arch=7.4.0-i686-pc-Linux-2.4 -idletime 5 grid-abe.ncsa.teragrid.org/jobmanager-pbs
The local submit file looks similar to this:
universe=vanilla
executable=orca_launch.sh
arguments=//stack/Linux64/pex_harness/3.3.5/bin/launchPipeline.sh IPSD.paf srp122810 -L trace -S /scratch/srp122810/IPSD/work/abesetup.sh
transfer_executable=false
output=IPSD_Condor.out
error=IPSD_Condor.err
log=IPSD_Condor.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
remote_initialdir=/scratch/srp122810/IPSD/work
Requirements = (FileSystemDomain != "dummy") && (Arch != "dummy") && (OpSys != "dummy") && (Disk != -1) && (Memory != -1)
queue
Some of the options in this file are likely not required, depending on what exactly you're running on the remote machine.
The condor_glidein command will be used by Orca to reserve nodes, and Orca will then submit a DAGMan job. The DAGMan .dag file will describe how all of the machines are set up, and which machines are dependent on others. This functionality will be built into a modified Orca module.
Currently, all the pipelines are handled independently and launched separately. For a glidein and DAG, this will have to be modified. What we need to do is request the glidein and submit the DAG, no matter how many pipelines are involved. The condor_submit_dag command will issue the condor_submit requests on our behalf.
dag templates
- Users should be able to specify how DAGs are set up. DAGs currently have the form:
# File name: diamond.dag
#
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
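To make the dependencies in this file concrete, the sketch below reads a DAG of this form and groups the jobs into levels that could run one after another. It is an illustration only, not how DAGMan itself schedules jobs: the function names parse_dag and run_levels are hypothetical, and the parser handles only the JOB and PARENT ... CHILD lines used in the example.

```python
def parse_dag(text):
    """Parse JOB and PARENT ... CHILD lines from a DAG file.

    Returns (jobs, parents): jobs maps a job name to its submit file, and
    parents maps a job name to the set of jobs that must finish first.
    """
    jobs = {}
    parents = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        if words[0] == "JOB":
            jobs[words[1]] = words[2]
            parents.setdefault(words[1], set())
        elif words[0] == "PARENT":
            split = words.index("CHILD")
            for parent in words[1:split]:
                parents.setdefault(parent, set())
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    return jobs, parents


def run_levels(parents):
    """Group jobs into levels; each level depends only on earlier levels."""
    remaining = {job: set(deps) for job, deps in parents.items()}
    levels = []
    while remaining:
        ready = sorted(job for job, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle in DAG")
        levels.append(ready)
        for job in ready:
            del remaining[job]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels
```

For diamond.dag this puts A in the first level, B and C together in the second, and D in the last.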
What we'd like to have is a DAG that describes pipeline relationships in LSST terms, without having to write the .condor files ahead of time. The current thinking is to do something like this:
In the policy file, have a "dagtemplate" entry that points to a DAG template of the form:
# DAGtemplate: diamond
#
JOB A lsstpipe://PIPE1
JOB B lsstpipe://PIPE2
JOB C lsstpipe://PIPE3
JOB D lsstpipe://PIPE4
PARENT A CHILD B C
PARENT B C CHILD D
Orca would take this template and match the "lsstpipe:shortName" abbreviations in the policy files against it. When Orca writes out the .condor files for each of these pipelines, it would substitute the condor job file names into the template:
# DAGTemplate: diamond
#
JOB A PIPE1.condor
JOB B PIPE2.condor
JOB C PIPE3.condor
JOB D PIPE4.condor
PARENT A CHILD B C
PARENT B C CHILD D
and then run condor_submit_dag with this new DAG file.
open questions
- How will monitoring work?
- What condor capabilities can we take advantage of?
- Where do we use event-based monitoring?
- Do we use a combination of the two?
- How do we control data flow?
- What condor capabilities can we leverage?
- How do we distribute input data to the pipelines?
- How should we pass the data between pipelines?
- Glideins are associated with a user, so two users' glideins can't "steal" each other's nodes. Will there be a problem when one user submits multiple glideins at the same time? Suppose there are two jobs, one needing 128 nodes and another needing 3 nodes. If 128 nodes become available but some of them get scheduled for the 3-node job, then by the time 3 more nodes become available for the 128-node job, the other 125 nodes might not have long left to live. This may be resolvable by adding extra Condor requirements to each job that only its own glidein nodes can satisfy.
notes
If you see a message like:
Running/verifying Glidein installation and setup... Error: condor_glidein needs a valid grid proxy. (Re)run grid-proxy-init.
it means you have to run myproxy-logon before submitting the job.
