<?php  
            require('/srv/new-pegasus.isi.edu/includes/common.php'); 
            pegasus_header("7.4. Remote Cluster using PyGlidein");
        ?><div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="pyglidein"></a>7.4. Remote Cluster using PyGlidein</h2></div></div></div>
<p>
      Glideins (HTCondor pilot jobs) provide an efficient solution for high-throughput workflows.
      The glideins are submitted to the remote cluster scheduler and, once started, make it appear
      as if your HTCondor pool extends into the remote cluster. HTCondor can then
      schedule jobs to the remote compute nodes in the same way it would schedule jobs
      to local compute nodes.
    </p>
<p>
      Some infrastructures, such as <a class="link" href="open_science_grid.php" title="7.12. Open Science Grid Using glideinWMS">Open Science Grid</a>,
      provide infrastructure-level glidein solutions,
      such as GlideinWMS. Another solution is <a class="link" href="bosco.php" title="7.9. Remote PBS Cluster using BOSCO and SSH">BOSCO</a>. For more
      custom setups, 
      <a class="ulink" href="https://github.com/WIPACrepo/pyglidein" target="_top">pyglidein</a> from the
      <a class="ulink" href="http://icecube.wisc.edu/" target="_top">IceCube</a> project provides a nice framework.
      The architecture consists of a server on the submit host, whose job is to determine the demand,
      and a client on the remote resource. The client can be invoked, for example, via cron, and submits
      glideins directly to HTCondor, SLURM, and PBS schedulers. This makes pyglidein very flexible. It works well, for
      example, when the resource requires two-factor authentication, since the client runs on the resource itself.
    </p>
<div class="figure">
<a name="fig-pyglidein"></a><p class="title"><b>Figure 7.4. pyglidein overview</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;" width="100%"><tr><td align="center" valign="middle"><img src="images/pyglidein.png" align="middle" height="360" alt="pyglidein overview"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>
      To get started with pyglidein, check out a copy of the Git repository on both your submit
      host and the cluster you want to glide in to. Starting with the submit host, first
      make sure you have HTCondor configured for
      <a class="ulink" href="http://research.cs.wisc.edu/htcondor/manual/current/3_8Security.html#SECTION00483400000000000000" target="_top">PASSWORD</a>
      authentication. Make a copy of the HTCondor pool password file. You will need it on the
      cluster, and it is a binary file, so copy the file itself (for example with cp) rather than
      copying and pasting its contents. To get the server started:
    </p>
<pre class="programlisting">
./server.py --port 11001  
    </pre>
<p>
      By default, the pyglidein server considers all jobs in the HTCondor queue when determining whether
      glideins are needed. If you want user jobs to request glideins explicitly, you can pass
      a constraint for the server to use. For example, jobs could set the 
      <span class="emphasis"><em>+WantStampede2 = True</em></span> attribute, and then we could start the
      server with:
    </p>
<pre class="programlisting">
./server.py --port 11001 --constraint "'WantStampede2 == True'"  
    </pre>
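<p>
      On the job side, a minimal sketch of a submit description carrying this attribute could look
      like the following. The function name, the file name <span class="emphasis"><em>want-glidein.sub</em></span>,
      the executable, and the output file names are placeholders chosen for illustration. Only the
      <span class="emphasis"><em>+WantStampede2 = True</em></span> line comes from the text above.
    </p>

```shell
#!/bin/bash
# Sketch: write an HTCondor submit description that advertises
# WantStampede2, so a server started with the constraint above
# counts the job. File names and executable are placeholders.

write_submit_file() {
    local submit_file="$1"
    cat > "$submit_file" <<'EOF'
universe   = vanilla
executable = /bin/hostname
output     = job.out
error      = job.err
log        = job.log

# custom job attribute matched by the server's --constraint expression
+WantStampede2 = True

queue
EOF
}

# Usage on the submit host:
#   write_submit_file want-glidein.sub
#   condor_submit want-glidein.sub
```

<p>
      Jobs submitted without the attribute would then not count towards the glidein demand
      seen by the constrained server.
    </p>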
<p>
      Once the server is running, you can check its status by pointing a web browser at it.
    </p>
<p>
      The next step is to create a <span class="emphasis"><em>glidein.tar.gz</em></span> file containing
      the HTCondor binaries, our pool password file, and a modified job wrapper script.
      This can be accomplished by building HTCondor with the
      <span class="emphasis"><em>create_glidein_tarball.py</em></span> script, but first we need to modify
      <span class="emphasis"><em>glidein_template/</em></span>. Start by copying your pool password file
      over the existing <span class="emphasis"><em>passwdfile</em></span> file.
    </p>
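<p>
      Because the pool password file is binary, it can be worth verifying that the copy is
      byte-identical. The sketch below assumes both paths are supplied by the caller. The function
      name <span class="emphasis"><em>install_pool_password</em></span> is made up for this example and is
      not part of pyglidein.
    </p>

```shell
#!/bin/bash
# Sketch: copy the pool password file over glidein_template/passwdfile
# and confirm that the copy is byte-identical (the file is binary).
# The function name is hypothetical; both paths come from the caller.

install_pool_password() {
    local src="$1"            # pool password file on the submit host
    local template_dir="$2"   # path to the glidein_template/ directory
    local dest="$template_dir/passwdfile"

    cp "$src" "$dest" || return 1
    # cmp -s is silent and exits non-zero if any byte differs
    cmp -s "$src" "$dest"
}

# Usage on the submit host, from the pyglidein checkout
# (assumes SEC_PASSWORD_FILE is set in your HTCondor configuration):
#   install_pool_password "$(condor_config_val SEC_PASSWORD_FILE)" glidein_template
```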
<p>
      Edit <span class="emphasis"><em>user_job_wrapper.sh</em></span>. We don't need most of it, so
      edit it to read:
    </p>
<pre class="programlisting">
#!/bin/bash

# This script is started just before the user job
# It is referenced by the USER_JOB_WRAPPER

export HOME=$PWD

# fix PATH and LD_LIBRARY_PATH
export PATH=$PATH:/usr/bin:/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64:/usr/local/lib:/usr/lib64:/usr/lib:/usr/lib/x86_64-linux-gnu:/lib64:/lib:/lib/x86_64-linux-gnu

GLIDEIN_DIR=$GLIDEIN_LOCAL_TMP_DIR
# quote the variable so an unset or empty value also falls back to $PWD
if [ ! -d "$GLIDEIN_DIR" ]; then
    GLIDEIN_DIR=$PWD
fi
JOB_WRAPPER="${GLIDEIN_DIR}/job_wrapper.sh"

# fall through to next/default job wrapper
if [ ! -e "$JOB_WRAPPER" ]; then
    exec "$@"
else
    exec "${JOB_WRAPPER}" "$@"
fi
    </pre>
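<p>
      Before building the tarball, it may help to confirm that the edited wrapper parses and
      falls through to the user job when no <span class="emphasis"><em>job_wrapper.sh</em></span> is present.
      The following is a rough local check, not part of pyglidein. The function name
      <span class="emphasis"><em>check_wrapper</em></span> and the scratch directory are assumptions made for
      this example.
    </p>

```shell
#!/bin/bash
# Sketch: sanity-check an edited user_job_wrapper.sh locally.
# check_wrapper is a hypothetical helper, not a pyglidein command.

check_wrapper() {
    local wrapper scratch out
    # resolve to an absolute path, since the check runs in a scratch directory
    wrapper="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")" || return 1

    # syntax check only, nothing is executed
    bash -n "$wrapper" || return 1

    scratch=$(mktemp -d) || return 1
    # no job_wrapper.sh exists in the scratch directory, so the wrapper
    # should fall through and exec the given command unchanged
    out=$(cd "$scratch" && GLIDEIN_LOCAL_TMP_DIR="$scratch" bash "$wrapper" echo wrapper-ok)
    rmdir "$scratch"
    [ "$out" = "wrapper-ok" ]
}

# Usage, from the pyglidein checkout:
#   check_wrapper glidein_template/user_job_wrapper.sh && echo "wrapper looks fine"
```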
<p>
      Create the glidein.tar.gz by running:
    </p>
<pre class="programlisting">
python create_glidein_tarball.py
    </pre>
<p>
      Once you have the glidein.tar.gz file, copy it to the Git checkout you have on 
      the remote cluster. The remaining steps are performed on the remote cluster. Create a 
      configuration file for your glidein under <span class="emphasis"><em>configs/</em></span>. Here
      is an example for TACC Stampede2:
    </p>
<pre class="programlisting">
[Mode]
debug = True

[Glidein]
address = http://workflow.isi.edu:11001/jsonrpc
site = TACC-Stampede2
tarball = /home1/00384/rynge/git/pyglidein/glidein.tar.gz

[Cluster]
user = rynge
os = RHEL7
scheduler = slurm
submit_command = sbatch
walltime_hrs = 48
max_total_jobs = 10
max_idle_jobs = 1
limit_per_submit = 1

gpu_only = False
whole_node = True
whole_node_cpus = 1
whole_node_memory = 96000
whole_node_disk = 30000
whole_node_gpus = 0
group_jobs = False
partition = normal
running_cmd = squeue -u $USER -t RUNNING -p normal -h | wc -l
idle_cmd = squeue -u $USER -t PENDING -p normal -h | wc -l

[SubmitFile]
filename = submit.slurm
local_dir = /tmp/$SLURM_JOB_ID
custom_header = #SBATCH -A TG-ABC00001
cvmfs_job_wrapper = False

[CustomEnv]
CLUSTER = workflow.isi.edu
    </pre>
<p>
      This configuration will look different for each cluster.
      <span class="emphasis"><em>configs/</em></span> contains a number of example configurations. A few things to 
      note: 
    </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p><span class="bold"><strong>address</strong></span> is the location of the server
        we started earlier</p></li>
<li class="listitem"><p><span class="bold"><strong>tarball</strong></span> is the full path to the custom
        glidein.tar.gz file we created above.</p></li>
<li class="listitem"><p><span class="bold"><strong>CLUSTER</strong></span> is the location of your
        HTCondor central manager. In many cases this is the same host you started
        the server on. Please note that if you do not set this variable, the
        glideins will try to register into the IceCube infrastructure.</p></li>
</ul></div>
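<p>
      Since a missing <span class="bold"><strong>CLUSTER</strong></span> setting sends glideins to the wrong
      pool, a small check before the first submission can be useful. The sketch below reads the
      value from the <span class="emphasis"><em>[CustomEnv]</em></span> section with awk. The function name
      <span class="emphasis"><em>custom_env_cluster</em></span> is hypothetical, and the parsing assumes the
      simple <span class="emphasis"><em>key = value</em></span> layout shown in the example above.
    </p>

```shell
#!/bin/bash
# Sketch: print the CLUSTER value from the [CustomEnv] section of a
# pyglidein client config; exit non-zero if it is missing or empty.
# custom_env_cluster is a hypothetical helper, not a pyglidein command.

custom_env_cluster() {
    local config="$1"
    awk -F '=' '
        # track which section we are in
        /^\[/ { in_section = ($0 ~ /^\[CustomEnv\]/) }
        in_section && $1 ~ /^[ \t]*CLUSTER[ \t]*$/ {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding whitespace
            if (value != "") { print value; found = 1 }
        }
        END { exit(found ? 0 : 1) }
    ' "$config"
}

# Usage on the remote cluster:
#   custom_env_cluster configs/stampede2.config || echo "CLUSTER is not set" >&2
```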
<p>
      At this point we can try our first glidein:
    </p>
<pre class="programlisting">
./client.py --config=$HOME/git/pyglidein/configs/stampede2.config
    </pre>
<p>
      Once we have seen a successful glidein, we can add the client to the crontab:
    </p>
<pre class="programlisting">
# m  h  dom mon dow   command
*/10 *   *   *   *    (cd ~/git/pyglidein/ &amp;&amp; ./client.py --config=$HOME/git/pyglidein/configs/stampede2.config) &gt;~/cron-pyglidein.log 2&gt;&amp;1
    </pre>
<p>
      With this setup, glideins will now appear automatically based on the demand in the local
      HTCondor queue.
    </p>
</div><?php  
            pegasus_footer();
        ?>
