<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="condor_pool.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="globus_gram.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cloud"></a>7.3. Cloud (Amazon EC2/S3, Google Cloud, ...)</h2></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="cloud.php#amazon_aws">7.3.1. Amazon EC2</a></span></dt>
<dt><span class="section"><a href="cloud.php#google_cloud">7.3.2. Google Cloud Platform</a></span></dt>
</dl></div>
<div class="figure">
<a name="concepts-fig-cloud-layout"></a><p class="title"><b>Figure 7.2. Cloud Sample Site Layout</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;" width="100%"><tr><td align="center" valign="middle"><img src="images/fg-pwms-prefio.3.png" align="middle" height="360" alt="Cloud Sample Site Layout"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>This figure shows a sample environment for executing Pegasus across
    multiple clouds. At this point, it is up to the user to provision the
    remote resources with a proper VM image that includes an HTCondor worker
    configured to report back to an HTCondor master, which can be
    located inside one of the clouds or outside the cloud.</p>
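<p>As a minimal sketch of the worker side, the VM image can carry a small
    HTCondor configuration fragment that sets CONDOR_HOST to the central
    manager and runs only the master and startd daemons. The Python function
    below renders such a fragment. The host name cm.example.org is a
    hypothetical placeholder, and the security settings a real pool needs
    (authentication, ALLOW_* rules) are deliberately left out.</p>

```python
def worker_config(central_manager: str) -> str:
    """Render a minimal HTCondor config fragment for a cloud worker VM.

    CONDOR_HOST names the central manager (collector and negotiator);
    DAEMON_LIST restricts the worker to the master and the startd.
    Security settings are intentionally omitted from this sketch.
    """
    lines = [
        f"CONDOR_HOST = {central_manager}",
        "DAEMON_LIST = MASTER, STARTD",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical central manager; replace with the real host name.
    print(worker_config("cm.example.org"), end="")
```

<p>The fragment would typically be written to a local configuration file
    baked into the image, so every worker started from that image joins the
    same pool.</p>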
<p>The submit host is the point where a user submits Pegasus workflows
    for execution. This site typically runs an HTCondor collector to gather
    resource announcements, or is part of a larger HTCondor pool that collects
    these announcements. HTCondor makes the remote resources available to the
    submit host's HTCondor installation.</p>
<p>The <a class="link" href="cloud.php#concepts-fig-cloud-layout" title="Figure 7.2. Cloud Sample Site Layout">figure above</a>
    shows the way Pegasus WMS is deployed in cloud computing resources,
    ignoring how these resources were provisioned. Note that a single
    provisioning request may allocate multiple resources.</p>
<p>The initial stage-in and final stage-out of application data into
    and out of the node set is part of any Pegasus-planned workflow. Several
    configuration options exist in Pegasus to control how data is pushed
    and pulled, and when it is staged. In many use cases, some form of
    external access to or from the shared file system that is visible to the
    application workflow is required for data staging to succeed.
    However, Pegasus is prepared to deal with a set of boundary cases.</p>
<p>The data server in the figure is shown at the submit host. This is
    not a strict requirement. The data servers for consumed data and for data
    products may be separate machines external to the submit host, or they may
    be one of the object storage solutions offered by the cloud providers.</p>
<p>Once resources begin appearing in the pool managed by the submit
    machine's HTCondor collector, the application workflow can be
    submitted to HTCondor. HTCondor DAGMan manages the application
    workflow execution. Pegasus run-time tools obtain timing, performance, and
    provenance information as the application workflow is executed. At this
    point, it is the user's responsibility to de-provision the allocated
    resources.</p>
<p>In the figure, the cloud resources on the right side are assumed to
    have unrestricted outside connectivity, which enables HTCondor I/O to
    reach the resources. The right side also includes a setup where the
    worker nodes use only private IP addresses but have outgoing connectivity
    through a NAT router to talk to the internet. The <span class="emphasis"><em>Condor connection
    broker</em></span> (CCB) facilitates this setup almost effortlessly.</p>
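<p>On the worker side, CCB is typically enabled with a single setting,
    CCB_ADDRESS, which makes each daemon register with a connection broker;
    the HTCondor collector can serve as that broker. The sketch below assumes
    the collector on the submit host acts as the broker and that the worker
    configuration already defines COLLECTOR_HOST (directly or through
    CONDOR_HOST). The helper name enable_ccb is hypothetical.</p>

```python
CCB_LINE = "CCB_ADDRESS = $(COLLECTOR_HOST)"


def enable_ccb(config_text: str) -> str:
    """Return config_text with CCB enabled for a worker behind NAT.

    With CCB_ADDRESS set, the startd keeps a persistent outbound
    connection to the broker (here the collector), which asks it to
    connect back when another daemon needs to reach it, so no inbound
    connectivity to the worker is required. An existing CCB_ADDRESS
    line is left untouched, so the function can be applied repeatedly.
    """
    lines = config_text.splitlines()
    if any(line.lstrip().startswith("CCB_ADDRESS") for line in lines):
        return config_text
    lines.append(CCB_LINE)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical worker configuration; only the CCB line is added.
    print(enable_ccb("CONDOR_HOST = cm.example.org\n"), end="")
```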
<p>The left side shows a more difficult setup where the connectivity is
    fully firewalled without any connectivity except to in-site nodes. In this
    case, a proxy server process, the <span class="emphasis"><em> generic connection
    broker</em></span> (GCB), needs to be set up in the DMZ of the cloud site
    to facilitate HTCondor I/O between the submit host and worker
    nodes.</p>
<p>If the cloud provides data storage servers, Pegasus is starting to
    support workflows that require staging in two steps: consumed data is
    first staged to a data server in the remote site's DMZ, and then a second
    staging task moves the data from the data server to the worker node where
    the job runs. For staging out, data is first staged from the
    job's worker node to the site's data server, and possibly from there to
    another data server external to the site. Pegasus is capable of planning both
    steps: normal staging to the site's data server, and worker-node
    staging from and to the site's data server as part of the job.</p>
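<p>The two-step scheme can be pictured as a list of transfer legs, each
    either planned as a separate staging job or executed by the compute job
    on its worker node. The Python sketch below only models that split; it is
    not how the Pegasus planner is implemented, and the Transfer type, the
    helper names, and all URLs are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transfer:
    src: str
    dst: str
    in_job: bool  # True: executed by the compute job on the worker node


def plan_stage_in(lfn: str, source: str, site_server: str,
                  worker_dir: str) -> List[Transfer]:
    return [
        # Step 1: a separate staging job copies the input to the DMZ data server.
        Transfer(f"{source}/{lfn}", f"{site_server}/{lfn}", in_job=False),
        # Step 2: the compute job pulls the input onto its worker node.
        Transfer(f"{site_server}/{lfn}", f"{worker_dir}/{lfn}", in_job=True),
    ]


def plan_stage_out(lfn: str, worker_dir: str, site_server: str,
                   external_server: Optional[str] = None) -> List[Transfer]:
    # Step 1: the compute job pushes its output to the site's data server.
    legs = [Transfer(f"{worker_dir}/{lfn}", f"{site_server}/{lfn}", in_job=True)]
    if external_server is not None:
        # Optional step 2: a staging job copies the output off the site.
        legs.append(Transfer(f"{site_server}/{lfn}",
                             f"{external_server}/{lfn}", in_job=False))
    return legs


if __name__ == "__main__":
    for leg in plan_stage_in("input.dat", "gsiftp://submit.example.org/data",
                             "gsiftp://dmz.example.org/scratch", "file:///tmp/job"):
        print("job" if leg.in_job else "staging", leg.src, "->", leg.dst)
```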
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="amazon_aws"></a>7.3.1. Amazon EC2</h3></div></div></div>
<p>There are many different ways to set up an execution environment
      in Amazon EC2. The easiest way is to use a submit machine outside the
      cloud, and to provision several worker nodes and a file server node in
      the cloud as shown here:</p>
<div class="figure">
<a name="ec2"></a><p class="title"><b>Figure 7.3. Amazon EC2</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;" width="100%"><tr><td align="center" valign="middle"><img src="images/ec2.png" align="middle" height="360" alt="Amazon EC2"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>The submit machine runs Pegasus and an HTCondor master (collector,
      schedd, negotiator). The workers run an HTCondor startd, and the file
      server node exports an NFS file system. The startd on the workers is
      configured to connect to the master running outside the cloud, and the
      workers also mount the NFS file system. More information on setting up
      HTCondor for this environment can be found at <a class="ulink" href="http://www.isi.edu/~gideon/condor-ec2/" target="_top">
      http://www.isi.edu/~gideon/condor-ec2</a>.</p>
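<p>On each worker, mounting the file server's export is commonly done with
    one /etc/fstab entry (or an equivalent mount command in the VM's boot
    scripts). The sketch below renders such an entry; the server name
    fileserver.internal, the export /export/scratch, and the mount point
    /scratch are hypothetical, and the default mount options may need tuning.
    Using the same mount point on every worker keeps the paths seen by jobs
    identical across nodes.</p>

```python
def nfs_fstab_entry(server: str, export: str, mount_point: str,
                    options: str = "defaults") -> str:
    """Render an /etc/fstab line that mounts an NFS export on a worker.

    The six fields are: device (server:export), mount point, file system
    type, mount options, dump flag, and fsck pass (0 = never check).
    """
    for path in (export, mount_point):
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
    return "\t".join([f"{server}:{export}", mount_point, "nfs", options, "0", "0"])


if __name__ == "__main__":
    # Hypothetical file server and paths.
    print(nfs_fstab_entry("fileserver.internal", "/export/scratch", "/scratch"))
```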
<p>The site catalog entry for this configuration is similar to what
      you would create for running on a local <a class="link" href="condor_pool.php" title="7.2. Condor Pool">Condor pool</a> with a shared file
      system.</p>
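<p>A minimal sketch of such an entry, built with Python's standard
    xml.etree.ElementTree module so it runs without Pegasus installed. It
    mirrors the elements and profiles used in the Google Cloud site catalog
    example in this section; the site handle ec2, the shared scratch path
    /scratch on the NFS mount, and the file:// URL are assumptions to verify
    against the Condor pool section before use.</p>

```python
import xml.etree.ElementTree as ET


def shared_fs_site(handle: str, scratch: str) -> ET.Element:
    """Build a site element for a compute site with a shared (NFS) scratch dir."""
    site = ET.Element("site", handle=handle, arch="x86_64", os="LINUX")
    # The shared scratch directory is the NFS mount that every worker sees.
    directory = ET.SubElement(site, "directory", type="shared-scratch", path=scratch)
    ET.SubElement(directory, "file-server", operation="all", url=f"file://{scratch}")
    # Same profiles as the condorpool site in the Google Cloud example.
    for namespace, key, value in (("pegasus", "style", "condor"),
                                  ("condor", "universe", "vanilla")):
        profile = ET.SubElement(site, "profile", namespace=namespace, key=key)
        profile.text = value
    return site


if __name__ == "__main__":
    print(ET.tostring(shared_fs_site("ec2", "/scratch"), encoding="unicode"))
```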
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="google_cloud"></a>7.3.2. Google Cloud Platform</h3></div></div></div>
<p>Using the Google Cloud Platform is similar to using any other cloud
      platform. You can choose to host the central manager / submit host
      inside the cloud or outside. The compute VMs will have HTCondor
      installed and configured to join the pool managed by the central
      manager.</p>
<p>Google Storage is supported using gsutil. First, create a .boto
      file by running:</p>
<pre class="programlisting">gsutil config
</pre>
<p>Then, use a site catalog that specifies which .boto file to use.
      You can then use gs:// URLs in your workflow. For example:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog
                 http://pegasus.isi.edu/schema/sc-4.0.xsd" version="4.0"&gt;

    &lt;site  handle="local" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/tmp"&gt;
            &lt;file-server operation="all" url="file:///tmp"/&gt;
        &lt;/directory&gt;
        &lt;profile namespace="env" key="PATH"&gt;/opt/gsutil:/usr/bin:/bin&lt;/profile&gt;                                    
    &lt;/site&gt;                                                                                                                                                                                                                                                                                                                                                                                                             
    &lt;!-- compute site --&gt;
    &lt;site  handle="condorpool" arch="x86_86" os="LINUX"&gt;
        &lt;profile namespace="pegasus" key="style" &gt;condor&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;
    &lt;/site&gt;

    &lt;!-- storage sites have to be in the site catalog, just like a compute site --&gt;
    &lt;site  handle="google_storage" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/my-bucket/scratch"&gt;
            &lt;file-server operation="all" url="gs://my-bucket/scratch"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/my-bucket/outputs"&gt;
            &lt;file-server operation="all" url="gs://my-bucket/outputs"/&gt;
        &lt;/directory&gt;
        &lt;profile namespace="pegasus" key="BOTO_CONFIG"&gt;/home/myuser/.boto&lt;/profile&gt;
    &lt;/site&gt;

&lt;/sitecatalog&gt;
</pre>
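<p>Because a typo in a hand-edited catalog (such as an invalid arch value)
    is easy to miss, a quick check before planning may help. The sketch below
    uses only the Python standard library to parse a catalog in the 4.0
    format shown above, collect each site's file-server URLs, and warn about
    arch values outside an allow-list. The allow-list and the helper name
    check_catalog are assumptions; the authoritative set of arch values is
    defined in sc-4.0.xsd.</p>

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"sc": "http://pegasus.isi.edu/schema/sitecatalog"}
# Assumed allow-list of arch values; verify against sc-4.0.xsd.
KNOWN_ARCH = {"x86", "x86_64", "ppc", "ppc_64", "ia64", "sparcv7", "sparcv9", "amd64"}


def check_catalog(xml_text: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return ({site handle: file-server URLs}, warnings) for a site catalog."""
    root = ET.fromstring(xml_text)
    urls: Dict[str, List[str]] = {}
    warnings: List[str] = []
    for site in root.findall("sc:site", NS):
        handle = site.get("handle")
        arch = site.get("arch")
        if arch not in KNOWN_ARCH:
            warnings.append(f"site {handle!r}: unexpected arch {arch!r}")
        # Every file server declared under any directory of this site.
        urls[handle] = [fs.get("url")
                        for fs in site.findall("sc:directory/sc:file-server", NS)]
    return urls, warnings
```

<p>Read the catalog file as text (for example with open(path,
    encoding="utf-8").read()) and pass it to check_catalog; for the catalog
    above, the google_storage entry should list the two gs:// URLs.</p>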
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="condor_pool.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="execution_environments.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="globus_gram.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">7.2. Condor Pool </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 7.4. Remote Cluster using Globus GRAM</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
