<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="ch02s03.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="ch02s05.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section" title="2.4. Information Catalogs">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="idp11297632"></a>2.4. Information Catalogs</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="ch02s04.php#tut_site_catalog">2.4.1. The Site Catalog</a></span></dt>
<dt><span class="section"><a href="ch02s04.php#idp11065456">2.4.2. The Transformation Catalog</a></span></dt>
<dt><span class="section"><a href="ch02s04.php#idp11073184">2.4.3. The Replica Catalog</a></span></dt>
</dl></div>
<p>Pegasus uses three information catalogs when planning a workflow:
    the <a class="link" href="ch02s04.php#tut_site_catalog" title="2.4.1. The Site Catalog">Site
    Catalog</a>, the <a class="link" href="ch02s04.php#idp11065456" title="2.4.2. The Transformation Catalog">Transformation
    Catalog</a>, and the <a class="link" href="ch02s04.php#idp11073184" title="2.4.3. The Replica Catalog">Replica
    Catalog</a>.</p>
<div class="section" title="2.4.1. The Site Catalog">
<div class="titlepage"><div><div><h3 class="title">
<a name="tut_site_catalog"></a>2.4.1. The Site Catalog</h3></div></div></div>
<p>The site catalog describes the sites where the workflow jobs are
      to be executed. Typically the sites in the site catalog describe remote
      clusters, such as PBS clusters or Condor pools. In this tutorial we
      assume that you have a Personal Condor pool running on localhost. If you
      are using one of the tutorial VMs, this has already been set up for
      you.</p>
<p>The site catalog is in <code class="filename">sites.xml</code>:</p>
<pre class="programlisting">$ <span class="bold"><strong>more sites.xml</strong></span>
...
    &lt;!-- The local site contains information about the submit host --&gt;
    &lt;!-- The arch and os keywords are used to match binaries in the transformation catalog --&gt;
    &lt;site handle="local" arch="x86_64" os="LINUX"&gt;

        &lt;!-- These are the paths on the submit host where Pegasus stores data --&gt;
        &lt;!-- Scratch is where temporary files go --&gt;
        &lt;directory type="shared-scratch" path="/home/tutorial/run"&gt;
            &lt;file-server operation="all" url="file:///home/tutorial/run"/&gt;
        &lt;/directory&gt;
        &lt;!-- Storage is where Pegasus stores output files --&gt;
        &lt;directory type="local-storage" path="/home/tutorial/outputs"&gt;
            &lt;file-server operation="all" url="file:///home/tutorial/outputs"/&gt;
        &lt;/directory&gt;

        &lt;!-- This profile tells Pegasus where to find the user's private key for SCP transfers --&gt;
        &lt;profile namespace="env" key="SSH_PRIVATE_KEY"&gt;/home/tutorial/.ssh/id_rsa&lt;/profile&gt;
    &lt;/site&gt;


...</pre>
<p>There are two sites defined in the site catalog: “local” and
      “PegasusVM”. The “local” site is used by Pegasus to learn about the
      submit host where the workflow management system runs. The “PegasusVM”
      site is the personal Condor pool running on your (virtual) machine. In
      this case, the local site and the PegasusVM site refer to the same
      machine, but they are logically separate as far as Pegasus is
      concerned.</p>
<p>The local site is configured with a “storage” file system that is
      mounted on the submit host (indicated by the file:// URL). This file
      system is where the output data from the workflow will be stored. When
      the workflow is planned we will tell Pegasus that the output site is
      “local”.</p>
<p>The PegasusVM site is configured with a “scratch” file system
      accessible via SCP (indicated by the scp:// URL). This file system is
      where the working directory will be created. When we plan the workflow
      we will tell Pegasus that the execution site is “PegasusVM”.</p>
<p>The local site also has an environment variable called
      SSH_PRIVATE_KEY that tells Pegasus where to find the private key to use
      for SCP transfers. If you are running this tutorial on your own machine,
      you will need to set up a passwordless SSH key pair and add the public
      key to <code class="filename">authorized_keys</code>. If you are using
      the tutorial VM, this has already been set up for you.</p>
<p>Pegasus supports many different file transfer protocols. In this
      case the site catalog is set up so that input and output files are
      transferred to and from the PegasusVM site using SCP. Because the local
      site and the PegasusVM site are the same machine, this configuration
      simply copies files to and from localhost over SCP, which is a
      roundabout way of copying them locally.</p>
<p>Finally, the PegasusVM site is configured with two profiles that
      tell Pegasus that it is a plain Condor pool. Pegasus supports many ways
      of submitting tasks to a remote cluster. In this configuration it will
      submit vanilla Condor jobs.</p>
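<p>To see at a glance which sites and directories a site catalog defines,
      a short script can help. The following is a minimal sketch, not part of
      Pegasus: the function names <code class="literal">local_name</code> and
      <code class="literal">summarize_sites</code> are hypothetical, and the
      sketch assumes only the element and attribute names visible in the
      excerpt above (site, directory, file-server, handle, type, url), wrapped
      in some root element.</p>
<pre class="programlisting">import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree reports namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def summarize_sites(path):
    """Map each site handle to {directory type: first file-server URL}."""
    summary = {}
    for element in ET.parse(path).getroot().iter():
        if local_name(element.tag) != "site":
            continue
        directories = {}
        for child in element:
            if local_name(child.tag) != "directory":
                continue
            servers = [s for s in child if local_name(s.tag) == "file-server"]
            # Fall back to the plain path if no file server is listed.
            url = servers[0].get("url") if servers else child.get("path")
            directories[child.get("type")] = url
        summary[element.get("handle")] = directories
    return summary</pre>
<p>Calling <code class="literal">summarize_sites("sites.xml")</code> in the
      tutorial directory should return one entry per site. For the local site
      shown above, that entry would map shared-scratch to
      file:///home/tutorial/run and local-storage to
      file:///home/tutorial/outputs.</p>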
</div>
<div class="section" title="2.4.2. The Transformation Catalog">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp11065456"></a>2.4.2. The Transformation Catalog</h3></div></div></div>
<p>The transformation catalog describes all of the executables
      (called “transformations”) used by the workflow. This description
      includes the site(s) where they are located, the architecture and
      operating system they are compiled for, and any other information
      required to properly transfer them to the execution site and run
      them.</p>
<p>For this tutorial, the transformation catalog is in the file
      <code class="filename">tc.dat</code>:</p>
<pre class="programlisting">$ <span class="bold"><strong>more tc.dat</strong></span>
...
# This is the transformation catalog. It lists information about each of the
# executables that are used by the workflow.

tr preprocess {
    site PegasusVM {
        pfn "/home/tutorial/bin/preprocess"
        arch "x86_64"
        os "linux"
        type "INSTALLED"
    }
}


...</pre>
<p>The <code class="filename">tc.dat</code> file contains information about
      three transformations: preprocess, findrange, and analyze. These three
      transformations are referenced in the diamond DAX. The transformation
      catalog indicates that all three transformations are installed on the
      PegasusVM site, and are compiled for x86_64 Linux.</p>
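<p>The nested layout of <code class="filename">tc.dat</code> (a
      transformation block containing one block per site, each holding quoted
      key/value pairs) can be read with a few lines of Python. This is a rough
      sketch for inspecting the file, not the parser Pegasus uses: the
      function name <code class="literal">parse_transformation_catalog</code>
      is hypothetical, and the sketch handles only the constructs visible in
      the excerpt above.</p>
<pre class="programlisting">import re

# The excerpt from tc.dat shown above.
TC_TEXT = '''
# This is the transformation catalog. It lists information about each of the
# executables that are used by the workflow.

tr preprocess {
    site PegasusVM {
        pfn "/home/tutorial/bin/preprocess"
        arch "x86_64"
        os "linux"
        type "INSTALLED"
    }
}
'''


def parse_transformation_catalog(text):
    """Return {transformation: {site: {key: value}}}."""
    catalog = {}
    transformation = None
    site = None
    for raw_line in text.splitlines():
        # Simplification: assumes "#" never appears inside a quoted value.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        tr_match = re.match(r"tr\s+(\S+)\s*\{$", line)
        site_match = re.match(r"site\s+(\S+)\s*\{$", line)
        pair_match = re.match(r'(\w+)\s+"([^"]*)"$', line)
        if tr_match:
            transformation = tr_match.group(1)
            catalog[transformation] = {}
        elif site_match and transformation is not None:
            site = site_match.group(1)
            catalog[transformation][site] = {}
        elif pair_match and site is not None:
            catalog[transformation][site][pair_match.group(1)] = pair_match.group(2)
        elif line == "}":
            # A closing brace ends the innermost open block.
            if site is not None:
                site = None
            else:
                transformation = None
    return catalog</pre>
<p>For the excerpt,
      <code class="literal">parse_transformation_catalog(TC_TEXT)["preprocess"]["PegasusVM"]["pfn"]</code>
      yields /home/tutorial/bin/preprocess.</p>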
<p>The actual executable files are located in the
      <code class="filename">bin</code> directory. All three executables are actually
      symlinked to the same Python script. This script is just an example
      transformation that sleeps for 30 seconds, and then writes its own name
      and the contents of all its input files to all of its output
      files.</p>
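<p>The behavior described above can be approximated in a few lines. The
      sketch below is not the tutorial's actual script, whose command-line
      options and exact output format are not shown here; the function name
      <code class="literal">run_transformation</code> and its parameters are
      assumptions made for illustration.</p>
<pre class="programlisting">import time


def run_transformation(name, input_paths, output_paths, delay=30):
    """Sleep, then write the job name and all input contents to every output."""
    time.sleep(delay)
    contents = []
    for path in input_paths:
        with open(path) as handle:
            contents.append(handle.read())
    for path in output_paths:
        with open(path, "w") as handle:
            # Each output starts with the transformation's own name,
            # followed by the contents of every input file in order.
            handle.write(name + "\n")
            handle.writelines(contents)</pre>
<p>Passing <code class="literal">delay=0</code> skips the sleep, which is
      convenient when trying the function outside of a workflow.</p>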
</div>
<div class="section" title="2.4.3. The Replica Catalog">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp11073184"></a>2.4.3. The Replica Catalog</h3></div></div></div>
<p>The final catalog is the Replica Catalog. This catalog tells
      Pegasus where to find each of the input files for the workflow.</p>
<p>All files in a Pegasus workflow are referred to in the DAX using
      their Logical File Name (LFN). These LFNs are mapped to Physical File
      Names (PFNs) when Pegasus plans the workflow. This level of indirection
      enables Pegasus to map abstract DAXes to different execution sites and
      plan out the required file transfers automatically.</p>
<p>The Replica Catalog for the diamond workflow is in the
      <code class="filename">rc.dat</code> file:</p>
<pre class="programlisting">$ <span class="bold"><strong>more rc.dat</strong></span>
# This is the replica catalog. It lists information about each of the
# input files used by the workflow.

# The format is:
# LFN     PFN    site="SITE"

f.a    file:///home/tutorial/input/f.a    site="local"</pre>
<p>This replica catalog contains a single entry, for the diamond
      workflow’s only input file. The entry maps the LFN “f.a” to the PFN
      “file:///home/tutorial/input/f.a” and records that the file is stored on
      the local site, which implies that it will need to be transferred to the
      PegasusVM site when the workflow runs. The “site” attribute names the
      site where the file is located; its value should match the name of a
      site defined in the Site Catalog.</p>
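<p>The LFN-to-PFN mapping in <code class="filename">rc.dat</code> is simple
      enough to load into a dictionary. The following is a minimal sketch,
      assuming the whitespace-separated layout shown in the listing (an LFN, a
      PFN, then key="value" attributes); the function name
      <code class="literal">parse_replica_catalog</code> is hypothetical and
      this is not how Pegasus itself reads the file.</p>
<pre class="programlisting">import shlex

# The contents of rc.dat shown above.
RC_TEXT = '''
# This is the replica catalog. It lists information about each of the
# input files used by the workflow.

# The format is:
# LFN     PFN    site="SITE"

f.a    file:///home/tutorial/input/f.a    site="local"
'''


def parse_replica_catalog(text):
    """Return {LFN: [(PFN, {attribute: value}), ...]}."""
    replicas = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex.split removes the quotes around attribute values.
        lfn, pfn, *attributes = shlex.split(line)
        parsed = dict(item.split("=", 1) for item in attributes)
        replicas.setdefault(lfn, []).append((pfn, parsed))
    return replicas</pre>
<p>A list is kept per LFN because a logical file may have more than one
      physical replica. For the tutorial file,
      <code class="literal">parse_replica_catalog(RC_TEXT)["f.a"]</code>
      holds one replica, located on the local site.</p>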
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="ch02s03.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="tutorial.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="ch02s05.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">2.3. Generating the Workflow </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 2.5. Configuring Pegasus</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
