<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div lang="" class="chapter" title="Chapter 11. Useful Tips">
<div class="titlepage"><div><div><h2 class="title">
<a name="useful_tips"></a>Chapter 11. Useful Tips</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="useful_tips.php#migrating_from_3x">11.1. Migrating From Pegasus 3.1 to Pegasus 4.X</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#migrating_from_2x">11.2. Migrating From Pegasus 2.X to Pegasus 3.X</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#portable_code">11.3. Best Practices For Developing Portable Code</a></span></dt>
</dl></div>
<div class="section" title="11.1. Migrating From Pegasus 3.1 to Pegasus 4.X">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="migrating_from_3x"></a>11.1. Migrating From Pegasus 3.1 to Pegasus 4.X</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="useful_tips.php#idp5860080">11.1.1. Move to FHS layout</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp27854912">11.1.2. Stampede Schema Upgrade Tool</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp25652896">11.1.3. Existing users running in a condor pool with a non shared
      filesystem setup</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp6727984">11.1.4. New Clients for directory creation and file cleanup</a></span></dt>
</dl></div>
<p>Pegasus 4.0 moves the Pegasus installation to an FHS-compliant
    layout and makes workflows run better in cloud and distributed grid
    environments. This section is for existing users who run their
    workflows with Pegasus 3.1 and walks through the steps for moving to
    Pegasus 4.0.</p>
<div class="section" title="11.1.1. Move to FHS layout">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp5860080"></a>11.1.1. Move to FHS layout</h3></div></div></div>
<p>Pegasus 4.0 is the first release of Pegasus that is <a class="ulink" href="http://www.pathname.com/fhs/" target="_top">Filesystem Hierarchy Standard
      (FHS)</a> compliant. The native packages no longer install under
      /opt. Instead, pegasus-* binaries are in /usr/bin/ and example workflows
      can be found under /usr/share/pegasus/examples/.</p>
<p>To find Pegasus system components, a pegasus-config tool is
      provided. pegasus-config supports setting up the environment for</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>Python</p></li>
<li class="listitem"><p>Perl</p></li>
<li class="listitem"><p>Java</p></li>
<li class="listitem"><p>Shell</p></li>
</ul></div>
<p>For example, to find the PYTHONPATH for the DAX API, run:</p>
<pre class="programlisting">export PYTHONPATH=`pegasus-config --python`</pre>
<p>For a complete description of pegasus-config, see the <a class="link" href="cli-pegasus-config.php" title="pegasus-config">man page</a>.</p>
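<p>The same lookup can be done from inside a Python script instead of the shell. The following is a minimal sketch that assumes only the pegasus-config --python invocation shown above; the helper names pegasus_python_path and add_pegasus_to_sys_path are hypothetical and not part of Pegasus.</p>

```python
import shutil
import subprocess
import sys


def pegasus_python_path(config_tool="pegasus-config"):
    """Return the directory printed by `<config_tool> --python`, or None.

    None is returned when the tool is not on PATH or prints nothing.
    """
    if shutil.which(config_tool) is None:
        return None
    result = subprocess.run(
        [config_tool, "--python"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() or None


def add_pegasus_to_sys_path(config_tool="pegasus-config"):
    """Prepend the Pegasus Python directory to sys.path if it can be found."""
    path = pegasus_python_path(config_tool)
    if path and path not in sys.path:
        sys.path.insert(0, path)
    return path
```

<p>Calling add_pegasus_to_sys_path() before importing the DAX API has the same effect as exporting PYTHONPATH in the shell, but only for the current process.</p>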
</div>
<div class="section" title="11.1.2. Stampede Schema Upgrade Tool">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp27854912"></a>11.1.2. Stampede Schema Upgrade Tool</h3></div></div></div>
<p>Starting with Pegasus 4.x, the monitoring and statistics database
      schema has changed. To use pegasus-statistics, pegasus-analyzer
      and pegasus-plots against a 3.x database, you first need to upgrade the
      schema using the schema upgrade tool
      /usr/share/pegasus/sql/schema_tool.py or
      /path/to/pegasus-4.x/share/pegasus/sql/schema_tool.py.</p>
<p>Upgrading the schema is required if you store your monitoring
      information in a MySQL database that was set up with the 3.x
      monitoring tools.</p>
<p>If your setup uses the default SQLite database, the databases for
      new Pegasus 4.x runs are automatically created with the
      correct schema. In this case you only need to upgrade the SQLite
      databases from older runs if you wish to query them with the newer
      clients.</p>
<p>To upgrade the database:</p>
<pre class="programlisting">For SQLite Database

<span class="bold"><strong>cd /to/the/workflow/directory/with/3.x.monitord.db</strong></span>

Check the db version<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -c connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db</strong></span>
2012-02-29T01:29:43.330476Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.init | 
2012-02-29T01:29:43.330708Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start | 
2012-02-29T01:29:43.348995Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema 
                                   | Current version set to: 3.1. 
2012-02-29T01:29:43.349133Z ERROR  netlogger.analysis.schema.schema_check.SchemaCheck.check_schema 
                                   | Schema version 3.1 found - expecting 4.0 - database admin will need to run upgrade tool.


Convert the Database to be version 4.x compliant<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -u connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db
</strong></span>2012-02-29T01:35:35.046317Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.init | 
2012-02-29T01:35:35.046554Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start | 
2012-02-29T01:35:35.064762Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema 
                                  | Current version set to: 3.1. 
2012-02-29T01:35:35.064902Z ERROR  netlogger.analysis.schema.schema_check.SchemaCheck.check_schema 
                                  | Schema version 3.1 found - expecting 4.0 - database admin will need to run upgrade tool. 
2012-02-29T01:35:35.065001Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.upgrade_to_4_0 
                                  | Upgrading to schema version 4.0.

Verify if the database has been converted to Version 4.x<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -c connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db</strong></span>
2012-02-29T01:39:17.218902Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.init | 
2012-02-29T01:39:17.219141Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start | 
2012-02-29T01:39:17.237492Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Current version set to: 4.0. 
2012-02-29T01:39:17.237624Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Schema up to date. 

For upgrading a MySQL database the steps remain the same. The only thing that changes is the connection String to the database
E.g.<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -u connString=mysql://username:password@server:port/dbname

</strong></span></pre>
<p>After the database has been upgraded you can use either 3.x or 4.x
      clients to query the database with <span class="bold"><strong>pegasus-statistics</strong></span>, as well as <span class="bold"><strong>pegasus-plots </strong></span>and <span class="bold"><strong>pegasus-analyzer.</strong></span></p>
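<p>When many old run directories need to be checked or upgraded, the command line can be assembled programmatically. The sketch below only builds the argument list in the form shown above (-c to check, -u to upgrade, followed by a connString=sqlite:/// argument); the function names are hypothetical, and the default tool path is the native package location mentioned earlier in this section.</p>

```python
import os


def stampede_conn_string(db_path):
    """Build the connString argument for a SQLite workflow database.

    "sqlite:///" followed by an absolute path gives the four slashes
    seen in the examples above.
    """
    return "connString=sqlite:///" + os.path.abspath(db_path)


def schema_tool_command(db_path, upgrade=False,
                        tool="/usr/share/pegasus/sql/schema_tool.py"):
    """Return the argument list for a schema check (-c) or upgrade (-u)."""
    flag = "-u" if upgrade else "-c"
    return [tool, flag, stampede_conn_string(db_path)]
```

<p>The returned list can be passed to subprocess.run to first check a database and then upgrade it only if the check reports an older schema version.</p>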
</div>
<div class="section" title="11.1.3. Existing users running in a condor pool with a non shared filesystem setup">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp25652896"></a>11.1.3. Existing users running in a condor pool with a non shared
      filesystem setup</h3></div></div></div>
<p>With Pegasus 3.1, users running workflows in a cloud environment
      with a non-shared filesystem setup had to resort to some trickery in the site
      catalog, including placeholders for local/submit host paths for
      execution sites when using CondorIO. In Pegasus 4.0, this has been
      rectified.</p>
<p>For example, for a 3.1 user to run on a local-condor pool without
      a shared filesystem and use Condor file IO for file transfers, the site
      entry looks something like this:</p>
<pre class="programlisting"> &lt;site  handle="local-condor" arch="x86" os="LINUX"&gt;
        &lt;grid  type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/&gt;
        &lt;grid  type="gt2" contact="localhost/jobmanager-condor" scheduler="unknown" jobtype="compute"/&gt;
        &lt;head-fs&gt;

          <span class="bold"><strong>&lt;!-- the paths for scratch filesystem are the paths on local site as we execute create dir job
               on local site. Improvements planned for 4.0 release.--&gt;</strong></span>
            &lt;scratch&gt;
                &lt;shared&gt;
                    &lt;file-server protocol="file" url="file:///" mount-point="/submit-host/scratch"/&gt;
                    &lt;internal-mount-point mount-point="/submit-host/scratch"/&gt;
                &lt;/shared&gt;
            &lt;/scratch&gt;
            &lt;storage&gt;
                &lt;shared&gt;
                    &lt;file-server protocol="file" url="file:///" mount-point="/glusterfs/scratch"/&gt;
                    &lt;internal-mount-point mount-point="/glusterfs/scratch"/&gt;
                &lt;/shared&gt;
            &lt;/storage&gt;
        &lt;/head-fs&gt;
        &lt;replica-catalog  type="LRC" url="rlsn://dummyValue.url.edu" /&gt;
        &lt;profile namespace="env" key="PEGASUS_HOME" &gt;/cluster-software/pegasus/2.4.1&lt;/profile&gt;
        &lt;profile namespace="env" key="GLOBUS_LOCATION" &gt;/cluster-software/globus/5.0.1&lt;/profile&gt;

        <span class="bold"><strong>&lt;!-- profies for site to be treated as condor pool --&gt;</strong></span>
        &lt;profile namespace="pegasus" key="style" &gt;condor&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;

        
        <span class="bold"><strong>&lt;!-- to enable kickstart staging from local site--&gt;</strong></span>
        &lt;profile namespace="condor" key="transfer_executable"&gt;true&lt;/profile&gt;


    &lt;/site&gt;
</pre>
<p>With Pegasus 4.0, the site entry for a local-condor pool can be as
      concise as the following:</p>
<pre class="programlisting"> &lt;site  handle="condorpool" arch="x86" os="LINUX"&gt;
        &lt;head-fs&gt;
            &lt;scratch /&gt;
            &lt;storage /&gt;
        &lt;/head-fs&gt;
        &lt;profile namespace="pegasus" key="style" &gt;condor&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;
    &lt;/site&gt;
</pre>
<p>The planner in 4.0 correctly picks up the paths from the local
      site entry to determine the staging location for Condor IO on the
      submit host.</p>
<p>Users should read the <a class="link" href="running_workflows.php#data_staging_configuration" title="5.3. Data Staging Configuration">chapter</a> on Pegasus data staging configuration and also look in the
      examples directory (share/pegasus/examples).</p>
</div>
<div class="section" title="11.1.4. New Clients for directory creation and file cleanup">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp6727984"></a>11.1.4. New Clients for directory creation and file cleanup</h3></div></div></div>
<p>Pegasus 4.0 has new clients for directory creation and
      cleanup.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>pegasus-create-dir</p></li>
<li class="listitem"><p>pegasus-cleanup</p></li>
</ul></div>
<p>Both of these clients are Python-based wrapper scripts around various
      protocol-specific clients. Each wrapper determines which underlying
      client to invoke.</p>
<div class="table">
<a name="idp6048688"></a><p class="title"><b>Table 11.1. Clients interfaced to by pegasus-create-dir</b></p>
<div class="table-contents"><table summary="Clients interfaced to by pegasus-create-dir" border="1">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>Client</th>
<th>Used For</th>
</tr></thead>
<tbody>
<tr>
<td>globus-url-copy</td>
<td>to create directories against a gridftp/ftp
              server</td>
</tr>
<tr>
<td>srm-mkdir</td>
<td>to create directories against a SRM server.</td>
</tr>
<tr>
<td>mkdir</td>
<td>to create a directory on the local filesystem</td>
</tr>
<tr>
<td>pegasus-s3</td>
<td>to create an S3 bucket in the Amazon cloud</td>
</tr>
<tr>
<td>scp</td>
<td>staging files using scp</td>
</tr>
<tr>
<td>imkdir</td>
<td>to create a directory against an IRODS server</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="table">
<a name="idp25546416"></a><p class="title"><b>Table 11.2. Clients interfaced to by pegasus-cleanup</b></p>
<div class="table-contents"><table summary="Clients interfaced to by pegasus-cleanup" border="1">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>Client</th>
<th>Used For</th>
</tr></thead>
<tbody>
<tr>
<td>globus-url-copy</td>
<td>to remove a file against a gridftp/ftp server. In this
              case a zero byte file is created</td>
</tr>
<tr>
<td>srm-rm</td>
<td>to remove files against a SRM server.</td>
</tr>
<tr>
<td>rm</td>
<td>to remove a file on the local filesystem</td>
</tr>
<tr>
<td>pegasus-s3</td>
<td>to remove a file from an S3 bucket.</td>
</tr>
<tr>
<td>scp</td>
<td>to remove a file against a scp server. In this case a
              zero byte file is created.</td>
</tr>
<tr>
<td>irm</td>
<td>to remove a file against an IRODS server</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>With Pegasus 4.0, the planner prefers to run the create dir
      and cleanup jobs locally on the submit host. These jobs are scheduled
      to run remotely only when a file server is specified for the staging
      site.</p>
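<p>The wrapper idea behind pegasus-create-dir can be illustrated with a small dispatch table. This is a sketch only and not the actual implementation: the client names come from Table 11.1, while the URL schemes used as keys and the function name pick_create_dir_client are assumptions made for illustration.</p>

```python
from urllib.parse import urlparse

# Client names are taken from Table 11.1. The URL schemes used as keys
# are assumptions for this sketch, not documented Pegasus behaviour.
CREATE_DIR_CLIENTS = {
    "gsiftp": "globus-url-copy",
    "ftp": "globus-url-copy",
    "srm": "srm-mkdir",
    "file": "mkdir",
    "s3": "pegasus-s3",
    "scp": "scp",
    "irods": "imkdir",
}


def pick_create_dir_client(url):
    """Return the name of the client that would handle the given URL.

    A plain path without a scheme is treated as a local file path.
    """
    scheme = urlparse(url).scheme or "file"
    if scheme not in CREATE_DIR_CLIENTS:
        raise ValueError("no directory creation client for scheme: " + scheme)
    return CREATE_DIR_CLIENTS[scheme]
```

<p>A table for pegasus-cleanup would look the same, with the clients from Table 11.2 (srm-rm, rm, irm and so on) as values.</p>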
</div>
</div>
<div class="section" title="11.2. Migrating From Pegasus 2.X to Pegasus 3.X">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="migrating_from_2x"></a>11.2. Migrating From Pegasus 2.X to Pegasus 3.X</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="useful_tips.php#idp5998912">11.2.1. PEGASUS_HOME and Setup Scripts</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp11506304">11.2.2. Changes to Schemas and Catalog Formats</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp23632288">11.2.3. Properties and Profiles Simplification</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp21804000">11.2.4. Transfers Simplification</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp6083568">11.2.5. Clients in bin directory</a></span></dt>
</dl></div>
<p>Pegasus 3.0 simplifies configuration.
    This section is for existing users who run their workflows with
    Pegasus 2.x and walks through the steps for moving to Pegasus
    3.0.</p>
<div class="section" title="11.2.1. PEGASUS_HOME and Setup Scripts">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp5998912"></a>11.2.1. PEGASUS_HOME and Setup Scripts</h3></div></div></div>
<p>Earlier versions of Pegasus required users to have the environment
      variable PEGASUS_HOME set and to source a setup file
      ($PEGASUS_HOME/setup.sh or $PEGASUS_HOME/setup.csh) before running Pegasus
      to set up CLASSPATH and other variables.</p>
<p>Starting with Pegasus 3.0, this is no longer required. The above
      paths are automatically determined by the Pegasus tools when they are
      invoked.</p>
<p>All that users need to do is set the PATH variable to pick up
      the Pegasus executables from the bin directory.</p>
<pre class="programlisting">$ <span class="bold"><strong>export PATH=/some/install/pegasus-3.0.0/bin:$PATH</strong></span></pre>
</div>
<div class="section" title="11.2.2. Changes to Schemas and Catalog Formats">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp11506304"></a>11.2.2. Changes to Schemas and Catalog Formats</h3></div></div></div>
<div class="section" title="11.2.2.1. DAX Schema">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp11507056"></a>11.2.2.1. DAX Schema</h4></div></div></div>
<p>Pegasus 3.0 by default now parses DAX documents conforming to
        the DAX Schema 3.2, available <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/dax-3.2/dax-3.2.xsd" target="_top">here</a>. The schema is explained in detail in the chapter on
        API references.</p>
<p>Starting with Pegasus 3.0, DAX generation APIs are provided in
        Java, Python and Perl for users to use in their DAX generators. The use
        of the APIs is highly encouraged. Support for the old DAX schemas has
        been deprecated and will be removed in a future version.</p>
<p>Users who still want to run using the old DAX formats, i.e.
        3.0 or earlier, can for the time being set the following property in
        the properties file and point it to the dax-3.0 xsd of the installation.</p>
<pre class="programlisting"><span class="bold"><strong>pegasus.schema.dax  /some/install/pegasus-3.0/etc/dax-3.0.xsd</strong></span></pre>
</div>
<div class="section" title="11.2.2.2. Site Catalog Format">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp8819456"></a>11.2.2.2. Site Catalog Format</h4></div></div></div>
<p>Pegasus 3.0 by default now parses the Site Catalog format conforming
        to the SC schema 3.0 (XML3), available <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/dax-3.2/dax-3.2.xsd" target="_top">here</a>. The format is explained in detail in the chapter on
        Catalogs.</p>
<p>Pegasus 3.0 comes with a pegasus-sc-converter that converts
        a user's old site catalog (XML) to the XML3 format. Sample usage is
        given below.</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-sc-converter -i sample.sites.xml -I XML -o sample.sites.xml3 -O XML3
</strong></span>
2010.11.22 12:55:14.169 PST:   Written out the converted file to sample.sites.xml3 
</pre>
<p>To use the converted site catalog, do the following in the
        properties file:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>unset pegasus.catalog.site or set pegasus.catalog.site to
            XML3</p></li>
<li class="listitem"><p>point pegasus.catalog.site.file to the converted site
            catalog</p></li>
</ol></div>
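<p>The conversion and the two property settings can be scripted together. The sketch below builds the converter command exactly as in the sample usage above and returns the two properties from the list; the function names are hypothetical, and writing the properties as key = value lines is an assumption about the properties file syntax.</p>

```python
def sc_converter_command(old_catalog, new_catalog):
    """Argument list for converting an XML site catalog to XML3."""
    return [
        "pegasus-sc-converter",
        "-i", old_catalog, "-I", "XML",
        "-o", new_catalog, "-O", "XML3",
    ]


def site_catalog_properties(new_catalog):
    """Property lines that point Pegasus at the converted catalog."""
    return [
        "pegasus.catalog.site = XML3",
        "pegasus.catalog.site.file = " + new_catalog,
    ]
```

<p>For example, sc_converter_command("sample.sites.xml", "sample.sites.xml3") reproduces the sample invocation, and the lines from site_catalog_properties("sample.sites.xml3") can be appended to the properties file.</p>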
</div>
<div class="section" title="11.2.2.3. Transformation Catalog Format">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp10652608"></a>11.2.2.3. Transformation Catalog Format</h4></div></div></div>
<p>Pegasus 3.0 by default now parses a file-based, multiline textual
        format of the Transformation Catalog. The new Text format is explained
        in detail in the chapter on Catalogs.</p>
<p>Pegasus 3.0 comes with a pegasus-tc-converter that converts
        a user's old transformation catalog (File) to the Text format. Sample
        usage is given below.</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-tc-converter -i sample.tc.data -I File -o sample.tc.text -O Text
</strong></span>
2010.11.22 12:53:16.661 PST:   Successfully converted Transformation Catalog from File to Text 
2010.11.22 12:53:16.666 PST:   The output transfomation catalog is in file  /lfs1/software/install/pegasus/pegasus-3.0.0cvs/etc/sample.tc.text 
</pre>
<p>To use the converted transformation catalog, do the following in
        the properties file:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>unset pegasus.catalog.transformation or set
            pegasus.catalog.transformation to Text</p></li>
<li class="listitem"><p>point pegasus.catalog.transformation.file to the converted
            transformation catalog</p></li>
</ol></div>
</div>
</div>
<div class="section" title="11.2.3. Properties and Profiles Simplification">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp23632288"></a>11.2.3. Properties and Profiles Simplification</h3></div></div></div>
<p>Starting with Pegasus 3.0, all profiles can be specified in the
      properties file. Profiles specified in the properties file have the
      lowest priority. Profiles are explained in detail in the <a class="link" href="reference.php#profiles" title="10.2. Profiles">Profiles</a> chapter. As a result, many
      existing Pegasus properties were replaced by profiles. The table below
      lists the properties removed and the new profile-based names.</p>
<div class="table">
<a name="idp23102784"></a><p class="title"><b>Table 11.3. Property Keys removed and their Profile based
        replacement</b></p>
<div class="table-contents"><table summary="Property Keys removed and their Profile based
        replacement" border="1">
<colgroup>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><span class="bold"><strong>Old Property Key</strong></span></td>
<td><span class="bold"><strong>New Profile Key</strong></span></td>
</tr>
<tr>
<td>pegasus.local.env</td>
<td>no replacement. Specify env profiles for local site in
              the site catalog</td>
</tr>
<tr>
<td>pegasus.condor.release</td>
<td>condor.periodic_release</td>
</tr>
<tr>
<td>pegasus.condor.remove</td>
<td>condor.periodic_remove</td>
</tr>
<tr>
<td>pegasus.job.priority</td>
<td>condor.priority</td>
</tr>
<tr>
<td>pegasus.condor.output.stream</td>
<td>condor.stream_output</td>
</tr>
<tr>
<td>pegasus.condor.error.stream</td>
<td>condor.stream_error</td>
</tr>
<tr>
<td>pegasus.dagman.retry</td>
<td>dagman.retry</td>
</tr>
<tr>
<td>pegasus.exitcode.impl</td>
<td>dagman.post</td>
</tr>
<tr>
<td>pegasus.exitcode.scope</td>
<td>dagman.post.scope</td>
</tr>
<tr>
<td>pegasus.exitcode.arguments</td>
<td>dagman.post.arguments</td>
</tr>
<tr>
<td>pegasus.exitcode.path.*</td>
<td>dagman.post.path.*</td>
</tr>
<tr>
<td>pegasus.dagman.maxpre</td>
<td>dagman.maxpre</td>
</tr>
<tr>
<td>pegasus.dagman.maxpost</td>
<td>dagman.maxpost</td>
</tr>
<tr>
<td>pegasus.dagman.maxidle</td>
<td>dagman.maxidle</td>
</tr>
<tr>
<td>pegasus.dagman.maxjobs</td>
<td>dagman.maxjobs</td>
</tr>
<tr>
<td>pegasus.remote.scheduler.min.maxwalltime</td>
<td>globus.maxwalltime</td>
</tr>
<tr>
<td>pegasus.remote.scheduler.min.maxtime</td>
<td>globus.maxtime</td>
</tr>
<tr>
<td>pegasus.remote.scheduler.min.maxcputime</td>
<td>globus.maxcputime</td>
</tr>
<tr>
<td>pegasus.remote.scheduler.queues</td>
<td>globus.queue</td>
</tr>
</tbody>
</table></div>
</div>
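<p>An existing properties file can be migrated mechanically. The following sketch rewrites one line at a time using a few rows from Table 11.3; it is an illustration under stated assumptions, not a tool shipped with Pegasus. The function name migrate_line and the TODO comment format are hypothetical, and lines are assumed to have the form key, separator, value without a trailing newline.</p>

```python
import re

# A subset of the rows of Table 11.3: old property key -> new profile key.
PROFILE_REPLACEMENTS = {
    "pegasus.condor.release": "condor.periodic_release",
    "pegasus.condor.remove": "condor.periodic_remove",
    "pegasus.job.priority": "condor.priority",
    "pegasus.condor.error.stream": "condor.stream_error",
    "pegasus.dagman.retry": "dagman.retry",
    "pegasus.dagman.maxjobs": "dagman.maxjobs",
    "pegasus.remote.scheduler.queues": "globus.queue",
}

# Wildcard rows are handled as prefix replacements.
PREFIX_REPLACEMENTS = {"pegasus.exitcode.path.": "dagman.post.path."}

# Keys that were removed without a profile replacement.
NO_REPLACEMENT = {"pegasus.local.env"}

_LINE = re.compile(r"^(\s*)([^\s=:]+)(.*)$")


def migrate_line(line):
    """Return the line with a removed property key replaced, if it has one."""
    match = _LINE.match(line)
    if not match:
        return line  # blank line
    indent, key, rest = match.groups()
    if key in NO_REPLACEMENT:
        return "# TODO: " + key + " has no replacement, use env profiles in the site catalog"
    new_key = PROFILE_REPLACEMENTS.get(key)
    if new_key is None:
        for old_prefix, new_prefix in PREFIX_REPLACEMENTS.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
    return line if new_key is None else indent + new_key + rest
```

<p>Applying migrate_line to each line of text.splitlines() leaves comments and unrelated properties untouched, so the result can be reviewed with a plain diff before it replaces the old file.</p>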
<br class="table-break"><div class="section" title="11.2.3.1. Profile Keys for Clustering">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp5985760"></a>11.2.3.1. Profile Keys for Clustering</h4></div></div></div>
<p>The pegasus profile keys for job clustering were <span class="bold"><strong>renamed</strong></span>. The following table lists the old and
        the new names for the profile keys.</p>
<div class="table">
<a name="idp5987504"></a><p class="title"><b>Table 11.4. Old and New Names For Job Clustering Profile
          Keys</b></p>
<div class="table-contents"><table summary="Old and New Names For Job Clustering Profile
          Keys" border="1">
<colgroup>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><span class="bold"><strong>Old Pegasus Profile
                Key</strong></span></td>
<td><span class="bold"><strong>New Pegasus Profile
                Key</strong></span></td>
</tr>
<tr>
<td>collapse</td>
<td>clusters.size</td>
</tr>
<tr>
<td>bundle</td>
<td>clusters.num</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break">
</div>
</div>
<div class="section" title="11.2.4. Transfers Simplification">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp21804000"></a>11.2.4. Transfers Simplification</h3></div></div></div>
<p>Pegasus 3.0 has a new transfer client, pegasus-transfer,
      that is invoked by default for first level and second level staging. The
      pegasus-transfer client is a Python-based wrapper around various
      transfer clients such as globus-url-copy, lcg-copy, wget, cp and ln.
      pegasus-transfer looks at the source and destination URLs and figures out
      automatically which underlying client to use. pegasus-transfer is
      distributed with Pegasus and can be found in the bin subdirectory.</p>
<p>Also, the Bundle Transfer refiner has been made the default in
      Pegasus 3.0. Most users no longer need to set any transfer
      related properties. The names of the profile keys that control
      Bundle Transfers have changed. The following table lists the old
      and the new names for the Pegasus profile keys, which are explained in
      detail in the Profiles chapter.</p>
<div class="table">
<a name="idp27765456"></a><p class="title"><b>Table 11.5. Old and New Names For Transfer Bundling Profile
        Keys</b></p>
<div class="table-contents"><table summary="Old and New Names For Transfer Bundling Profile
        Keys" border="1">
<colgroup>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><span class="bold"><strong>Old Pegasus Profile
              Key</strong></span></td>
<td><span class="bold"><strong>New Pegasus Profile
              Keys</strong></span></td>
</tr>
<tr>
<td>bundle.stagein</td>
<td>stagein.clusters | stagein.local.clusters |
              stagein.remote.clusters</td>
</tr>
<tr>
<td>bundle.stageout</td>
<td>stageout.clusters | stageout.local.clusters |
              stageout.remote.clusters</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="section" title="11.2.4.1. Worker Package Staging">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp27907408"></a>11.2.4.1. Worker Package Staging</h4></div></div></div>
<p>Starting with Pegasus 3.0, there is a separate boolean property
        <span class="bold"><strong>pegasus.transfer.worker.package</strong></span> to
        enable worker package staging to the remote compute sites. Earlier, it
        was bundled with user executable staging, i.e. it was enabled when the <span class="bold"><strong>pegasus.catalog.transformation.mapper</strong></span> property
        was set to Staged.</p>
</div>
</div>
<div class="section" title="11.2.5. Clients in bin directory">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp6083568"></a>11.2.5. Clients in bin directory</h3></div></div></div>
<p>Starting with Pegasus 3.0, the Pegasus clients in the bin directory
      have a pegasus prefix. The table below lists the old client names and
      the names of the clients that replaced them.</p>
<div class="table">
<a name="idp6126400"></a><p class="title"><b>Table 11.6. Old Client Names and their New Names</b></p>
<div class="table-contents"><table summary="Old Client Names and their New Names" border="1">
<colgroup>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><span class="bold"><strong>Old Client</strong></span></td>
<td><span class="bold"><strong>New Client</strong></span></td>
</tr>
<tr>
<td>rc-client</td>
<td>pegasus-rc-client</td>
</tr>
<tr>
<td>tc-client</td>
<td>pegasus-tc-client</td>
</tr>
<tr>
<td>pegasus-get-sites</td>
<td>pegasus-sc-client</td>
</tr>
<tr>
<td>sc-client</td>
<td>pegasus-sc-converter</td>
</tr>
<tr>
<td>tailstatd</td>
<td>pegasus-monitord</td>
</tr>
<tr>
<td>genstats and genstats-breakdown</td>
<td>pegasus-statistics</td>
</tr>
<tr>
<td>show-job</td>
<td>pegasus-plots</td>
</tr>
<tr>
<td>cleanup</td>
<td>pegasus-cleanup</td>
</tr>
<tr>
<td>dirmanager</td>
<td>pegasus-dirmanager</td>
</tr>
<tr>
<td>exitcode</td>
<td>pegasus-exitcode</td>
</tr>
<tr>
<td>rank-dax</td>
<td>pegasus-rank-dax</td>
</tr>
<tr>
<td>transfer</td>
<td>pegasus-transfer</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break">
</div>
</div>
<div class="section" title="11.3. Best Practices For Developing Portable Code">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="portable_code"></a>11.3. Best Practices For Developing Portable Code</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="useful_tips.php#idp21614112">11.3.1. Supported Platforms</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp30253664">11.3.2. Packaging of Software</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp21450768">11.3.3. MPI Codes</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp24713824">11.3.4. Maximum Running Time of Codes</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp24715584">11.3.5. Codes cannot specify the directory in which they should be
      run</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp25749568">11.3.6. No hard-coded paths</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp6071024">11.3.7. Wrapping legacy codes with a shell wrapper</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp6072688">11.3.8. Propogating back the right exitcode</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp8998848">11.3.9. Static vs. Dynamically Linked Libraries</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp10965504">11.3.10. Temporary Files</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp10211120">11.3.11. Handling of stdio</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp8028304">11.3.12. Configuration Files</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp10985248">11.3.13. Code Invocation and input data staging by Pegasus</a></span></dt>
<dt><span class="section"><a href="useful_tips.php#idp7070416">11.3.14. Logical File naming in DAX</a></span></dt>
</dl></div>
<p>This section lists issues for algorithm developers to keep
    in mind while developing their codes. Keeping these in mind will
    avoid many problems when running the codes on the Grid
    through workflows.</p>
<div class="section" title="11.3.1. Supported Platforms">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp21614112"></a>11.3.1. Supported Platforms</h3></div></div></div>
<p>Most of the hosts making up a Grid run variants of Linux or, in some
      cases, Solaris. The Grid middleware mostly supports UNIX and its
      variants.</p>
<div class="section" title="11.3.1.1. Running on Windows">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp21988464"></a>11.3.1.1. Running on Windows</h4></div></div></div>
<p>The majority of the machines making up the various Grid sites
        run Linux. In fact, there is no widespread deployment of a
        Windows-based Grid. Currently, the server side software of Globus does
        not run on Windows. Only the client tools can run on Windows. The
        algorithm developers should not code exclusively for the Windows
        platforms. They must make sure that their codes run on Linux or
        Solaris platforms. If the code is written in a portable language like
        Java, then porting should not be an issue.</p>
<p>If for some reason the code can only be executed on the Windows
        platform, please contact the Pegasus team at pegasus aT isi dot edu.
        In certain cases it is possible to stand up a Linux headnode in front
        of a Windows cluster running Condor as its scheduler.</p>
</div>
</div>
<div class="section" title="11.3.2. Packaging of Software">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp30253664"></a>11.3.2. Packaging of Software</h3></div></div></div>
<p>As far as possible, binary packages (preferably statically linked)
      of the codes should be provided. If for some reason the codes need to
      be built from source, they should have an associated makefile (for
      C/C++ based tools) or an ant file (for Java tools). The build process
      should refer to the standard libraries that are part of a normal
      Linux installation. If the codes require non-standard libraries, clear
      documentation needs to be provided on how to install those libraries
      and how to make the build process refer to them.</p>
<p>Further, installing software as root is not a possibility. Hence,
      all the external libraries that need to be installed can only be
      installed as non-root in non-standard locations.</p>
</div>
<div class="section" title="11.3.3. MPI Codes">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp21450768"></a>11.3.3. MPI Codes</h3></div></div></div>
<p>If any of the algorithm codes are MPI based, their developers
      should contact the Grid group. MPI can be run on the Grid, but the codes
      need to be compiled against the MPI libraries installed on the various
      Grid sites. The Pegasus group has some experience running MPI code
      through PBS.</p>
</div>
<div class="section" title="11.3.4. Maximum Running Time of Codes">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp24713824"></a>11.3.4. Maximum Running Time of Codes</h3></div></div></div>
<p>Each of the Grid sites has a policy on the maximum time for which
      it will allow a job to run. The algorithms catalog should have the
      maximum time (in minutes) that the job can run for. This information is
      passed to the Grid site when a job is submitted, so that the Grid site
      does not kill the job before that published time expires. It is OK if
      the job runs for only a fraction of the maximum time.</p>
</div>
<div class="section" title="11.3.5. Codes cannot specify the directory in which they should be run">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp24715584"></a>11.3.5. Codes cannot specify the directory in which they should be
      run</h3></div></div></div>
<p>Codes are installed in some standard location on the Grid sites or
      staged on demand. However, they are not invoked from the directories
      where they are installed. It must be possible to invoke the codes from
      any directory, as long as the directory where the codes are installed
      is accessible.</p>
<p>This is especially relevant when writing scripts around the
      algorithm codes. Relative paths do not work there, because a relative
      path is resolved against the directory from which the script is
      invoked, not the directory where it is installed. A suggested
      workaround is to pick up the base directory where the software is
      installed from the environment, or to derive it with the
      <span class="command"><strong>dirname</strong></span> command or an equivalent API. The
      workflow system can set appropriate environment variables while
      launching jobs on the Grid.</p>
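<p>The workaround can be sketched in shell as follows. This is a minimal
      illustration, not part of Pegasus: the variable name
      <code class="envar">APP_HOME</code> and the function name are
      assumptions made for the example.</p>

```shell
# Resolve the directory where the software is installed.
# APP_HOME is a hypothetical variable that the workflow system could set;
# if it is unset, fall back to the directory that contains the script.
resolve_base_dir() {
    script_path=$1
    if [ -n "${APP_HOME:-}" ]; then
        printf '%s\n' "$APP_HOME"
    else
        # cd + pwd turns a relative dirname into an absolute path
        (cd "$(dirname "$script_path")" && pwd)
    fi
}

# Typical use inside a wrapper script (helper-tool is a made-up name):
#   BASE_DIR=$(resolve_base_dir "$0")
#   "$BASE_DIR/helper-tool" "$@"
```

<p>Because the base directory is computed at run time, the wrapper
      behaves the same regardless of the directory it is invoked from.</p>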
</div>
<div class="section" title="11.3.6. No hard-coded paths">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp25749568"></a>11.3.6. No hard-coded paths</h3></div></div></div>
<p>The algorithms should not hard-code any directory paths in the
      code. All directory paths should be picked up explicitly, either from
      the environment (through environment variables) or from command line
      options passed to the algorithm code.</p>
</div>
<div class="section" title="11.3.7. Wrapping legacy codes with a shell wrapper">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp6071024"></a>11.3.7. Wrapping legacy codes with a shell wrapper</h3></div></div></div>
<p>When wrapping a legacy code in a script (or another program), the
      wrapper must know where the executable lives. This is accomplished
      using an environment variable. Be sure to include this detail in the
      component description when submitting a component for use on the Grid,
      and give the variable a brief descriptive name such as GDA_BIN.</p>
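<p>A wrapper along these lines might look like the sketch below. It
      uses the GDA_BIN variable named above; the executable name
      <code class="filename">legacy-code</code> and the function name are
      hypothetical.</p>

```shell
# Wrapper around a legacy executable whose directory is published
# through the GDA_BIN environment variable.
# "legacy-code" is a hypothetical executable name.
run_legacy() {
    if [ -z "${GDA_BIN:-}" ]; then
        echo "GDA_BIN is not set; cannot locate legacy-code" >&2
        return 1
    fi
    exe="$GDA_BIN/legacy-code"
    if [ ! -x "$exe" ]; then
        echo "$exe is missing or not executable" >&2
        return 1
    fi
    # The wrapper's status is the status of the legacy code.
    "$exe" "$@"
}
```

<p>Failing early with a clear message when the variable is unset makes
      a misconfigured site much easier to diagnose than a generic "command
      not found".</p>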
</div>
<div class="section" title="11.3.8. Propogating back the right exitcode">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp6072688"></a>11.3.8. Propagating back the right exitcode</h3></div></div></div>
<p>A job in the workflow is only released for execution if its
      parents have executed successfully. Hence, it is very important that the
      algorithm codes exit with the correct exit code on both success and
      failure. The algorithms should exit with a status of 0 in case of
      success, and a non-zero status in case of error. Failure to do so will
      result in erroneous workflow execution, where jobs might be released for
      execution even though their parents had exited with an error.</p>
<p>The algorithm codes should catch all errors and exit with a
      non-zero exitcode. Successful execution of the algorithm code is
      determined only by an exitcode of 0. The algorithm code should not rely
      on something being written to stdout to designate success. For
      example, if the algorithm code writes SUCCESS to stdout but exits with
      a non-zero status, the job is marked as failed.</p>
<p>In *nix, a quick way to see if a code is exiting with the correct
      code is to execute the code and then execute echo $?.</p>
<pre class="programlisting">$ component-x input-file.lisp
... some output ...
$ echo $?
0</pre>
<p>If the code is not exiting correctly, it is necessary to wrap the
      code in a script that tests some final condition (such as the presence
      or format of a result file) and uses exit to return correctly.</p>
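<p>One possible shape for such a wrapper is sketched below. The
      function name and the choice of "result file exists and is non-empty"
      as the final condition are assumptions for illustration; a real
      wrapper would test whatever condition fits the code.</p>

```shell
# Run a command, then decide success from both its exit status and a
# final condition: the expected result file must exist and be non-empty.
# Usage: run_and_check RESULT_FILE COMMAND [ARGS...]
run_and_check() {
    result_file=$1
    shift
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with status $status" >&2
        return "$status"
    fi
    if [ ! -s "$result_file" ]; then
        echo "result file $result_file is missing or empty" >&2
        return 1
    fi
    return 0
}

# In a wrapper script, finish with something like:
#   run_and_check results.lisp component-x input-file.lisp
#   exit $?
```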
</div>
<div class="section" title="11.3.9. Static vs. Dynamically Linked Libraries">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp8998848"></a>11.3.9. Static vs. Dynamically Linked Libraries</h3></div></div></div>
<p>Since there is no way to know the profile of the machine that will
      be executing the code, it is important that dynamically linked libraries
      are avoided or that reliance on them is kept to a minimum. For example,
      a component that requires libc 2.5 may or may not run on a machine that
      uses libc 2.3. On *nix, you can use the <span class="command"><strong>ldd</strong></span> command
      to see what libraries a binary depends on.</p>
<p>If for some reason you install an algorithm-specific library in a
      non-standard location, make sure to set
      <code class="envar">LD_LIBRARY_PATH</code> for the algorithm in the transformation
      catalog for each site.</p>
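<p>As a rough illustration, the helper below filters
      <span class="command"><strong>ldd</strong></span> output down to the libraries that could not
      be resolved. The function name and the library path in the comments
      are hypothetical; the sketch relies on the "not found" marker that
      <span class="command"><strong>ldd</strong></span> prints on Linux.</p>

```shell
# Print the names of shared libraries that ldd could not resolve.
# Reads ldd output on stdin; unresolved entries look like
#   "libfoo.so.1 => not found"
list_missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use (component-x is the example component from this chapter):
#   ldd ./component-x | list_missing_libs
# A library installed in a non-standard location is resolved once its
# directory is on LD_LIBRARY_PATH, for example (hypothetical path):
#   LD_LIBRARY_PATH=$HOME/sw/lib ldd ./component-x | list_missing_libs
```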
</div>
<div class="section" title="11.3.10. Temporary Files">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp10965504"></a>11.3.10. Temporary Files</h3></div></div></div>
<p>If the algorithm codes create temporary files during execution,
      the codes should remove them on both successful and failed
      termination. The algorithm codes will run on scratch file systems that
      are also used by others. The scratch directories fill up very
      easily, and jobs fail when a directory runs out of free
      space. Temporary files are the files that are not tracked
      explicitly through the workflow generation process.</p>
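<p>In a shell wrapper, one common way to guarantee cleanup is an
      <span class="command"><strong>EXIT</strong></span> trap. The sketch below is illustrative only;
      the function name and the <code class="envar">SCRATCH</code> variable are
      assumptions made for the example.</p>

```shell
# Run a step that needs a private scratch directory and remove that
# directory on every exit path, success or failure. The body runs in a
# subshell "( ... )" so the EXIT trap fires when the step ends, not when
# the caller ends.
# Usage: with_scratch COMMAND [ARGS...]   (the command sees $SCRATCH)
with_scratch() (
    SCRATCH=$(mktemp -d) || exit 1
    export SCRATCH
    trap 'rm -rf "$SCRATCH"' EXIT
    trap 'exit 1' HUP INT TERM   # turn signals into a normal exit
    status=0
    "$@" || status=$?
    exit "$status"
)
```

<p>The exit status of the wrapped command is passed through unchanged,
      so cleanup does not interfere with exitcode propagation.</p>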
</div>
<div class="section" title="11.3.11. Handling of stdio">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp10211120"></a>11.3.11. Handling of stdio</h3></div></div></div>
<p>When writing a new application, it often appears feasible to use
      <span class="emphasis"><em>stdin</em></span> for a single input data file, and
      <span class="emphasis"><em>stdout</em></span> for a single output data file. The
      <span class="emphasis"><em>stderr</em></span> descriptor should be used for logging and
      debugging purposes only, never for data. In the *nix world,
      this will work well, but may hiccup in the Windows world.</p>
<p>We are suggesting that you avoid using stdio for data files,
      because there is the implied expectation that stdio data gets magically
      handled. There is no magic! If you produce data on
      <span class="emphasis"><em>stdout</em></span>, you need to declare to Pegasus that your
      <span class="emphasis"><em>stdout</em></span> has your data, and what LFN Pegasus can
      track it by. After the application is done, the data product will be a
      remote file just like all other data products. If you have an input file
      on <span class="emphasis"><em>stdin</em></span>, you must track it in a similar manner. If
      you produce logs on <span class="emphasis"><em>stderr</em></span> that you care about, you
      must track it in a similar manner. Think about it this way: Whenever you
      are redirecting stdio in a *nix shell, you will also have to specify a
      file name.</p>
<p>Most execution environments permit connecting
      <span class="emphasis"><em>stdin</em></span>, <span class="emphasis"><em>stdout</em></span> or
      <span class="emphasis"><em>stderr</em></span> to any file, and Pegasus supports this case.
      However, there are certain very specific corner cases where this is not
      possible. For this reason, we recommend that in new code, you avoid
      using stdio for data, and provide alternative means on the commandline,
      i.e. via <span class="command"><strong>--input <em class="replaceable"><code>fn</code></em></strong></span> and
      <span class="command"><strong>--output <em class="replaceable"><code>fn</code></em></strong></span> commandline
      arguments, instead of relying on <span class="emphasis"><em>stdin</em></span> and
      <span class="emphasis"><em>stdout</em></span>.</p>
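<p>A toy component that follows this recommendation could look like the
      sketch below. The function name is hypothetical, and upper-casing the
      text with <span class="command"><strong>tr</strong></span> merely stands in for the real
      computation.</p>

```shell
# A toy component that takes its data files as --input FILE and
# --output FILE instead of reading stdin and writing stdout.
to_upper() {
    input=
    output=
    while [ $# -gt 0 ]; do
        case $1 in
            --input|--output)
                [ $# -ge 2 ] || { echo "$1 needs a file name" >&2; return 2; }
                if [ "$1" = --input ]; then input=$2; else output=$2; fi
                shift 2 ;;
            *)
                echo "unknown argument: $1" >&2
                return 2 ;;
        esac
    done
    if [ -z "$input" ] || [ -z "$output" ]; then
        echo "usage: to_upper --input FILE --output FILE" >&2
        return 2
    fi
    # stderr stays free for logging; data only moves through named files
    tr '[:lower:]' '[:upper:]' < "$input" > "$output"
}
```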
</div>
<div class="section" title="11.3.12. Configuration Files">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp8028304"></a>11.3.12. Configuration Files</h3></div></div></div>
<p>If your code requires a configuration file to run and the
      configuration changes from one run to another, then this file needs to
      be tracked explicitly via the Pegasus WMS. The configuration file should
      not contain any absolute paths to any data or libraries used by the
      code. If any libraries, scripts, etc. need to be referenced, they should
      be given as relative paths of the form <code class="filename">./xyz</code>, where
      <code class="filename">xyz</code> is a tracked file (defined in the workflow), or
      as $ENV-VAR/xyz, where <code class="envar">$ENV-VAR</code> is set at execution
      time and evaluated by your application code internally.</p>
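<p>How an application might evaluate such entries internally is sketched
      below for a shell-based code. The variable name
      <code class="envar">APP_HOME</code> plays the role of $ENV-VAR and, like the
      function name, is an assumption made for the example.</p>

```shell
# Resolve one path taken from a configuration file.
#   ./xyz           -> returned unchanged (relative to the run directory)
#   $APP_HOME/xyz   -> the literal prefix is replaced by the value that
#                      APP_HOME has at execution time
#   /abs/path       -> rejected, absolute paths are not portable
resolve_config_path() {
    case $1 in
        /*)
            echo "absolute path not allowed in configuration: $1" >&2
            return 1 ;;
        '$APP_HOME'/*)
            if [ -z "${APP_HOME:-}" ]; then
                echo "APP_HOME is not set" >&2
                return 1
            fi
            printf '%s/%s\n' "$APP_HOME" "${1#*/}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}
```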
</div>
<div class="section" title="11.3.13. Code Invocation and input data staging by Pegasus">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp10985248"></a>11.3.13. Code Invocation and input data staging by Pegasus</h3></div></div></div>
<p>Pegasus will create one temporary directory per workflow on each
      site where the workflow is planned. Pegasus will stage all the files
      required for the execution of the workflow in these temporary
      directories. This directory is shared by all the workflow components
      that execute on the site. You will have no control over where this
      directory is placed, and as such you should have no expectations about
      where the code will be run. The directories are created per workflow and
      not per job/algorithm/task. Suppose there is a component component-x
      that takes one argument: input-file.lisp (a file containing the data to
      be operated on). The staging step will bring input-file.lisp to the
      temporary directory. In *nix the call would look like this:</p>
<pre class="programlisting">$ /nfs/software/component-x input-file.lisp</pre>
<p>Note that Pegasus will call the component using the full path to
      the component. If your code or script invokes some other code, you
      cannot assume a relative or absolute path for that code. You have
      to resolve it, either by using the dirname $0 trick in shell (assuming
      the child code is in the same directory as the parent) or by
      constructing the path from an environment variable that the workflow
      system is expected to set. These environment variables need to be
      explicitly published so that they can be stored in the transformation
      catalog.</p>
<p>Now suppose that internally, component-x writes its results to
      /tmp/component-x-results.lisp. This is not good. Components should not
      expect that a /tmp directory exists or that they will have permission to
      write there. Instead, component-x should do one of two things: 1. write
      component-x-results.lisp to the directory where it is run from, or 2.
      take a second argument output-file.lisp that specifies the name and
      path of where the results should be written.</p>
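<p>Both options can be combined in one component, as in the sketch
      below. The shell function stands in for component-x, and copying the
      input stands in for the real computation; neither is meant as an
      actual implementation.</p>

```shell
# component_x INPUT [OUTPUT]
# Writes results to OUTPUT when it is given, otherwise to
# component-x-results.lisp in the current working directory.
# It never writes to /tmp.
component_x() {
    input=$1
    output=${2:-./component-x-results.lisp}
    cp "$input" "$output"
}
```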
</div>
<div class="section" title="11.3.14. Logical File naming in DAX">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp7070416"></a>11.3.14. Logical File naming in DAX</h3></div></div></div>
<p>The logical file names used by your code can be of two
      types.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>Without a directory path e.g. <code class="filename">f.a</code>,
          <code class="filename">f.b</code> etc</p></li>
<li class="listitem"><p>With a directory path e.g. <code class="filename">a/1/f.a</code>,
          <code class="filename">b/2/f.b</code></p></li>
</ul></div>
<p>Both types of files are supported. We will create any directory
      structure mentioned in your logical files on the remote execution site
      when we stage in data, as well as when we store the output data to a
      permanent location. An example invocation of a code that consumes and
      produces files is</p>
<pre class="programlisting">$ /bin/test --input f.a --output f.b</pre>
<p>OR</p>
<pre class="programlisting">$ /bin/test --input a/1/f.a --output b/1/f.b</pre>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>A logical file name should never be an absolute file path, e.g.
        /a/1/f.a. In other words, there should not be a starting slash (/) in a
        logical filename.</p>
</div>
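<p>A workflow generator script could guard against this mistake with a
      small check such as the one sketched here. The function name is
      hypothetical and the check is not part of Pegasus.</p>

```shell
# Succeed only if NAME is usable as a logical file name:
# non-empty and without a leading slash.
is_valid_lfn() {
    case $1 in
        ''|/*) return 1 ;;
        *)     return 0 ;;
    esac
}

# Example:
#   is_valid_lfn a/1/f.a  || echo "bad LFN" >&2    # accepted
#   is_valid_lfn /a/1/f.a || echo "bad LFN" >&2    # rejected
```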
</div>
</div>
</div>
</div><?php  
            do_html_footer();
        ?>
