<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="tarballs.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="replica.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="chapter">
<div class="titlepage"><div><div><h1 class="title">
<a name="creating_workflows"></a>Chapter 4. Creating Workflows</h1></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="creating_workflows.php#abstract_workflows">4.1. Abstract Workflows (DAX)</a></span></dt>
<dt><span class="section"><a href="replica.php">4.2. Data Discovery (Replica Catalog)</a></span></dt>
<dt><span class="section"><a href="site.php">4.3. Resource Discovery (Site Catalog)</a></span></dt>
<dt><span class="section"><a href="transformation.php">4.4. Executable Discovery (Transformation Catalog)</a></span></dt>
</dl></div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="abstract_workflows"></a>4.1. Abstract Workflows (DAX)</h2></div></div></div>
<p>The DAX is an XML description of an abstract workflow and serves as
    the primary input to Pegasus. The DAX schema is defined in
    <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/dax-3.4/dax-3.4.xsd" target="_top">dax-3.4.xsd</a>.
    The documentation of the schema and its elements can be found in <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/dax-3.4/dax-3.4.html" target="_top">dax-3.4.html</a>.</p>
<p>A DAX can be created with the DAX generating API, which is available
    in Java, Perl, and Python.</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
       We highly recommend using the DAX API. 
    </div>
<p>Advanced users who can read XML schema definitions can also generate a
    DAX directly from a script.</p>
<p>The sample workflow below incorporates some of the elementary graph
    structures used in all abstract workflows.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="bold"><strong>fan-out</strong></span>, <span class="bold"><strong>scatter</strong></span>, and <span class="bold"><strong>diverge</strong></span> all describe a structure in which multiple
        sibling nodes depend on fewer parent nodes.</p>
<p>The example shows how the <span class="bold"><strong>Job 2 and
        Job 3</strong></span> nodes both depend on the <span class="bold"><strong>Job 1</strong></span>
        node.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>fan-in</strong></span>, <span class="bold"><strong>gather</strong></span>, <span class="bold"><strong>join</strong></span>,
        and <span class="bold"><strong>converge</strong></span> describe how multiple
        siblings are merged into fewer dependent child nodes.</p>
<p>The example shows how the <span class="bold"><strong>Job 4</strong></span>
        node depends on both the <span class="bold"><strong>Job 2 and Job 3</strong></span>
        nodes.</p>
</li>
</ul></div>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p><span class="bold"><strong>serial execution</strong></span> implies that
        nodes are dependent on one another, like pearls on a string.</p></li>
<li class="listitem"><p><span class="bold"><strong>parallel execution</strong></span> implies that
        nodes are independent of one another and can be executed in parallel.</p></li>
</ul></div>
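<p>The graph structures described above can be illustrated with a small
    Python sketch. It is not part of the Pegasus API; the edge list simply
    mirrors the four-job sample workflow, and the helper name
    <code class="literal">levels</code> is an assumption made for this
    illustration.</p>

```python
from collections import defaultdict

# Parent -> child edges of the four-job sample workflow.
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

children = defaultdict(set)
parents = defaultdict(set)
for parent, child in edges:
    children[parent].add(child)
    parents[child].add(parent)

jobs = sorted({job for edge in edges for job in edge})

# Fan-out: a job with several children. Fan-in: a job with several parents.
fan_out = [job for job in jobs if len(children[job]) > 1]
fan_in = [job for job in jobs if len(parents[job]) > 1]


def levels(jobs, parents):
    """Group jobs into levels. Levels run serially, one after another;
    jobs within the same level do not depend on each other and may run
    in parallel."""
    level = {}
    remaining = list(jobs)
    while remaining:
        # A job is ready once every one of its parents has a level.
        ready = [j for j in remaining if all(p in level for p in parents[j])]
        if not ready:
            raise ValueError("cycle detected; the workflow must be acyclic")
        for j in ready:
            level[j] = 1 + max((level[p] for p in parents[j]), default=-1)
        remaining = [j for j in remaining if j not in level]
    grouped = defaultdict(list)
    for j, depth in level.items():
        grouped[depth].append(j)
    return [sorted(grouped[depth]) for depth in sorted(grouped)]


print("fan-out at job(s):", fan_out)
print("fan-in at job(s):", fan_in)
print("execution levels:", levels(jobs, parents))
```

<p>In this sketch, Job 1 is the fan-out point, Job 4 is the fan-in point,
    and Jobs 2 and 3 share a level because neither depends on the
    other.</p>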
<div class="figure">
<a name="components_blackdiamond"></a><p class="title"><b>Figure 4.1. Sample Workflow</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;"><tr><td align="center" valign="middle"><img src="images/DiamondWorkflow.png" align="middle" alt="Sample Workflow"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>The example diamond workflow consists of four nodes representing
    jobs, which are linked by six files.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>Required input files must be registered with the replica catalog
        so that Pegasus can find them and integrate them into the
        workflow.</p></li>
<li class="listitem"><p>Leaf files are the products, or outputs, of the workflow. Output files
        can be collected at a location.</p></li>
<li class="listitem"><p>The remaining files all have lines leading to them and
        originating from them. These files are produced by some job steps
        (lines leading to them) and consumed by other job steps (lines
        leading out of them). Often, these files represent intermediary
        results that can be cleaned up.</p></li>
</ul></div>
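<p>The three kinds of files can be told apart from the data flow alone.
    The following Python sketch is an illustration only, not Pegasus code;
    the <code class="literal">uses</code> table is a hand-written summary of
    which files each job of the sample workflow reads and writes.</p>

```python
# Files read and written by each job of the sample workflow,
# keyed by job id, as (inputs, outputs).
uses = {
    "ID000001": ({"f.a"}, {"f.b1", "f.b2"}),
    "ID000002": ({"f.b1"}, {"f.c1"}),
    "ID000003": ({"f.b2"}, {"f.c2"}),
    "ID000004": ({"f.c1", "f.c2"}, {"f.d"}),
}

consumed = set().union(*(inputs for inputs, _ in uses.values()))
produced = set().union(*(outputs for _, outputs in uses.values()))

# Consumed but never produced: must be found through the replica catalog.
required_inputs = sorted(consumed - produced)
# Produced but never consumed: leaf files, the products of the workflow.
leaf_outputs = sorted(produced - consumed)
# Produced and consumed: intermediary results that can be cleaned up.
intermediates = sorted(produced & consumed)

print("required inputs:", required_inputs)
print("leaf outputs:", leaf_outputs)
print("intermediates:", intermediates)
```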
<p>There are two main ways of generating a DAX:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
<p>Using a DAX generating API in <a class="link" href="dax_generator_api.php#api-java" title="14.2.1. The Java DAX Generator API">Java</a>, <a class="link" href="dax_generator_api.php#api-perl" title="14.2.3. The Perl DAX Generator">Perl</a>
        or <a class="link" href="dax_generator_api.php#api-python" title="14.2.2. The Python DAX Generator API">Python</a>.</p>
<p><span class="bold"><strong>Note:</strong></span> We recommend this
        option.</p>
</li>
<li class="listitem">
<p>Generating XML directly from your script.</p>
<p><span class="bold"><strong>Note:</strong></span> This option should only
        be considered by advanced users who can also read XML schema
        definitions.</p>
</li>
</ol></div>
<p>A DAX representing the example workflow can look like the
    following:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated: 2010-11-22T22:55:08Z --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.2.xsd"
      version="3.2" name="diamond" index="0" count="1"&gt;
  &lt;!-- part 2: definition of all jobs (at least one) --&gt;
  &lt;job namespace="diamond" name="preprocess" version="2.0" id="ID000001"&gt;
    &lt;argument&gt;-a preprocess -T60 -i &lt;file name="f.a" /&gt; -o &lt;file name="f.b1" /&gt; &lt;file name="f.b2" /&gt;&lt;/argument&gt;
    &lt;uses name="f.b2" link="output" register="false" transfer="false" /&gt;
    &lt;uses name="f.b1" link="output" register="false" transfer="false" /&gt;
    &lt;uses name="f.a" link="input" /&gt;
  &lt;/job&gt;
  &lt;job namespace="diamond" name="findrange" version="2.0" id="ID000002"&gt;
    &lt;argument&gt;-a findrange -T60 -i &lt;file name="f.b1" /&gt; -o &lt;file name="f.c1" /&gt;&lt;/argument&gt;
    &lt;uses name="f.b1" link="input" register="false" transfer="false" /&gt;
    &lt;uses name="f.c1" link="output" register="false" transfer="false" /&gt;
  &lt;/job&gt;
  &lt;job namespace="diamond" name="findrange" version="2.0" id="ID000003"&gt;
    &lt;argument&gt;-a findrange -T60 -i &lt;file name="f.b2" /&gt; -o &lt;file name="f.c2" /&gt;&lt;/argument&gt;
    &lt;uses name="f.c2" link="output" register="false" transfer="false" /&gt;
    &lt;uses name="f.b2" link="input" register="false" transfer="false" /&gt;
  &lt;/job&gt;
  &lt;job namespace="diamond" name="analyze" version="2.0" id="ID000004"&gt;
    &lt;argument&gt;-a analyze -T60 -i &lt;file name="f.c1" /&gt; &lt;file name="f.c2" /&gt; -o &lt;file name="f.d" /&gt;&lt;/argument&gt;
    &lt;uses name="f.c2" link="input" register="false" transfer="false" /&gt;
    &lt;uses name="f.d" link="output" register="false" transfer="true" /&gt;
    &lt;uses name="f.c1" link="input" register="false" transfer="false" /&gt;
  &lt;/job&gt;
  &lt;!-- part 3: list of control-flow dependencies --&gt;
  &lt;child ref="ID000002"&gt;
    &lt;parent ref="ID000001" /&gt;
  &lt;/child&gt;
  &lt;child ref="ID000003"&gt;
    &lt;parent ref="ID000001" /&gt;
  &lt;/child&gt;
  &lt;child ref="ID000004"&gt;
    &lt;parent ref="ID000002" /&gt;
    &lt;parent ref="ID000003" /&gt;
  &lt;/child&gt;
&lt;/adag&gt;</pre>
<p>The DAX representation of the example workflow relies on external
    catalogs: a transformation catalog (TC) to resolve the logical job
    names (such as diamond::preprocess:2.0), and a replica catalog
    (RC) to resolve the input file <code class="filename">f.a</code>. The
    workflow above defines the same four jobs as the figure, along with the
    files that flow between them. The intermediary files are neither
    registered nor staged out, and can be considered transient. Only the final
    result file <code class="filename">f.d</code> is staged out.</p>
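<p>For readers who choose to generate the XML directly from a script, the
    following is a minimal sketch using only Python's standard
    <code class="literal">xml.etree.ElementTree</code> module. It does not use
    the Pegasus DAX API, and it is simplified: it omits the
    <code class="literal">argument</code> elements and the
    <code class="literal">xsi:schemaLocation</code> attribute, and input files
    carry only the <code class="literal">link</code> attribute. The names
    <code class="literal">JOBS</code>, <code class="literal">STAGED_OUT</code>,
    and <code class="literal">build_dax</code> are assumptions made for this
    illustration.</p>

```python
import xml.etree.ElementTree as ET

DAX_NS = "http://pegasus.isi.edu/schema/DAX"

# (job id, transformation name, input files, output files).
JOBS = [
    ("ID000001", "preprocess", ["f.a"], ["f.b1", "f.b2"]),
    ("ID000002", "findrange", ["f.b1"], ["f.c1"]),
    ("ID000003", "findrange", ["f.b2"], ["f.c2"]),
    ("ID000004", "analyze", ["f.c1", "f.c2"], ["f.d"]),
]
# Only the final result is staged out; everything else is transient.
STAGED_OUT = {"f.d"}


def build_dax():
    adag = ET.Element("adag", {"xmlns": DAX_NS, "version": "3.2",
                               "name": "diamond", "index": "0", "count": "1"})

    # Job definitions with the files each job uses.
    for job_id, name, inputs, outputs in JOBS:
        job = ET.SubElement(adag, "job", {"namespace": "diamond", "name": name,
                                          "version": "2.0", "id": job_id})
        for lfn in inputs:
            ET.SubElement(job, "uses", {"name": lfn, "link": "input"})
        for lfn in outputs:
            transfer = "true" if lfn in STAGED_OUT else "false"
            ET.SubElement(job, "uses", {"name": lfn, "link": "output",
                                        "register": "false",
                                        "transfer": transfer})

    # Control-flow dependencies, derived here from the data flow:
    # a job is a child of every job that produces one of its inputs.
    producer = {lfn: job_id for job_id, _, _, outputs in JOBS for lfn in outputs}
    for job_id, _, inputs, _ in JOBS:
        parent_ids = sorted({producer[lfn] for lfn in inputs if lfn in producer})
        if parent_ids:
            child = ET.SubElement(adag, "child", {"ref": job_id})
            for parent_id in parent_ids:
                ET.SubElement(child, "parent", {"ref": parent_id})
    return adag


dax_xml = ET.tostring(build_dax(), encoding="unicode")
print(dax_xml)
```

<p>Deriving the <code class="literal">child</code> and
    <code class="literal">parent</code> elements from the file usage is a
    design choice of this sketch; a script could equally list the
    dependencies explicitly. Any generated document should still be
    validated against the DAX schema before it is passed to Pegasus.</p>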
</div>
</div>
</div><?php  
            do_html_footer();
        ?>
