<?php  
            require('/srv/new-pegasus.isi.edu/includes/common.php'); 
            pegasus_header("10.7. Metadata");
?><hr><div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="metadata"></a>10.7. Metadata</h2></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="metadata.php#metadata_dax">10.7.1. Metadata in the DAX</a></span></dt>
<dt><span class="section"><a href="metadata.php#metadata_wf">10.7.2. Workflow Level Metadata</a></span></dt>
<dt><span class="section"><a href="metadata.php#metadata_task">10.7.3. Task Level Metadata</a></span></dt>
<dt><span class="section"><a href="metadata.php#metadata_file">10.7.4. File Level Metadata</a></span></dt>
<dt><span class="section"><a href="metadata.php#metadata_auto">10.7.5. Automatically Generated Metadata attributes</a></span></dt>
<dt><span class="section"><a href="metadata.php#metadata_trace">10.7.6. Tracing Metadata for an output file</a></span></dt>
</dl></div>
<p>Pegasus allows users to associate metadata at the following levels:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>Workflow level, in the DAX</p></li>
<li class="listitem"><p>Task level, in the DAX and the Transformation Catalog</p></li>
<li class="listitem"><p>File level, in the DAX and the Replica Catalog</p></li>
</ul></div>
<p>Metadata is specified as key-value pairs, where both the key and
    the value are of type String.</p>
<p>All metadata (user-specified and auto-generated) is populated
    into the workflow database (usually located in the workflow submit directory) by
    pegasus-monitord. The metadata in this database can be queried
    using the <span class="bold"><strong><a class="link" href="cli-pegasus-metadata.php" title="pegasus-metadata">pegasus-metadata</a></strong></span> command
    line tool, and is also shown in the <a class="link" href="dashboard.php" title="6.3. Dashboard">Pegasus
    Dashboard</a>.</p>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="metadata_dax"></a>10.7.1. Metadata in the DAX</h3></div></div></div>
<p>In the DAX, metadata can be associated with the workflow, tasks,
      files and executables. For details on how to associate metadata in the
      DAX using the DAX API, refer to the DAX API <a class="link" href="dax_generator_api.php" title="16.2. DAX Generator API">chapter</a>. Below is an example DAX that
      illustrates metadata associations at the workflow, task and file
      levels.</p>
<pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated on: 2016-01-21T10:36:39-08:00 --&gt;
&lt;!-- generated by: vahi [ ?? ] --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd" version="3.6" name="diamond" index="0" count="1"&gt;

&lt;!-- Section 1: Metadata attributes for the workflow (can be empty)  --&gt;

   &lt;metadata key="name"&gt;diamond&lt;/metadata&gt;
   &lt;metadata key="createdBy"&gt;Karan Vahi&lt;/metadata&gt;

&lt;!-- Section 2: Invokes - Adds notifications for a workflow (can be empty) --&gt;

   &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;

&lt;!-- Section 3: Files - Acts as a Replica Catalog (can be empty) --&gt;

   &lt;file name="f.a"&gt;
      &lt;metadata key="size"&gt;1024&lt;/metadata&gt;
      &lt;pfn url="file:///Volumes/Work/lfs1/work/pegasus-features/PM-902/f.a" site="local"/&gt;
   &lt;/file&gt;

&lt;!-- Section 4: Executables - Acts as a Transformation Catalog (can be empty) --&gt;

   &lt;executable namespace="pegasus" name="preprocess" version="4.0" installed="true" arch="x86" os="linux"&gt;
      &lt;metadata key="size"&gt;2048&lt;/metadata&gt;
      &lt;pfn url="file:///usr/bin/keg" site="TestCluster"/&gt;
   &lt;/executable&gt;
   &lt;executable namespace="pegasus" name="findrange" version="4.0" installed="true" arch="x86" os="linux"&gt;
      &lt;pfn url="file:///usr/bin/keg" site="TestCluster"/&gt;
   &lt;/executable&gt;
   &lt;executable namespace="pegasus" name="analyze" version="4.0" installed="true" arch="x86" os="linux"&gt;
      &lt;pfn url="file:///usr/bin/keg" site="TestCluster"/&gt;
   &lt;/executable&gt;

&lt;!-- Section 5: Transformations - Aggregates executables and Files (can be empty) --&gt;


&lt;!-- Section 6: Job's, DAX's or Dag's - Defines a JOB or DAX or DAG (At least 1 required) --&gt;

   &lt;job id="j1" namespace="pegasus" name="preprocess" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a preprocess -T 60 -i  &lt;file name="f.a"/&gt; -o  &lt;file name="f.b1"/&gt;   &lt;file name="f.b2"/&gt;&lt;/argument&gt;
      &lt;uses name="f.a" link="input"&gt;
         &lt;metadata key="size"&gt;1024&lt;/metadata&gt;
      &lt;/uses&gt;
      &lt;uses name="f.b1" link="output" transfer="true" register="true"/&gt;
      &lt;uses name="f.b2" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;
   &lt;job id="j2" namespace="pegasus" name="findrange" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a findrange -T 60 -i  &lt;file name="f.b1"/&gt; -o  &lt;file name="f.c1"/&gt;&lt;/argument&gt;
      &lt;uses name="f.b1" link="input"/&gt;
      &lt;uses name="f.c1" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;
   &lt;job id="j3" namespace="pegasus" name="findrange" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a findrange -T 60 -i  &lt;file name="f.b2"/&gt; -o  &lt;file name="f.c2"/&gt;&lt;/argument&gt;
      &lt;uses name="f.b2" link="input"/&gt;
      &lt;uses name="f.c2" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;
   &lt;job id="j4" namespace="pegasus" name="analyze" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a analyze -T 60 -i  &lt;file name="f.c1"/&gt;   &lt;file name="f.c2"/&gt; -o  &lt;file name="f.d"/&gt;&lt;/argument&gt;
      &lt;uses name="f.c1" link="input"/&gt;
      &lt;uses name="f.c2" link="input"/&gt;
      &lt;uses name="f.d" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;

&lt;!-- Section 7: Dependencies - Parent Child relationships (can be empty) --&gt;

   &lt;child ref="j2"&gt;
      &lt;parent ref="j1"/&gt;
   &lt;/child&gt;
   &lt;child ref="j3"&gt;
      &lt;parent ref="j1"/&gt;
   &lt;/child&gt;
   &lt;child ref="j4"&gt;
      &lt;parent ref="j2"/&gt;
      &lt;parent ref="j3"/&gt;
   &lt;/child&gt;
&lt;/adag&gt;</pre>
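<p>To make the different attachment points concrete, the following is a minimal Python sketch that reads a DAX with the standard library and groups the <code>metadata</code> elements by the element they are attached to. It is not part of Pegasus; the names <code>SAMPLE_DAX</code>, <code>direct_metadata</code> and <code>collect_metadata</code> are hypothetical, and the embedded document is a trimmed copy of the example above.</p>

```python
import xml.etree.ElementTree as ET

# Namespace used by the DAX example above.
NS = {"dax": "http://pegasus.isi.edu/schema/DAX"}

# Trimmed copy of the diamond DAX: one file, one executable, one job.
SAMPLE_DAX = """
<adag xmlns="http://pegasus.isi.edu/schema/DAX" version="3.6" name="diamond">
   <metadata key="name">diamond</metadata>
   <metadata key="createdBy">Karan Vahi</metadata>
   <file name="f.a">
      <metadata key="size">1024</metadata>
      <pfn url="file:///Volumes/Work/lfs1/work/pegasus-features/PM-902/f.a" site="local"/>
   </file>
   <executable namespace="pegasus" name="preprocess" version="4.0">
      <metadata key="size">2048</metadata>
      <pfn url="file:///usr/bin/keg" site="TestCluster"/>
   </executable>
   <job id="j1" namespace="pegasus" name="preprocess" version="4.0">
      <metadata key="time">60</metadata>
      <uses name="f.a" link="input">
         <metadata key="size">1024</metadata>
      </uses>
   </job>
</adag>
""".strip()


def direct_metadata(element):
    """Return the key/value pairs of the <metadata> children of one element."""
    return {md.get("key"): (md.text or "") for md in element.findall("dax:metadata", NS)}


def collect_metadata(dax_xml):
    """Group metadata by the level at which it is attached in the DAX."""
    root = ET.fromstring(dax_xml)
    result = {
        "workflow": direct_metadata(root),
        "files": {},
        "executables": {},
        "jobs": {},
        "uses": {},
    }
    for f in root.findall("dax:file", NS):
        result["files"][f.get("name")] = direct_metadata(f)
    for e in root.findall("dax:executable", NS):
        result["executables"][e.get("name")] = direct_metadata(e)
    for j in root.findall("dax:job", NS):
        result["jobs"][j.get("id")] = direct_metadata(j)
        for u in j.findall("dax:uses", NS):
            md = direct_metadata(u)
            if md:  # keep only uses entries that carry metadata
                result["uses"][(j.get("id"), u.get("name"))] = md
    return result


levels = collect_metadata(SAMPLE_DAX)
for level, entries in levels.items():
    print(level, entries)
```

<p>Every value is kept as a string, matching the key-value model described earlier.</p>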
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="metadata_wf"></a>10.7.2. Workflow Level Metadata</h3></div></div></div>
<p>Workflow level metadata can be associated only in the DAX, under
      the root element adag. The snippet below illustrates this:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated on: 2016-01-21T10:36:39-08:00 --&gt;
&lt;!-- generated by: vahi [ ?? ] --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd" version="3.6" name="diamond" index="0" count="1"&gt;

&lt;!-- Section 1: Metadata attributes for the workflow (can be empty)  --&gt;

   &lt;metadata key="name"&gt;diamond&lt;/metadata&gt;
   &lt;metadata key="createdBy"&gt;Karan Vahi&lt;/metadata&gt;

...

&lt;/adag&gt;</pre>
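<p>As an illustration only, the same structure can be produced with Python's standard library. This sketch does not use the DAX API; <code>build_adag</code> is a hypothetical helper, and only the root element and its workflow level <code>metadata</code> children are generated.</p>

```python
import xml.etree.ElementTree as ET

DAX_NS = "http://pegasus.isi.edu/schema/DAX"

# Serialize the DAX namespace as the default namespace (no prefix).
ET.register_namespace("", DAX_NS)


def build_adag(name, workflow_metadata):
    """Create an <adag> root element with one <metadata> child per key."""
    adag = ET.Element(f"{{{DAX_NS}}}adag", {"version": "3.6", "name": name})
    for key, value in workflow_metadata.items():
        md = ET.SubElement(adag, f"{{{DAX_NS}}}metadata", {"key": key})
        md.text = str(value)  # metadata values are always strings
    return adag


adag = build_adag("diamond", {"name": "diamond", "createdBy": "Karan Vahi"})
xml_text = ET.tostring(adag, encoding="unicode")
print(xml_text)
```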
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="metadata_task"></a>10.7.3. Task Level Metadata</h3></div></div></div>
<p>Metadata for a task is picked up from the following sources:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>Metadata associated with the job element in the DAX.</p></li>
<li class="listitem"><p>Metadata associated with the corresponding transformation. The
          transformation for a task is picked up from either a matching
          executable entry in the DAX (if one exists) or the Transformation
          Catalog.</p></li>
</ul></div>
<p>Below is a snippet that illustrates metadata for a task specified
      in the job element in the DAX:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated on: 2016-01-21T10:36:39-08:00 --&gt;
&lt;!-- generated by: vahi [ ?? ] --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd" version="3.6" name="diamond" index="0" count="1"&gt;

...
   &lt;job id="j2" namespace="pegasus" name="findrange" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a findrange -T 60 -i  &lt;file name="f.b1"/&gt; -o  &lt;file name="f.c1"/&gt;&lt;/argument&gt;
      &lt;uses name="f.b1" link="input"/&gt;
      &lt;uses name="f.c1" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;

...

&lt;/adag&gt;</pre>
<p>Below is a snippet that illustrates metadata for a task specified
      in the executable element in the DAX:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated on: 2016-01-21T10:36:39-08:00 --&gt;
&lt;!-- generated by: vahi [ ?? ] --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd" version="3.6" name="diamond" index="0" count="1"&gt;

...
&lt;!-- Section 4: Executables - Acts as a Transformation Catalog (can be empty) --&gt;

   &lt;executable namespace="pegasus" name="findrange" version="4.0" installed="true" arch="x86" os="linux"&gt;
      &lt;metadata key="size"&gt;2048&lt;/metadata&gt;
      &lt;pfn url="file:///usr/bin/keg" site="TestCluster"/&gt;
   &lt;/executable&gt;

...

&lt;/adag&gt;</pre>
<p>Metadata can also be associated with a transformation in the
      Transformation Catalog. Metadata specified in the Transformation
      Catalog is automatically added to the task level metadata of
      every task that uses that executable. This resolution is
      similar to how profiles associated in the Transformation Catalog get
      associated with the tasks. Below is an example Transformation Catalog
      entry that illustrates metadata associated with an executable.</p>
<pre class="programlisting">tr pegasus::findrange:4.0 { 
    site TestCluster { 
        pfn "/usr/bin/pegasus-keg" 
        arch "x86_64" 
        os "linux" 
        type "INSTALLED" 
        profile pegasus "clusters.size" "20" 
        metadata "key" "value" 
        metadata "appmodel" "myxform.aspen" 
        metadata "version" "3.0" 
    } 
}</pre>
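<p>The resolution described in this section can be sketched as a small merge function. This is a hedged illustration, not the planner's implementation: <code>resolve_task_metadata</code> is a hypothetical name, and the rule that job level keys override transformation keys on a conflict is an assumption that this guide does not state.</p>

```python
def resolve_task_metadata(job_md, dax_executable_md=None, tc_md=None):
    """Combine job metadata with the metadata of the task's transformation.

    The transformation metadata comes from the matching executable entry in
    the DAX if one exists, otherwise from the Transformation Catalog.
    """
    if dax_executable_md is not None:
        transformation_md = dax_executable_md
    else:
        transformation_md = tc_md or {}

    merged = dict(transformation_md)
    # ASSUMPTION: on a key conflict the job element wins. The guide does not
    # document the actual precedence.
    merged.update(job_md)
    return merged


# Metadata from the Transformation Catalog entry shown above.
TC_MD = {"key": "value", "appmodel": "myxform.aspen", "version": "3.0"}

task_md = resolve_task_metadata({"time": "60"}, tc_md=TC_MD)
print(task_md)
```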
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="metadata_file"></a>10.7.4. File Level Metadata</h3></div></div></div>
<p>Metadata for a file is picked up from the following sources:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>Metadata associated with the file element in the DAX. File
          elements are optionally used to record the locations of input files
          for the workflow in the DAX.</p></li>
<li class="listitem"><p>Metadata associated with the file in the uses section of the
          job element in the DAX.</p></li>
<li class="listitem"><p>Metadata associated with the file in the Replica
          Catalog.</p></li>
</ul></div>
<p>Below is a snippet that illustrates metadata for a file specified
      in the file element in the DAX:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated on: 2016-01-21T10:36:39-08:00 --&gt;
&lt;!-- generated by: vahi [ ?? ] --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd" version="3.6" name="diamond" index="0" count="1"&gt;

...
&lt;!-- Section 3: Files - Acts as a Replica Catalog (can be empty) --&gt;

   &lt;file name="f.a"&gt;
      &lt;metadata key="size"&gt;1024&lt;/metadata&gt;
      &lt;pfn url="file:///Volumes/Work/lfs1/work/pegasus-features/PM-902/f.a" site="local"/&gt;
   &lt;/file&gt;


...

&lt;/adag&gt;</pre>
<p>Below is a snippet that illustrates metadata for a file in the
      uses section of the job element:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated on: 2016-01-21T10:36:39-08:00 --&gt;
&lt;!-- generated by: vahi [ ?? ] --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd" version="3.6" name="diamond" index="0" count="1"&gt;

...
   &lt;job id="j1" namespace="pegasus" name="preprocess" version="4.0"&gt;
      &lt;argument&gt;-a preprocess -T 60 -i  &lt;file name="f.a"/&gt; -o  &lt;file name="f.b1"/&gt;   &lt;file name="f.b2"/&gt;&lt;/argument&gt;
      &lt;uses name="f.a" link="input"&gt;
         &lt;metadata key="size"&gt;1024&lt;/metadata&gt;
         &lt;metadata key="source"&gt;DAX&lt;/metadata&gt;
      &lt;/uses&gt;
      &lt;uses name="f.b1" link="output" transfer="true" register="true"/&gt;
      &lt;uses name="f.b2" link="output" transfer="true" register="true"/&gt;
   &lt;/job&gt;

...

&lt;/adag&gt;</pre>
<p>Below is a snippet that illustrates metadata for an input file in
      the Replica Catalog entry for the file:</p>
<pre class="programlisting"># File Based Replica Catalog
f.a file://$PWD/production_200.conf site="local" source="replica_catalog"</pre>
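<p>To show how such an entry breaks down, here is a minimal Python sketch that splits one line of a file based Replica Catalog into its parts. The helper <code>parse_rc_line</code> is hypothetical, and treating <code>site</code> as the site handle while every other attribute counts as metadata is an assumption made for this illustration.</p>

```python
import shlex


def parse_rc_line(line):
    """Split 'LFN PFN key="value" ...' into its components.

    Returns None for blank lines and comment lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # shlex removes the double quotes around attribute values.
    tokens = shlex.split(line)
    lfn, pfn, attributes = tokens[0], tokens[1], tokens[2:]
    pairs = dict(attr.split("=", 1) for attr in attributes)
    # ASSUMPTION: 'site' names the site; the remaining pairs are metadata.
    site = pairs.pop("site", None)
    return {"lfn": lfn, "pfn": pfn, "site": site, "metadata": pairs}


entry = parse_rc_line(
    'f.a file://$PWD/production_200.conf site="local" source="replica_catalog"'
)
print(entry)
```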
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="metadata_auto"></a>10.7.5. Automatically Generated Metadata attributes</h3></div></div></div>
<p>Pegasus captures certain metadata attributes as output files are
      generated and associates them at the file level in the database.
      Currently, the following attributes for the output files are
      automatically captured from the kickstart record and stored in the
      workflow database.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>pfn - the physical file location</p></li>
<li class="listitem"><p>ctime - creation time</p></li>
<li class="listitem"><p>size - size of the file in bytes</p></li>
<li class="listitem"><p>user - the Linux user that the process generating
          the output file ran as.</p></li>
</ul></div>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>The automatic collection of metadata attributes for output
        files is only triggered if the output file is marked to be registered
        in the replica catalog and the --output-site option to pegasus-plan is
        specified.</p>
</div>
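<p>For intuition about what these four attributes contain, the sketch below gathers comparable values for a local file using <code>os.stat</code>. This only approximates the idea; it is not how kickstart collects the record, and <code>describe_output_file</code> is a hypothetical helper. Note that on Linux <code>st_ctime</code> is the time of the last inode change, which is used here as a stand-in for creation time.</p>

```python
import os
import pwd
import tempfile
from datetime import datetime, timezone


def describe_output_file(path):
    """Return pfn, ctime, size and user for a local file, all as strings."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid (common in containers).
        user = str(st.st_uid)
    ctime = datetime.fromtimestamp(st.st_ctime, timezone.utc).astimezone()
    return {
        "pfn": "file://" + os.path.abspath(path),
        "ctime": ctime.isoformat(timespec="seconds"),
        "size": str(st.st_size),
        "user": user,
    }


# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "f.d")
    with open(path, "wb") as fh:
        fh.write(b"x" * 582)
    attrs = describe_output_file(path)

print(attrs)
```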
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="metadata_trace"></a>10.7.6. Tracing Metadata for an output file</h3></div></div></div>
<p>The command line client <a class="link" href="cli-pegasus-metadata.php" title="pegasus-metadata">pegasus-metadata</a> allows a user to
      trace all the metadata associated with a file. The client displays
      metadata for the output file, the task that generated the file, the
      workflow that contains the task, and the root workflow that contains
      the task. A sample invocation and its output are shown below.</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-metadata file --file-name f.d --trace /path/to/submit-dir</strong></span>

Workflow 493dda63-c6d0-4e62-bc36-26e5629449ad
    createdby : Test user
    name      : diamond

Task ID0000004
    size           : 2048
    time           : 60
    transformation : analyze

File f.d
    ctime        : 2016-01-20T19:02:14-08:00
    final_output : true
    size         : 582
    user         : bamboo
</pre>
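<p>If the trace needs to be consumed by a script, the text output can be split into sections. The sketch below is an illustration that assumes the layout shown in the sample above (an unindented header line followed by indented <code>key : value</code> lines); the output format is inferred from that sample only, and <code>parse_trace</code> is a hypothetical helper.</p>

```python
# Output copied from the sample trace above.
TRACE_OUTPUT = """\
Workflow 493dda63-c6d0-4e62-bc36-26e5629449ad
    createdby : Test user
    name      : diamond

Task ID0000004
    size           : 2048
    time           : 60
    transformation : analyze

File f.d
    ctime        : 2016-01-20T19:02:14-08:00
    final_output : true
    size         : 582
    user         : bamboo
"""


def parse_trace(text):
    """Return a list of sections, each with a kind, an id and its metadata."""
    sections = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate sections
        if not line[0].isspace():
            # Header line such as "Task ID0000004".
            kind, _, ident = line.partition(" ")
            current = {"kind": kind, "id": ident.strip(), "metadata": {}}
            sections.append(current)
        elif current is not None:
            # Split on the first " : " so timestamps keep their colons.
            key, sep, value = line.partition(" : ")
            if sep:
                current["metadata"][key.strip()] = value.strip()
    return sections


sections = parse_trace(TRACE_OUTPUT)
for section in sections:
    print(section["kind"], section["id"], section["metadata"])
```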
</div>
</div><div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="data_cleanup.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="data_management.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="optimization.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">10.6. Data Cleanup </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> Chapter 11. Optimizing Workflows for Efficiency and Scalability</td>
</tr>
</table>
</div><?php  
            pegasus_footer();
        ?>
