<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="creating_workflows.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="site.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="replica"></a>4.2. Data Discovery (Replica Catalog)</h2></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="replica.php#rc-FILE">4.2.1. File</a></span></dt>
<dt><span class="section"><a href="replica.php#rc-regex">4.2.2. Regex</a></span></dt>
<dt><span class="section"><a href="replica.php#rc-directory">4.2.3. Directory</a></span></dt>
<dt><span class="section"><a href="replica.php#rc-JDBCRC">4.2.4. JDBCRC</a></span></dt>
<dt><span class="section"><a href="replica.php#rc-MRC">4.2.5. MRC</a></span></dt>
</dl></div>
<p>The Replica Catalog keeps mappings of logical file ids/names (LFNs)
    to physical file ids/names (PFNs). A single LFN can map to several PFNs.
    A PFN consists of a URL with protocol, host and port information and a
    path to a file. Additional key/value attributes can be stored with each
    PFN.</p>
<p>Pegasus supports the following implementations of the Replica
    Catalog.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>File</strong></span> (Default)</p></li>
<li class="listitem"><p><span class="bold"><strong>Regex</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>Directory</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>Database via JDBC</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>MRC</strong></span></p></li>
</ol></div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-FILE"></a>4.2.1. File</h3></div></div></div>
<p>In this mode, Pegasus queries a file-based replica catalog. The
      file format is a simple multicolumn format. It is not transactionally
      safe and is not advised for production use. Multiple concurrent
      instances will conflict with each other. The site attribute should be
      specified whenever possible. The attribute key for the site attribute
      is <span class="bold"><strong>"pool".</strong></span></p>
<pre class="programlisting">
LFN PFN
LFN PFN a=b [..]
LFN PFN a="b" [..]
"LFN w/LWS" "PFN w/LWS" [..]
      </pre>
<p>The LFN may or may not be quoted. If it contains linear
      whitespace, quotes, a backslash or an equal sign, it must be quoted and
      escaped. The same conditions apply to the PFN. Each attribute key-value
      pair is joined by an equal sign without any surrounding whitespace. The
      value may be quoted, and the same quoting rules as for the LFN apply.</p>
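<p>The quoting rules above behave much like shell-style tokenizing, so
      they can be approximated with Python's <code>shlex</code> module. The
      following is a minimal sketch under that assumption, not the parser
      that Pegasus itself uses. The function name <code>parse_rc_line</code>
      and the sample paths are hypothetical, and the sample attribute mirrors
      the entries shown elsewhere in this section.</p>
<pre class="programlisting">import shlex

def parse_rc_line(line):
    """Split one catalog line into (lfn, pfn, attributes)."""
    # shlex.split honours double quotes and backslash escapes,
    # so a quoted LFN or PFN may contain whitespace.
    tokens = shlex.split(line)
    lfn, pfn = tokens[0], tokens[1]
    attributes = {}
    for token in tokens[2:]:
        # Attributes are key=value with no whitespace around "=".
        key, _, value = token.partition("=")
        attributes[key] = value
    return lfn, pfn, attributes

print(parse_rc_line('f.a file:///data/f.a site="local"'))
print(parse_rc_line('"my file.txt" "file:///data/my file.txt" site="local"'))
</pre>
<p>The second call shows why quoting matters: without the quotes, the
      name containing a space would be read as two separate columns.</p>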
<p>The File mode is the default mode. To use the File mode,
      set the following properties:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=File</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.file=<em class="replaceable"><code>&lt;path to
            the replica catalog file&gt;</code></em></strong></span></p></li>
</ol></div>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-regex"></a>4.2.2. Regex</h3></div></div></div>
<p>In this mode, Pegasus queries a file-based replica catalog. The
      file format is a simple multicolumn format. It is not transactionally
      safe and is not advised for production use. Multiple concurrent
      instances will conflict with each other. The site attribute should be
      specified whenever possible. The attribute key for the site attribute
      is <span class="bold"><strong>"pool".</strong></span></p>
<p>In addition, users can specify regular expression based LFNs. A
      regular expression based entry should be qualified with an attribute
      named 'regex'. When set to true, the regex attribute identifies the
      catalog entry as a regular expression based entry. Regular expressions
      should follow Java regular expression syntax.</p>
<p>For example, consider a replica catalog as shown below.</p>
<p>Entry 1 refers to an entry which does not use a regular
      expression. This entry would only match a file named 'f.a', and nothing
      else.</p>
<p>Entry 2 refers to an entry which uses a regular expression. In
      this entry f.a refers to files named f&lt;any-character&gt;a,
      i.e. faa, f.a, f0a, etc.</p>
<pre class="programlisting">#1
f.a file:///Volumes/data/input/f.a site="local"
#2
f.a file:///Volumes/data/input/f.a site="local" <span class="bold"><strong>regex</strong></span>="true"
</pre>
<p>Regular expression based entries also support substitutions. For
      example, consider the regular expression based entry shown below.</p>
<p>Entry 3 will match files with name alpha.csv, alpha.txt,
      alpha.xml. In addition, values matched in the expression can be used to
      generate a PFN.</p>
<p>For the entry below, if the file being looked up is alpha.csv, the
      PFN for the file would be generated as
      file:///Volumes/data/input/csv/alpha.csv. Similarly, if the file being
      looked up is alpha.xml, the PFN for the file would be generated as
      file:///Volumes/data/input/xml/alpha.xml, i.e. the sections [0] and [1]
      will be replaced. Section [0] refers to the entire string, i.e.
      alpha.csv. Section [1] refers to a partial match in the input, i.e. csv,
      txt, or xml. Users can utilize as many sections as they wish.</p>
<pre class="programlisting">#3
alpha\.(csv|txt|xml) file:///Volumes/data/input/<span class="bold"><strong>[1]</strong></span>/<span class="bold"><strong>[0]</strong></span> site="local" <span class="bold"><strong>regex</strong></span>="true"</pre>
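<p>The substitution of sections can be sketched in a few lines of
      Python. This is only an illustration: it uses Python's
      <code>re</code> module rather than Java's regular expression engine,
      assumes that the whole LFN has to match the expression, and the
      function name <code>expand_pfn</code> is hypothetical.</p>
<pre class="programlisting">import re

def expand_pfn(pattern, template, lfn):
    """Return the PFN for lfn, or None if the whole LFN does not match."""
    match = re.fullmatch(pattern, lfn)
    if match is None:
        return None
    # Replace each [n] in the template with capture group n;
    # group 0 is the entire matched LFN.
    return re.sub(r"\[(\d+)\]", lambda m: match.group(int(m.group(1))), template)

pattern = r"alpha\.(csv|txt|xml)"
template = "file:///Volumes/data/input/[1]/[0]"

print(expand_pfn(pattern, template, "alpha.csv"))  # file:///Volumes/data/input/csv/alpha.csv
print(expand_pfn(pattern, template, "alpha.xml"))  # file:///Volumes/data/input/xml/alpha.xml
print(expand_pfn(pattern, template, "beta.csv"))   # None
</pre>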
<p>If an LFN matches multiple entries in the file, the
      implementation picks the first matching regex as it appears in the
      file. To specify a default location for all LFNs that don't
      match any other regular expression, add a catch-all entry as the last
      entry in your file.</p>
<pre class="programlisting">#4 all unmatched LFN's reside in the same input directory.

.*     file:///Volumes/data/input/<span class="bold"><strong>[0]</strong></span> site="local" <span class="bold"><strong>regex</strong></span>="true"</pre>
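<p>The first-match rule and the catch-all entry can be illustrated
      with a short Python sketch. It is a simplified model under the same
      assumptions as the rule described above (entries are tried in file
      order and the whole LFN must match), written with Python's
      <code>re</code> module. The names <code>ENTRIES</code> and
      <code>lookup</code> are hypothetical.</p>
<pre class="programlisting">import re

# (pattern, PFN template) pairs in file order; the catch-all comes last.
ENTRIES = [
    (r"alpha\.(csv|txt|xml)", "file:///Volumes/data/input/[1]/[0]"),
    (r".*", "file:///Volumes/data/input/[0]"),
]

def lookup(lfn, entries=ENTRIES):
    """Return the PFN from the first entry whose pattern matches lfn."""
    for pattern, template in entries:
        match = re.fullmatch(pattern, lfn)
        if match is not None:
            # Fill in [n] with capture group n of the winning entry.
            return re.sub(r"\[(\d+)\]",
                          lambda m: match.group(int(m.group(1))),
                          template)
    return None

print(lookup("alpha.txt"))  # file:///Volumes/data/input/txt/alpha.txt
print(lookup("notes.dat"))  # file:///Volumes/data/input/notes.dat
</pre>
<p>If the catch-all entry were listed first, it would shadow every
      entry after it, which is why it belongs at the end of the file.</p>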
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-directory"></a>4.2.3. Directory</h3></div></div></div>
<p>In this mode, Pegasus does a directory listing on an input
      directory to create the LFN to PFN mappings. The directory listing is
      performed recursively, resulting in deep LFN mappings. For example, if
      an input directory $input is specified with the following
      structure</p>
<pre class="programlisting">$input
$input/f.1
$input/f.2
$input/D1
$input/D1/f.3</pre>
<p>Pegasus will create the following LFN to PFN mappings
      internally:</p>
<pre class="programlisting">f.1 file://$input/f.1  site="local"
f.2 file://$input/f.2  site="local"
D1/f.3 file://$input/D1/f.3 site="local"</pre>
<p>Users can optionally specify additional properties to configure
      the behavior of this implementation.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.directory.site</strong></span> to
          specify a site attribute other than local to associate with the
          mappings.</p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.directory.flat.lfn</strong></span> to
          specify whether flat LFNs should be constructed instead of deep
          ones. If not specified, the value defaults to false, i.e. deep LFNs
          are constructed for the mappings.</p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.directory.url.prefix</strong></span>
          to associate a URL prefix for the PFN's constructed. If not
          specified, the URL defaults to file://</p></li>
</ol></div>
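<p>The recursive listing and the effect of the three options can be
      approximated with the following Python sketch. It is an illustration
      only, not the Pegasus implementation. The function name
      <code>directory_mappings</code> and its parameters are hypothetical
      stand-ins for the properties above, and it assumes that a flat LFN is
      simply the file's base name and that the URL prefix is prepended to the
      absolute path.</p>
<pre class="programlisting">import os
import tempfile

def directory_mappings(input_dir, site="local", flat_lfn=False, url_prefix="file://"):
    """Walk input_dir recursively and return (lfn, pfn, site) tuples."""
    mappings = []
    for root, dirs, files in os.walk(input_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            path = os.path.join(root, name)
            # Deep LFN: path relative to the input directory, e.g. D1/f.3.
            deep = os.path.relpath(path, input_dir).replace(os.sep, "/")
            lfn = name if flat_lfn else deep
            mappings.append((lfn, url_prefix + path, site))
    return mappings

# Recreate the example layout in a temporary directory and list it.
with tempfile.TemporaryDirectory() as input_dir:
    os.makedirs(os.path.join(input_dir, "D1"))
    for rel in ("f.1", "f.2", os.path.join("D1", "f.3")):
        open(os.path.join(input_dir, rel), "w").close()
    for lfn, pfn, site in directory_mappings(input_dir):
        print(lfn, pfn, 'site="%s"' % site)
</pre>
<p>Calling the function with <code>flat_lfn=True</code> would yield
      the LFN f.3 instead of D1/f.3 for the nested file.</p>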
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Tip</h3>
<p>pegasus-plan has an <span class="bold"><strong>--input-dir</strong></span>
        option that can be used to specify an input directory on the command
        line. This allows you to specify a separate replica catalog to catalog
        the locations of output files.</p>
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-JDBCRC"></a>4.2.4. JDBCRC</h3></div></div></div>
<p>In this mode, Pegasus queries a SQL based replica catalog that is
      accessed via JDBC. To create the schema for JDBCRC use the <a class="link" href="cli-pegasus-db-admin.php" title="pegasus-db-admin">pegasus-db-admin</a> command line
      tool.</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>A site attribute was added to the SQL schema as a unique key in
        4.4. To update an existing database schema, use the pegasus-db-admin
        tool.</p>
<div class="figure">
<a name="idp57407424"></a><p class="title"><b>Figure 4.2. Schema Image of the JDBCRC.</b></p>
<div class="figure-contents"><div class="mediaobject"><img src="images/jdbcrc-schema.png" alt="Schema Image of the JDBCRC."></div></div>
</div>
<br class="figure-break">
</div>
<p>To use JDBCRC, the user additionally needs to set the following
      properties</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=JDBCRC</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.driver=mysql | sqlite</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.url=<em class="replaceable"><code>&lt;jdbc url
          to the database&gt; e.g
          jdbc:mysql://database-host.isi.edu/database-name |
          jdbc:sqlite:/shared/jdbcrc.db </code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.user=<em class="replaceable"><code>&lt;database
          user&gt;</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.password=<em class="replaceable"><code>&lt;database
          password&gt;</code></em></strong></span></p></li>
</ol></div>
<p>Users can use the command line client
      <span class="emphasis"><em>pegasus-rc-client</em></span> to query, insert and
      remove entries in the JDBCRC backend. Starting with the 4.5 release,
      there is also support for sqlite databases. Specify a jdbc url that
      refers to a sqlite database.</p>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-MRC"></a>4.2.5. MRC</h3></div></div></div>
<p>In this mode, Pegasus queries multiple replica catalogs to
      discover the file locations on the grid.</p>
<p>To use it set</p>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=<em class="replaceable"><code>MRC</code></em></strong></span></p></li></ol></div>
<p>Each associated replica catalog can be configured via properties
      as follows.</p>
<p>The user associates a variable name, referred to as [value], with
      each of the catalogs, where [value] is any legal identifier (concretely
      [A-Za-z][_A-Za-z0-9]*). For each associated replica catalog, the user
      specifies the following properties:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.[value]
          </strong></span>- specifies the type of replica catalog.</p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.[value].key
          </strong></span>- specifies a property name key for a particular
          catalog</p></li>
</ul></div>
<p>For example, to query a File catalog and JDBCRC at the same time
      specify the following:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.jdbcrc=JDBCRC</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.jdbcrc.url=<em class="replaceable"><code>&lt;jdbc
            url &gt;</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.file1=File</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.file1.url=</strong></span><span class="bold"><strong>&lt;path to file based replica
            catalog&gt;</strong></span></p></li>
</ul></div>
<p>In the above example, <span class="bold"><strong>jdbcrc</strong></span> and
      <span class="bold"><strong>file1</strong></span> are arbitrary valid identifier
      names, and <span class="bold"><strong>url</strong></span> is the property key
      that needs to be specified.</p>
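<p>To make the naming scheme concrete, the following Python sketch
      groups a set of MRC properties by their identifier. It only illustrates
      how the property names decompose and is not how Pegasus loads its
      configuration. The function name <code>group_mrc_properties</code> and
      the file path <code>/home/user/rc</code> are hypothetical.</p>
<pre class="programlisting">PREFIX = "pegasus.catalog.replica.mrc."

def group_mrc_properties(props):
    """Group MRC properties into one settings record per identifier."""
    catalogs = {}
    for name, value in props.items():
        if not name.startswith(PREFIX):
            continue  # not an MRC property
        # "jdbcrc.url" splits into identifier "jdbcrc" and key "url";
        # a bare "jdbcrc" has an empty key and names the catalog type.
        ident, _, key = name[len(PREFIX):].partition(".")
        entry = catalogs.setdefault(ident, {"type": None, "settings": {}})
        if key:
            entry["settings"][key] = value
        else:
            entry["type"] = value
    return catalogs

props = {
    "pegasus.catalog.replica": "MRC",
    "pegasus.catalog.replica.mrc.jdbcrc": "JDBCRC",
    "pegasus.catalog.replica.mrc.jdbcrc.url": "jdbc:sqlite:/shared/jdbcrc.db",
    "pegasus.catalog.replica.mrc.file1": "File",
    "pegasus.catalog.replica.mrc.file1.url": "/home/user/rc",
}

for ident, entry in group_mrc_properties(props).items():
    print(ident, entry["type"], entry["settings"])
</pre>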
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="pegasus-rc-client"></a>4.2.5.1. Replica Catalog Client pegasus-rc-client</h4></div></div></div>
<p>The client used to interact with the Replica Catalogs is
        pegasus-rc-client. The implementation that the client talks to is
        configured using Pegasus properties.</p>
<p>Let's assume you create a file f.a in your home directory as shown
        below.</p>
<pre class="screen"><span class="command"><strong>$ date &gt; $HOME/f.a </strong></span></pre>
<p>We now need to register this file in the <span class="bold"><strong>File</strong></span> replica catalog located in <span class="bold"><strong>$HOME/rc</strong></span> using the pegasus-rc-client. Replace
        the <span class="bold"><strong>gsiftp://url</strong></span> with the appropriate
        parameters for your grid site.</p>
<pre class="screen"><span class="emphasis"><em>$<span class="command"><strong> pegasus-rc-client -Dpegasus.catalog.replica=File -Dpegasus.catalog.replica.file=$HOME/rc insert \
 f.a</strong></span> <em class="replaceable"><code>gsiftp://somehost:port/path/to/file/f.a site=local</code></em></em></span></pre>
<p>You may first want to verify that the file registration is in
        the replica catalog. Since we are using a File catalog, we can look at
        the file <span class="bold"><strong>$HOME/rc</strong></span> to view
        its entries.</p>
<pre class="screen"><span class="command"><strong>$ cat $HOME/rc</strong></span><code class="computeroutput">
    
# file-based replica catalog: 2010-11-10T17:52:53.405-07:00
f.a gsiftp://somehost:port/path/to/file/f.a site=local</code></pre>
<p>The above line shows that the entry for file <span class="bold"><strong>f.a</strong></span> was made correctly.</p>
<p>You can also use the <span class="bold"><strong>pegasus-rc-client</strong></span> to look for entries.</p>
<pre class="screen"><span class="command"><strong>$ pegasus-rc-client -Dpegasus.catalog.replica=File -Dpegasus.catalog.replica.file=$HOME/rc lookup LFN f.a</strong></span><code class="computeroutput">

f.a gsiftp://somehost:port/path/to/file/f.a site=local</code></pre>
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="creating_workflows.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="creating_workflows.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="site.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 4. Creating Workflows </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 4.3. Resource Discovery (Site Catalog)</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
