# @package      hubzero-textifier
# @file         INSTALL
# @author       Steve Snyder <snyder13@purdue.edu>
# @copyright    Copyright (c) 2012 HUBzero Foundation, LLC.
# @license      http://www.gnu.org/licenses/lgpl-3.0.html LGPLv3
#
# Copyright (c) 2012 HUBzero Foundation, LLC.
#
# This file is part of: The HUBzero(R) Platform for Scientific Collaboration
#
# The HUBzero(R) Platform for Scientific Collaboration (HUBzero) is free
# software: you can redistribute it and/or modify it under the terms of
# the GNU Lesser General Public License as published by the Free Software
# Foundation, either version 3 of the License, or (at your option) any
# later version.
#
# HUBzero is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# HUBzero is a registered trademark of HUBzero Foundation, LLC.

Dependencies
============
aptitude install libmysql++-dev libcrypto++-dev libpoppler-dev libboost-regex-dev

Build
=====
sudo make install

Configure
=========
Edit /etc/textifier.conf so that it uses database credentials with write access to every table named in the SQL statement.

The default SQL statement is:

INSERT INTO jos_document_text_content(hash, text_content) VALUES (%0q, %1q) ON DUPLICATE KEY UPDATE text_content = %1q

It takes two parameters: %0q is the SHA1 hash of the document, and %1q is the best guess at its text content.
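
To check whether a given document already has a row, one option is to compute its SHA1 independently and look the row up by hash. The following is only a sketch: it assumes the hash column holds the lowercase hexadecimal digest, which this file does not confirm, and doc_sha1 and lookup_query are hypothetical helper names, not part of textifier.

```shell
# Print the hex SHA1 digest of a file (sha1sum is part of GNU coreutils).
doc_sha1() {
    sha1sum < "$1" | cut -d' ' -f1
}

# Print a lookup query for a file's row in the default table.
# Assumption: the hash column stores the lowercase hex digest.
lookup_query() {
    printf "SELECT text_content FROM jos_document_text_content WHERE hash = '%s';\n" "$(doc_sha1 "$1")"
}
```

The printed query can be pasted into a MySQL client connected to the hub database. If the table or column names in /etc/textifier.conf have been changed, adjust the query to match.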

Use
===
The HUBzero CMS calls textifier automatically where appropriate, provided the textifier binary is on the PATH.
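
A quick way to confirm that the binary is visible on the PATH is sketched below. have_textifier is a hypothetical helper name, and the optional argument exists only so the check can be tried against another command name. Note that the web server account may have a different PATH than an interactive shell, so the check is most meaningful when run as that account.

```shell
# Succeed if the named command (default: textifier) is found on the PATH.
have_textifier() {
    command -v "${1:-textifier}" >/dev/null 2>&1
}
```

For example: have_textifier && echo "textifier found" || echo "textifier missing"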

To index old documents, or otherwise to run textifier outside the CMS, invoke the textifier binary with a single argument: the name of the file to textify. The program prints the file's SHA1 sum and then forks to parse the file, which may take a few seconds depending on the file type.
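
Indexing a whole directory of old documents can be sketched as a loop over single-file invocations. This is an illustration, not a shipped script: textify_tree is a hypothetical helper name, and the TEXTIFIER variable is an assumption introduced here so the loop can be dry-run with another command such as echo.

```shell
# Run textifier once per regular file under a directory.
# TEXTIFIER is a hypothetical override for dry runs; by default the
# textifier binary on the PATH is used.
textify_tree() {
    find "$1" -type f -exec "${TEXTIFIER:-textifier}" {} \;
}
```

For example, TEXTIFIER=echo textify_tree /path/to/uploads lists the files that would be processed, and textify_tree /path/to/uploads processes them. Because each invocation forks to parse, a large tree may start many parsers in quick succession.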
