Installing Tidy

From DreamHost

The instructions provided in this article or section are considered advanced.

You are expected to be knowledgeable in the UNIX shell.
Support for these instructions is not available from DreamHost tech support.


NOTICE: These instructions are a work in progress and may not be fully tested. They may in fact be downright wrong. Try them at your own risk.

HTML Tidy is both a standalone program and a library that cleans up HTML documents. One possible use is in web apps that allow users to post HTML. In particular, if you give your users a rich text editor like TinyMCE or FCKeditor, you'll want to run their input through Tidy. You might not think it's necessary: so what if the article your user wrote is malformed? Here's an example that can get you into big trouble:


<p>Hahaha! Now everything on your page below my post is going to be a <a href="http://evil.com">link!

To use Tidy from PHP, there are two approaches: installing the PHP extension, or the hackish way of calling the command-line program. I'll address the hackish way first, in two variants.


The Hackish Way: Use shell_exec()

On my server (wilshire), Tidy is installed as a Linux program in the path, but not as a PHP extension. So, to run it from PHP, I can call it from the Linux shell using PHP's Swiss Army Knife: the shell_exec() function.

From the PHP documentation:

shell_exec — Execute command via shell and return the complete output as a string

Let's say you have user-supplied HTML coming from $_POST or something like that, stored in $bad_html. This approach has Tidy operate on a file, so you write the HTML out with file_put_contents() and read the result back with file_get_contents(). The -m option tells Tidy to modify the source file in place, rather than just writing to stdout. (The proc_open() approach in the next section avoids the temp file.) The --show-body-only config option causes Tidy to output only the contents of the body tag. Without that option, Tidy would wrap everything in html and body tags, and that's no good if we want to display the content inside our own page.

$file = tempnam("temp", "tidy"); // Create a uniquely named temp file; rand() can collide

file_put_contents($file, $bad_html);

shell_exec("tidy -m --show-body-only yes " . escapeshellarg($file));

$good_html = file_get_contents($file);

unlink($file); // Clean up after ourselves
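One thing shell_exec() can't tell you is whether Tidy actually succeeded. Here is a minimal sketch, not a tested recipe, that swaps in PHP's exec() to get at the exit status. The function names tidy_command() and tidy_via_tempfile() are made up for this example, and it assumes PHP 7.1 or later and a tidy binary in the path. Tidy documents its exit status as 0 for success, 1 for warnings, and 2 for errors.

```php
<?php
// Hypothetical helper: builds the shell command for one file.
// -q keeps Tidy quiet, -m rewrites the file in place.
function tidy_command(string $path): string
{
    return 'tidy -q -m --show-body-only yes ' . escapeshellarg($path);
}

// Hypothetical helper: returns the cleaned HTML, or null if Tidy
// failed or is not installed.
function tidy_via_tempfile(string $bad_html): ?string
{
    $file = tempnam(sys_get_temp_dir(), 'tidy');
    if ($file === false) {
        return null; // could not create a temp file
    }

    try {
        file_put_contents($file, $bad_html);

        // exec() fills $status with the command's exit status.
        exec(tidy_command($file), $output, $status);

        // 0 = clean, 1 = warnings (result still usable).
        // Anything higher is an error, or the shell could not run tidy.
        if ($status > 1) {
            return null;
        }

        $good_html = file_get_contents($file);
        return $good_html === false ? null : $good_html;
    } finally {
        @unlink($file); // Clean up even when we bail out early
    }
}
```

Returning null rather than the original input is a deliberate choice here: the caller has to decide what to do with HTML that could not be cleaned, instead of displaying it by accident.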

The Less So, But Still Quite Hackish Way: Use proc_open()

This is very similar to the previous method, but doesn't need a temp file at all. proc_open() requires PHP 4.3.0 or later, and stream_get_contents(), used below, requires PHP 5.

$descriptorspec = array(
  0 => array("pipe", "r"), // stdin is a pipe that the child will read from
  1 => array("pipe", "w"), // stdout is a pipe that the child will write to
  2 => array("file", "/dev/null", "w") // stderr (Tidy's warnings) is discarded
);

// No -m here: with no file argument, Tidy reads stdin and writes stdout
$process = proc_open('tidy --show-body-only yes', $descriptorspec, $pipes);

if (is_resource($process)) {

  // $pipes now looks like this:
  // 0 => writeable handle connected to child stdin
  // 1 => readable handle connected to child stdout

  // writes the bad html to the tidy process that is reading from stdin.
  fwrite($pipes[0], $bad_html);
  fclose($pipes[0]);

  // reads the good html from the tidy process that is writing to stdout.
  $good_html = stream_get_contents($pipes[1]);
  fclose($pipes[1]);

  // stderr goes to /dev/null, so there is no third pipe to read or close.
  // An unread stderr pipe could fill up and stall the child.

  // It is important that you close any pipes before calling
  // proc_close in order to avoid a deadlock
  $return_value = proc_close($process); // Tidy's exit status: 0 OK, 1 warnings, 2 errors
}

// now use $good_html for whatever

Install your own PHP

If you want to use the PHP bindings instead of calling the command-line program, you'll have to build your own PHP with Tidy compiled in as an extension.
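For comparison with the hackish methods, here is roughly what the bindings look like once the extension is available. This is a sketch under assumptions: tidy_via_extension() is a name invented for this example, and the 'utf8' encoding is a guess about your input. tidy_repair_string() and extension_loaded() are real PHP functions, and show-body-only is the same Tidy option used on the command line above.

```php
<?php
// Hypothetical helper: cleans HTML through the tidy extension,
// or returns null when the extension is not there.
function tidy_via_extension(string $bad_html): ?string
{
    if (!extension_loaded('tidy')) {
        return null; // PHP was built without Tidy support
    }

    // Same option as --show-body-only yes on the command line.
    $config = array('show-body-only' => true);

    // Assumption: the input is UTF-8. Change this to match your pages.
    $good_html = tidy_repair_string($bad_html, $config, 'utf8');

    return is_string($good_html) ? $good_html : null;
}

// The troublesome post from the top of this page.
$bad_html = '<p>Hahaha! Now everything on your page below my post '
          . 'is going to be a <a href="http://evil.com">link!';

var_dump(tidy_via_extension($bad_html));
```

No shell, no temp file, and no pipes are involved, which is the main reason to bother with the build steps that follow.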

Get Prep Script

A prep script is available on the wiki. It downloads and unpacks the source code for PHP and various extensions.

Install libtoolize

Compiling Tidy requires libtoolize, which is part of GNU Libtool (one of the GNU Autotools). Unfortunately, it's not installed on DreamHost servers, so you'll have to download it, install it, and add it to your $PATH.

Install Tidy

It's very hard to find the right package. The CVS repository doesn't seem to provide libtidy; it only has the standalone program. To get libtidy, I went to http://tidy.sourceforge.net/src/old/ and grabbed the last one on the list.

Modify & Run PHP Install Script

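The PHP 5 install scripts on the wiki have to be modified: add the --with-tidy option to the options passed to PHP's configure. If libtidy ended up somewhere non-standard, such as your home directory, you will probably need the --with-tidy=DIR form so that configure can find it.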
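After the build finishes, it's worth checking that the new PHP really has Tidy in it. A small sketch, assuming you run it with the freshly built binary rather than the server's stock PHP. The function name tidy_build_report() is made up for this example; extension_loaded() and tidy_get_release() are real PHP functions.

```php
<?php
// Hypothetical helper: reports whether this PHP build includes Tidy.
function tidy_build_report(): string
{
    if (!extension_loaded('tidy')) {
        return 'tidy extension: missing (was --with-tidy passed to configure?)';
    }

    // tidy_get_release() returns the release date of the linked libtidy.
    return 'tidy extension: loaded, libtidy release ' . tidy_get_release();
}

echo tidy_build_report(), PHP_EOL;
```

If it reports the extension as missing, the usual suspects are a configure run that never saw the option or a script still running under the old PHP binary.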
