Installing Tidy

NOTICE: These instructions are a work in progress and may not be fully tested. They may in fact be downright wrong. Try them at your own risk.

HTML Tidy is both a standalone program and a library that cleans up HTML documents. One use is in web apps that let users post HTML. In particular, if you give your users a rich text editor like TinyMCE or FCKeditor, you'll want to run their input through Tidy. You might not think it's necessary: so what if the article your user wrote is malformed? Here's an example that can get you into big trouble, a post that opens a link tag and never closes it: <a href="#">Hahaha! Now everything on your page below my post is going to be a link!

To use Tidy from PHP, there are two solutions: installing the PHP extension and the hackish way. I'll address the hackish way first.

The Hackish Way: Use shell_exec
On my server (wilshire), Tidy is installed as a command-line program on the path, but not as a PHP extension. So, to run it from PHP, I can call it through the shell using PHP's Swiss Army knife: the shell_exec function.

From the PHP documentation: shell_exec — Execute command via shell and return the complete output as a string

Let's say you have user-supplied HTML coming from $_POST or something like that, stored in $bad_html. In this approach Tidy operates on a file, so you write the HTML out with file_put_contents and read the cleaned result back with file_get_contents. The -m option tells Tidy to modify the source file in place, rather than just writing to stdout. (The sections further down show ways to skip the temp file.) The --show-body-only config option makes Tidy output only the contents of the body tag. Without that option, Tidy would wrap everything in html and body tags, and that's no good if we want to display the content inside our own page.

$file = tempnam(sys_get_temp_dir(), 'tidy'); // Create a uniquely named temp file
file_put_contents($file, $bad_html);
shell_exec('tidy -m --show-body-only yes ' . escapeshellarg($file)); // -m rewrites the file in place
$good_html = file_get_contents($file);
unlink($file); // Clean up after ourselves
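One thing shell_exec can't tell you is whether Tidy actually succeeded. Tidy's exit status is 0 on success, 1 if there were warnings, and 2 if there were errors. Here is a rough sketch that uses exec instead, since exec hands back the exit status. The function name tidy_file_in_place and the $tidy parameter are made up for illustration (the parameter only exists so the sketch can be tried with a stand-in command), and the 2>/dev/null redirect assumes a Unix-like shell.

```php
<?php
/**
 * Run tidy on a file in place and report whether it finished without errors.
 * Hypothetical helper: the name and the $tidy parameter are illustrative only.
 */
function tidy_file_in_place(string $path, string $tidy = 'tidy'): bool
{
    // -q keeps tidy quiet, -m rewrites the file, stderr is discarded
    $command = $tidy . ' -q -m --show-body-only yes '
             . escapeshellarg($path) . ' 2>/dev/null';

    exec($command, $output, $status);

    // 0 = success, 1 = warnings only, 2 = tidy reported errors
    return $status === 0 || $status === 1;
}
```

If it returns false, it's probably safest not to treat the file contents as cleaned HTML.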

The Hackish Way without intermediate files: Using pipes
// escapeshellarg() quotes the HTML so the shell treats it as literal text
$cmd = "printf '%s' " . escapeshellarg($bad_html) . ' | tidy --show-body-only yes';
$good_html = shell_exec($cmd);

The Less So, But Still Quite Hackish Way: Use proc_open
This is very similar to the previous methods, but it talks to Tidy directly over pipes, so it needs no temp file and no shell quoting of the HTML. proc_open requires PHP 4.3.0 or later; stream_get_contents, used here, requires PHP 5.

$descriptorspec = array(
    0 => array("pipe", "r"), // stdin is a pipe that the child will read from
    1 => array("pipe", "w"), // stdout is a pipe that the child will write to
    2 => array("pipe", "w")  // stderr is a pipe that the child will write to
);

$process = proc_open('tidy --show-body-only yes', $descriptorspec, $pipes);

if (is_resource($process)) {
    // $pipes now looks like this:
    // 0 => writeable handle connected to child stdin
    // 1 => readable handle connected to child stdout
    // 2 => readable handle connected to child stderr

    // Write the bad HTML to the tidy process, which is reading from stdin.
    fwrite($pipes[0], $bad_html);
    fclose($pipes[0]);

    // Read the good HTML from the tidy process, which is writing to stdout.
    $good_html = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    // I don't care about stderr, but you might: it holds Tidy's warnings.
    $tidy_errors = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    // It is important that you close any pipes before calling
    // proc_close in order to avoid a deadlock.
    $return_value = proc_close($process);
}

// now use $good_html for whatever
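If you do this in more than one place, it may be handy to wrap the proc_open dance in a function. The following is only a sketch: run_filter and its $command parameter are names I made up, stderr is thrown away via /dev/null (so it assumes a Unix-like host), and it assumes the child reads all of its input before writing much output, which should be fine for a program that parses the whole document first but could stall with very large inputs on a streaming command.

```php
<?php
/**
 * Pipe $input through an external command and return what it writes to stdout.
 * Returns null if the process cannot be started or its output cannot be read.
 * Hypothetical helper: the name and parameters are illustrative only.
 */
function run_filter(string $command, string $input): ?string
{
    $descriptorspec = [
        0 => ['pipe', 'r'],               // child reads its stdin from us
        1 => ['pipe', 'w'],               // child writes its stdout to us
        2 => ['file', '/dev/null', 'w'],  // discard stderr
    ];

    $process = proc_open($command, $descriptorspec, $pipes);
    if (!is_resource($process)) {
        return null;
    }

    fwrite($pipes[0], $input);
    fclose($pipes[0]);                    // EOF tells the child the input is complete

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    proc_close($process);                 // all pipes are closed before this call

    return $output === false ? null : $output;
}

// Intended use, assuming a tidy binary is on the path:
// $good_html = run_filter('tidy -q --show-body-only yes', $bad_html);
```

Because the HTML travels over stdin rather than the command line, quotes, dollar signs and backticks in the user's post never reach the shell.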

Install your own PHP
If you want to use the PHP bindings, you'll have to install Tidy as a PHP extension.
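Once the extension is in place, the clean-up can happen inside PHP with no shell involved. A minimal sketch, assuming the input is UTF-8; clean_fragment is a hypothetical helper name, while tidy_repair_string and the show-body-only option come from the extension and from Tidy itself.

```php
<?php
/**
 * Clean an HTML fragment with the tidy extension, if it is loaded.
 * Returns null when the extension is missing or tidy cannot repair the input.
 * Hypothetical helper: the name is illustrative only.
 */
function clean_fragment(string $bad_html): ?string
{
    if (!extension_loaded('tidy')) {
        return null;
    }

    // Same option as on the command line: emit only the body contents
    $config = ['show-body-only' => true];

    // 'utf8' is an assumption about the input encoding
    $result = tidy_repair_string($bad_html, $config, 'utf8');

    return is_string($result) ? $result : null;
}

// $good_html = clean_fragment($bad_html);
```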

Get Prep Script
Available on the wiki. It downloads and unpacks the source code for PHP and various extensions.

Install libtoolize
Compiling Tidy requires libtoolize, which is part of GNU Libtool (one of the GNU Autotools). Unfortunately, it's not installed on DreamHost servers, so you'll have to download it, install it, and add its location to your $PATH.

Install Tidy
It's very hard to find the right package. The CVS repository doesn't seem to provide libtidy; it only has the standalone program. To get libtidy, I went to http://tidy.sourceforge.net/src/old/ and grabbed the last one on the list.

Modify & Run PHP Install Script
The PHP 5 install scripts on the wiki have to be modified: add the --with-tidy option to the configure flags.
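After the build finishes, it's worth checking that the new PHP really picked up Tidy. A small sketch you could drop into a test page or run from the command line; tidy_extension_available is a made-up helper name.

```php
<?php
/**
 * Report whether the running PHP build has the tidy extension.
 * Hypothetical helper: the name is illustrative only.
 */
function tidy_extension_available(): bool
{
    return extension_loaded('tidy') && function_exists('tidy_repair_string');
}

echo tidy_extension_available()
    ? "tidy extension is available\n"
    : "tidy extension is NOT available; check that --with-tidy was passed to configure\n";
```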