Automatically generate a Table of Contents using PHP

by Alex Freeman

A few weeks ago, I was rearranging the Table of Contents for one of the pages here on 10stripe when it hit me. Why not automate this tedious, dull task?

Motivation

Automation is certainly appealing. It reduces the workload for me, and ensures the table of contents consistently matches the live page. This has been a problem before; one innocent little edit can easily cause the table of contents to no longer agree with the page it is supposed to guide the reader through.

It should be no surprise, then, that many content management systems can already do this. Indeed, Google searches on the topic repeatedly landed on WordPress examples. Not especially helpful for those of us who don't run WordPress. What I needed was something I could just drop into a page and be done.

This script

The script I developed is a bit of PHP that doesn't require much in the way of outside help. It requires a web server that supports PHP, of course, but that's about it.

If you want a live example of this script in action, look up a few lines. It is being used to generate the table of contents on this page.

How it works

There are a few key pieces to implementing this script. You will need the script itself, a call to the function it contains, and some CSS to make it pretty (unless of course you prefer it plain). You can lump the first two together if you prefer to put the entire block of PHP inline; it's your choice.

The script opens the current file on the server, reads its contents, and uses a regular expression to suss out the headings. It then fiddles with them a bit and plunks them into some HTML.

The script

The whole shebang:

<?php
	function TableOfContents($depth)
	/*AutoTOC function written by Alex Freeman
	* Released under CC-by-sa 3.0 license
	* http://www.10stripe.com/  */
	{
	$filename = __FILE__;
	//read in the file
	$file = fopen($filename,"r");
	$html_string = fread($file, filesize($filename));
	fclose($file);
 
	//get the headings down to the specified depth
	//(exactly one level digit after "h", then any attributes)
	$pattern = '/<h[2-'.$depth.'][^>]*>.*?<\/h[2-'.$depth.']>/';
	$whocares = preg_match_all($pattern,$html_string,$winners);
 
	//reformat the results to be more usable
	$heads = implode("\n",$winners[0]);
	$heads = str_replace('<a name="','<a href="#',$heads);
	$heads = str_replace('</a>','',$heads);
	//opening heading tags may carry attributes, so allow for them
	$heads = preg_replace('/<h([1-'.$depth.'])[^>]*>/','<li class="toc$1">',$heads);
	$heads = preg_replace('/<\/h[1-'.$depth.']>/','</a></li>',$heads);
 
	//plug the results into appropriate HTML tags
	$contents = '<div id="toc"> 
	<p id="toc-header">Contents</p>
	<ul>
	'.$heads.'
	</ul>
	</div>';
	echo $contents;
	}
 ?>

I chose to stow this script in the HEAD of the document. It runs only when the function is actually called, so including it in every file (even though a table of contents is generated for only some pages, as on 10stripe) is not especially problematic. Note also that you could point the script toward any arbitrary file by replacing __FILE__ with an appropriate string. The table can be made to include H1s as well, by replacing both occurrences of the number 2 in $pattern with the number 1.

The HTML wrapped around the results at the end can be altered to taste.
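
If you would rather not edit the script each time you aim it at a different file, one option is to pass the path in as an argument. What follows is only a rough sketch of that idea, not the script as it runs on 10stripe: the function name HeadingsFromFile is made up for illustration, it swaps the fopen/fread/fclose trio for file_get_contents, and it returns the matched headings as an array rather than echoing a finished table.

```php
<?php
// Hypothetical variant, not part of the original script:
// collects headings from any readable file instead of __FILE__.
function HeadingsFromFile($filename, $depth)
{
	// bail out quietly if the path is not a readable file
	if (!is_file($filename) || !is_readable($filename)) {
		return array();
	}
	$html_string = file_get_contents($filename);

	// match H2 through H<depth>, with or without attributes
	$pattern = '/<h[2-'.$depth.'][^>]*>.*?<\/h[2-'.$depth.']>/';
	preg_match_all($pattern, $html_string, $winners);

	// $winners[0] holds the full text of each matched heading
	return $winners[0];
}
?>
```

The returned array could then be run through the same reformatting steps as in the main script. Because the pattern has no "s" modifier, each heading still has to sit on a single line to be picked up.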

The function call

To actually generate and insert a table of contents, you will need to call the function in the body of the HTML. This is easy enough:

<?php TableOfContents(3); ?>

The number (which should be an integer from 2 to 6, since it becomes the upper end of the heading range in $pattern) controls how deep the script will crawl. For the value 3, the script will include H2 and H3 elements, but exclude H4, H5, and H6.
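
To see how the depth ends up inside the regular expression, here is a small sketch. HeadingPattern is a hypothetical helper, not something the script defines; it also clamps the value to the range 2 to 6, which is my own assumption about sensible behavior rather than anything the original does.

```php
<?php
// Hypothetical helper, shown only to illustrate how $depth shapes the pattern.
function HeadingPattern($depth)
{
	// clamp to the heading levels below H1 (an assumption, not original behavior)
	$depth = max(2, min(6, (int) $depth));
	return '/<h[2-'.$depth.'][^>]*>.*?<\/h[2-'.$depth.']>/';
}

$sample = "<h2>Intro</h2>\n<h3>Details</h3>\n<h4>Fine print</h4>";
preg_match_all(HeadingPattern(3), $sample, $winners);
// $winners[0] now holds the H2 and H3 elements; the H4 is skipped
?>
```

With a depth of 3 the character class becomes [2-3], which is why H4 and deeper headings never make it into the table.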

The HTML

For the script to pick up on headings, the page has to have them. Good, semantic heading structure is recommended.

Those headings also need to have anchors in them if you want the table of contents to link to them. This is a great convenience feature, because it lets users jump directly to the section they want to read. The script assumes this usage:

<h2><a name="target"></a>Text</h2>

But will also accept:

<h2><a name="target">Text</a></h2>

And many other permutations. This was done to accommodate some legacy pages at 10stripe.
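
To show why both forms work, here is a sketch that runs a single heading through reformatting steps modeled on the ones in the script. HeadingToListItem is a hypothetical name used only for this illustration. Every closing anchor tag is stripped and a fresh one is added just before the end of the list item, so where the original closing tag sat makes no difference.

```php
<?php
// Hypothetical helper modeled on the script's reformatting steps, for one heading.
function HeadingToListItem($heading, $depth)
{
	// turn the named anchor into a link to that anchor
	$item = str_replace('<a name="', '<a href="#', $heading);
	// drop the closing anchor tag wherever it happens to sit
	$item = str_replace('</a>', '', $item);
	// swap the heading tags for list item tags (attributes on the heading are dropped)
	$item = preg_replace('/<h([1-'.$depth.'])[^>]*>/', '<li class="toc$1">', $item);
	$item = preg_replace('/<\/h[1-'.$depth.']>/', '</a></li>', $item);
	return $item;
}

$first  = HeadingToListItem('<h2><a name="target"></a>Text</h2>', 3);
$second = HeadingToListItem('<h2><a name="target">Text</a></h2>', 3);
// both forms come out as: <li class="toc2"><a href="#target">Text</a></li>
?>
```

One consequence of this approach is that a heading with no anchor at all still gets a closing anchor tag, so anchors are worth adding to every heading you want listed.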

The CSS

To make the table of contents match the rest of your page, you will want to style it with some CSS. To get you started, the CSS currently used on this page is below. It is based on Wikipedia's common CSS stylesheet.

#toc
{
	border: 1px solid #bba;
	background-color: #f7f8ff;
	padding: 1em;
	font-size: 90%;
	text-align: center;
	width:15em;
}

#toc-header
{
	display: inline;
	padding: 0;
	font-size: 100%;
	font-weight: bold;
}

#toc ul
{
	list-style-type: none;
	margin-left: 0;
	padding-left: 0;
	text-align: left;
}

.toc3
{
	margin-left: 1em;
}

.toc4
{
	margin-left: 2em;
}
	

Fin

That's all there is to it. As noted above, this script is available under the CC-by-sa 3.0 license. Please feel free to share it with others, but don't forget who your friends are.

About the Author

Alex Freeman is the creator of 10stripe.com, and has entirely too much free time.

He is not proud of the amount of time spent acclimating himself to regular expressions in order to pull off this little trick. Hopefully you will find it useful.
