A lightweight PHP library for automatically generating a Table of Contents from HTML article content. The library parses your HTML, extracts headings, creates anchor links, and provides structured data for building navigation.
- Automatic heading extraction - Parses `<h2>` tags and generates URL-friendly anchor IDs
- Title and perex detection - Automatically extracts the main title (`<h1>`) and the introductory paragraph (perex)
- XSS-safe output - All generated attributes are properly escaped to prevent security vulnerabilities
- Immutable response object - Returns a clean, typed `Response` entity with all extracted data
- Zero configuration - Works out of the box with sensible defaults
- PHP 8.0+ support - Uses modern PHP features including named arguments and constructor property promotion
The library consists of two main components working together:
```
HTML Input
    <h1>Title</h1><p>Perex...</p><h2>Section 1</h2>...
        │
        ▼
ContentManager
    • Parses <h2> headings
    • Generates webalized anchor IDs (slug format)
    • Injects <div> anchors before each heading
    • Extracts <h1> title
    • Extracts first <p> as perex
        │
        ▼
Response
    • original: string (unchanged input HTML)
    • content: string (HTML with injected anchors)
    • pureContent: string (content without <h1>)
    • title: ?string (extracted from <h1>)
    • perex: ?string (extracted from first <p>)
    • items: array (id => title mapping)
```
`ContentManager` is the main service class responsible for parsing HTML content. It provides a single public method:

- `parse(string $html): Response` - Accepts raw HTML and returns a structured `Response` object
Processing steps:

- Scans for all `<h2>` tags in the content
- For each heading, generates a URL-friendly ID using `Nette\Utils\Strings::webalize()`
- Injects an anchor `<div>` element before each heading for smooth scroll navigation
- Extracts the page title from the first `<h1>` tag
- Extracts the perex (lead paragraph) from the first `<p>` tag
- Returns all data wrapped in an immutable `Response` object
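The steps above can be approximated in plain PHP to make the data flow concrete. This is a hypothetical, dependency-free sketch, not the library's source: it uses regular expressions where the real `ContentManager` may work differently, the helper names `sketchSlug()` and `sketchParse()` are invented for illustration, and `sketchSlug()` only roughly imitates `webalize()` (it does not remove diacritics).

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-in for Nette\Utils\Strings::webalize():
// lowercases the text and turns every run of non [a-z0-9] characters into
// a single hyphen. It does NOT transliterate accented letters.
function sketchSlug(string $text): string
{
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($text));

    return trim((string) $slug, '-');
}

/**
 * Hypothetical re-creation of the processing steps (not the library's code).
 *
 * @return array{content: string, title: ?string, perex: ?string, items: array<string, string>}
 */
function sketchParse(string $html): array
{
    $items = [];

    // Steps 1-3: find each <h2>, derive an ID from its text, prepend an anchor <div>.
    $content = preg_replace_callback(
        '/<h2(?:\s[^>]*)?>(.*?)<\/h2>/is',
        static function (array $m) use (&$items): string {
            $title = trim(strip_tags($m[1]));
            $id = sketchSlug($title);
            $items[$id] = $title;
            $safeId = htmlspecialchars($id, ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE, 'UTF-8');

            return '<div id="' . $safeId . '" class="content-anchor"></div>' . $m[0];
        },
        $html,
    );

    // Step 4: the title comes from the first <h1>.
    $title = preg_match('/<h1(?:\s[^>]*)?>(.*?)<\/h1>/is', $html, $h1) === 1
        ? trim(strip_tags($h1[1]))
        : null;

    // Step 5: the perex comes from the first <p>.
    $perex = preg_match('/<p(?:\s[^>]*)?>(.*?)<\/p>/is', $html, $p) === 1
        ? trim(strip_tags($p[1]))
        : null;

    // Step 6: return everything together (the real library wraps it in Response).
    return ['content' => (string) $content, 'title' => $title, 'perex' => $perex, 'items' => $items];
}

$result = sketchParse('<h1>PHP Course</h1><p>Intro.</p><h2>How to Start?</h2><p>Install PHP.</p>');
echo json_encode($result['items']), "\n"; // {"how-to-start":"How to Start?"}
```

A regex-based approach keeps the sketch short, but it is fragile on unusual markup; a real parser may prefer a DOM-based strategy.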
`Response` is an immutable data transfer object implementing `Stringable`. When cast to a string, it returns the processed content with anchors.
Available methods:
| Method | Return Type | Description |
|---|---|---|
| `getOriginal()` | `string` | Returns the original unmodified HTML input |
| `getContent()` | `string` | Returns HTML with injected anchor elements |
| `getPureContent()` | `string` | Returns content without the `<h1>` title tag |
| `getTitle()` | `?string` | Returns the extracted title or `null` |
| `getPerex()` | `?string` | Returns the extracted perex or `null` |
| `getItems()` | `array<string, string>` | Returns anchor ID to heading title mapping |
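For readers curious how such an object can be built with the PHP 8.0 features mentioned earlier, here is a hypothetical sketch. The class `TocResultSketch` is invented for illustration and is not the library's `Response` (whose internals may differ); it implements only a subset of the getters from the table. Immutability comes from private promoted properties with no setters, and `__toString()` returns the processed content.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical immutable result object in the spirit of Response.
 * Class name and constructor are assumptions, not the library's code.
 */
final class TocResultSketch implements Stringable
{
    /**
     * @param array<string, string> $items anchor ID => heading title
     */
    public function __construct(
        private string $original,
        private string $content,
        private ?string $title = null,
        private ?string $perex = null,
        private array $items = [],
    ) {
    }

    public function getOriginal(): string
    {
        return $this->original;
    }

    public function getContent(): string
    {
        return $this->content;
    }

    public function getTitle(): ?string
    {
        return $this->title;
    }

    public function getPerex(): ?string
    {
        return $this->perex;
    }

    /** @return array<string, string> */
    public function getItems(): array
    {
        return $this->items;
    }

    // Casting to string yields the processed content, as described for Response.
    public function __toString(): string
    {
        return $this->content;
    }
}

// Named arguments (PHP 8.0) let the caller skip optional fields such as $title.
$result = new TocResultSketch(
    original: '<h2>Intro</h2>',
    content: '<div id="intro" class="content-anchor"></div><h2>Intro</h2>',
    items: ['intro' => 'Intro'],
);
echo $result, "\n";
```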
It's best to use Composer for installation, and you can also find the package on Packagist and GitHub.
To install, simply use the command:

```shell
$ composer require baraja-core/table-of-content
```

You can use the package manually by creating an instance of the internal classes, or register a DIC extension to link the services directly to the Nette Framework.
Requirements:

- PHP 8.0 or higher
- `nette/utils` ^3.0
Basic usage:

```php
use Baraja\TableOfContent\ContentManager;

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

$manager = new ContentManager();

$html = '
<h1>PHP Online Course for Beginners</h1>
<p>PHP is a server-side scripting language designed for modern web applications.</p>
<h2>How to Start?</h2>
<p>First, you need to install PHP on your computer...</p>
<h2>Basic Software</h2>
<p>You will need a code editor and a local server...</p>
<h2>License</h2>
<p>This course is released under MIT license.</p>
';

$response = $manager->parse($html);

// Get the title extracted from <h1>
$title = $response->getTitle();
// Result: "PHP Online Course for Beginners"

// Get the perex extracted from the first <p>
$perex = $response->getPerex();
// Result: "PHP is a server-side scripting language designed for modern web applications."

// Get all table of contents items (ID => Title)
$items = $response->getItems();
// Result:
// [
//     'how-to-start' => 'How to Start?',
//     'basic-software' => 'Basic Software',
//     'license' => 'License',
// ]

// Get modified content with anchor elements
$content = $response->getContent();

// Get content without the <h1> tag (useful for rendering the title separately)
$pureContent = $response->getPureContent();

// Get the original unmodified HTML
$original = $response->getOriginal();
```

To render the items as a navigation list:

```php
$items = $response->getItems();

echo '<nav class="table-of-contents">';
echo '<h3>Contents:</h3>';
echo '<ol>';
foreach ($items as $id => $title) {
    // Escape both the anchor ID and the heading text for HTML output
    echo sprintf(
        '<li><a href="#%s">%s</a></li>',
        htmlspecialchars($id, ENT_QUOTES),
        htmlspecialchars($title, ENT_QUOTES),
    );
}
echo '</ol>';
echo '</nav>';
```

The `Response` object implements `Stringable`, so you can use it directly wherever a string is expected:

```php
$response = $manager->parse($html);

// Both of these are equivalent:
echo $response;
echo $response->getContent();
```

*(Image not available: structure of the `Response` object after parsing.)*

*(Image not available: example of how a rendered table of contents looks in a real application.)*
When the parser encounters an `<h2>` heading like:

```html
<h2>How to Start?</h2>
```

it transforms it to:

```html
<div id="how-to-start" class="content-anchor"></div><h2>How to Start?</h2>
```

The anchor ID is generated using `Nette\Utils\Strings::webalize()`, which:
- Converts text to lowercase
- Replaces spaces with hyphens
- Removes diacritics (accents)
- Strips special characters
This ensures clean, URL-friendly anchor IDs that work reliably across all browsers.
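To make these rules concrete, here is a rough approximation in plain PHP. It is a hypothetical sketch (`sketchWebalize()` is an invented name), not Nette's implementation: the transliteration table below covers only a handful of Czech letters, while `webalize()` handles far more characters. This sketch turns each run of spaces and special characters into a single hyphen and then trims hyphens from both ends.

```php
<?php

declare(strict_types=1);

/**
 * Rough, hypothetical approximation of the slug rules listed above.
 * Not Nette's code: the transliteration table is deliberately tiny.
 */
function sketchWebalize(string $text): string
{
    // 1. Remove diacritics with a small lookup table (input assumed UTF-8).
    $text = strtr($text, [
        'á' => 'a', 'č' => 'c', 'ď' => 'd', 'é' => 'e', 'ě' => 'e', 'í' => 'i',
        'ň' => 'n', 'ó' => 'o', 'ř' => 'r', 'š' => 's', 'ť' => 't', 'ú' => 'u',
        'ů' => 'u', 'ý' => 'y', 'ž' => 'z',
        'Č' => 'C', 'Ř' => 'R', 'Š' => 'S', 'Ž' => 'Z',
    ]);

    // 2. Lowercase (only ASCII letters are affected).
    $text = strtolower($text);

    // 3. Replace every run of spaces, punctuation or unmapped symbols with "-".
    $text = (string) preg_replace('/[^a-z0-9]+/', '-', $text);

    // 4. Drop leading and trailing hyphens.
    return trim($text, '-');
}

echo sketchWebalize('How to Start?'), "\n";        // how-to-start
echo sketchWebalize('Příliš žluťoučký kůň'), "\n"; // prilis-zlutoucky-kun
```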
The library implements proper XSS protection:
- All generated `id` attributes are escaped using `htmlspecialchars()` with the `ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE` flags
- Protection against the innerHTML mXSS vulnerability (nette/nette#1496) is included
- The original content is preserved without modification in `getOriginal()`
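As a sketch of the first point, the hypothetical helper below (the name `sketchAnchor()` is invented, not part of the library) builds an anchor `<div>` using the same `htmlspecialchars()` flags. Webalized IDs contain only lowercase letters, digits and hyphens, so this escaping acts as defense in depth; the second call shows what it neutralizes if a crafted value ever reached this point.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: builds an anchor element with an escaped id attribute.
function sketchAnchor(string $id): string
{
    $safeId = htmlspecialchars($id, ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE, 'UTF-8');

    return '<div id="' . $safeId . '" class="content-anchor"></div>';
}

echo sketchAnchor('how-to-start'), "\n";
// <div id="how-to-start" class="content-anchor"></div>

echo sketchAnchor('x" onmouseover="alert(1)'), "\n";
// <div id="x&quot; onmouseover=&quot;alert(1)" class="content-anchor"></div>
```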
For Nette Framework users, you can register the service in your configuration:
```neon
services:
    - Baraja\TableOfContent\ContentManager
```

Then inject it into your presenters or services:

```php
// Inside your presenter or service class: the DI container passes
// the registered ContentManager instance to the constructor
public function __construct(
    private ContentManager $contentManager,
) {
}
```

For smooth scrolling to anchors, add this CSS:

```css
html {
    scroll-behavior: smooth;
}

.content-anchor {
    scroll-margin-top: 80px; /* Offset for fixed headers */
}
```

Author: Jan Barasek
- Website: https://baraja.cz
- GitHub: @baraja-core
baraja-core/table-of-content is licensed under the MIT license. See the LICENSE file for more details.

