
DOM to Semantic Markdown is a JavaScript library that transforms HTML strings or elements into a semantic markdown format optimized for large language models (LLMs). It works in both browser and Node.js environments.
DOM to Semantic Markdown addresses common problems in extracting web content for LLMs. Unlike basic HTML-to-text conversion, it preserves the page's semantic structure. The output is concise yet information-rich, so it typically uses fewer tokens than the raw HTML. It still retains useful metadata such as links, image descriptions, and structural information. This can make it easier for an LLM to process and reason about web content.
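The snippet below is a rough illustration of the token-usage point. It compares the same content written by hand as HTML and as Markdown; it is not actual output from the library. Character count stands in for token count here. That is only an approximation, since real LLM tokenizers split text differently.

```javascript
// Hand-written samples (NOT produced by dom-to-semantic-markdown):
// the same content expressed as raw HTML and as Markdown.
const htmlSample =
  '<div class="post"><h2 class="title">Install</h2>' +
  '<p>Run <code>npm install dom-to-semantic-markdown</code>, then read the ' +
  '<a href="https://example.com/docs" class="link">docs</a>.</p></div>';

const markdownSample =
  '## Install\n\n' +
  'Run `npm install dom-to-semantic-markdown`, then read the ' +
  '[docs](https://example.com/docs).';

// Fraction of characters saved by the Markdown form (a crude proxy for tokens).
const saved = 1 - markdownSample.length / htmlSample.length;

console.log(`HTML: ${htmlSample.length} chars, Markdown: ${markdownSample.length} chars`);
console.log(`Saved: ${(saved * 100).toFixed(1)}%`);
```

Most of the savings come from dropping wrapper tags and presentational attributes such as `class`. The link URL itself is kept.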
How to use it:
1. Install the library using npm:
npm install dom-to-semantic-markdown
2. Import the necessary functions into your project:
// Browser
import { convertHtmlToMarkdown, convertElementToMarkdown } from 'dom-to-semantic-markdown';
// Node.js
const { convertHtmlToMarkdown, convertElementToMarkdown } = require('dom-to-semantic-markdown');
3. Now you can convert HTML strings or elements into Semantic Markdown:
// HTML String Conversion (Browser)
const html = 'Your HTML Content Here';
const markdown = convertHtmlToMarkdown(html, {
// Optional configuration
});
console.log(markdown);
// HTML String Conversion (Node.js - requires jsdom)
const jsdom = require('jsdom');
const { JSDOM } = jsdom;
const html = 'Your HTML Content Here';
const dom = new JSDOM(html);
// Pass a DOMParser instance from jsdom, since Node.js has no built-in DOMParser
const markdown = convertHtmlToMarkdown(html, { overrideDOMParser: new dom.window.DOMParser() });
console.log(markdown);
// HTML Element Conversion
const element = document.querySelector('#my-element');
const markdown = convertElementToMarkdown(element, {
// Optional configuration
});
console.log(markdown);
4. Customize the conversion process with the following options:
const markdown = convertHtmlToMarkdown(html, {
// Specify the website's domain for context.
websiteDomain: "example.com",
// Focus on extracting the primary content.
extractMainContent: true,
// Convert URLs to a reference-style format.
refifyUrls: true,
// Enable logging for debugging purposes.
debug: false,
// Provide a custom DOMParser instance (e.g. from jsdom in Node.js).
overrideDOMParser: new dom.window.DOMParser(),
// Adds unique identifiers to table columns
enableTableColumnTracking: true,
});
Changelog:
v1.0.11 (07/24/2024)
- add support for tracking table columns
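The `refifyUrls` option converts URLs to a reference-style format. The sketch below illustrates that general idea using standard Markdown reference-style links, where each distinct URL is written out once at the end and the body refers to it by a short label. The helper `toReferenceLinks` is hypothetical and is not the library's implementation. The library's actual reference format may differ.

```javascript
// Hypothetical helper illustrating reference-style links (NOT the library's code).
// Rewrites inline links [text](url) as [text][n] and appends "[n]: url" definitions.
function toReferenceLinks(markdown) {
  const urlToRef = new Map(); // each distinct URL gets one numbered reference
  const body = markdown.replace(/\[([^\]]*)\]\(([^)\s]+)\)/g, (match, text, url) => {
    if (!urlToRef.has(url)) {
      urlToRef.set(url, urlToRef.size + 1);
    }
    return `[${text}][${urlToRef.get(url)}]`;
  });
  // Map iteration preserves insertion order, so references are listed as 1, 2, ...
  const definitions = [...urlToRef].map(([url, ref]) => `[${ref}]: ${url}`);
  return definitions.length ? `${body}\n\n${definitions.join('\n')}` : body;
}

const input =
  'See the [guide](https://example.com/a/very/long/path/guide) and the ' +
  '[API](https://example.com/api). The [guide](https://example.com/a/very/long/path/guide) again.';
console.log(toReferenceLinks(input));
```

The repeated long URL now appears only once, in the definition list. This is where a reference-style format can save tokens on link-heavy pages.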







