Copy Web Content as Markdown for LLMs – CopyLlmsTxt.js

Category: Javascript | April 9, 2025
Author: tsriram
Last Update: April 9, 2025
License: MIT
CopyLlmsTxt is a lightweight JavaScript library that helps you get clean markdown text from a webpage.

It lets you press a keyboard shortcut (Cmd+C or Ctrl+C by default) to copy the main content of the current page to your clipboard as markdown, without the usual cruft such as navigation, ads, and footers.

Think about feeding articles, documentation, or blog posts into Large Language Models (LLMs) for summarization, analysis, or Q&A. You usually want just the core content, not the entire site structure. Manually cleaning up copied HTML is tedious. This library aims to automate that.

It hooks into keyboard events, clones the current page’s <body>, attempts to intelligently remove non-content elements, converts the remaining HTML to markdown using the solid Turndown library, and puts the result on your clipboard.
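The clone, clean, and convert steps described above can be sketched roughly as follows. This is a minimal illustration and not the library's actual source: `extractMarkdown` and its parameter names are hypothetical, and the HTML-to-markdown converter is passed in so the function does not depend on how Turndown is loaded.

```javascript
// Hypothetical sketch of the pipeline; NOT CopyLlmsTxt's real implementation.
function extractMarkdown({ body, selectorsToRemove, htmlToMarkdown }) {
  // Work on a deep copy so the visible page is never modified.
  const clone = body.cloneNode(true);

  // Strip every element that matches a non-content selector.
  for (const selector of selectorsToRemove) {
    clone.querySelectorAll(selector).forEach((el) => el.remove());
  }

  // Convert whatever HTML is left into markdown.
  return htmlToMarkdown(clone.innerHTML);
}

// Possible browser usage (assumes Turndown is loaded as TurndownService):
//
//   const markdown = extractMarkdown({
//     body: document.body,
//     selectorsToRemove: ["nav", "footer", "script", "style"],
//     htmlToMarkdown: (html) => new TurndownService().turndown(html),
//   });
//   navigator.clipboard.writeText(markdown);
```

Cloning first is the key design choice here: elements are removed from the copy only, so the page the user is reading stays intact.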

How to use it:

1. Install and import CopyLlmsTxt.js.

# NPM
$ npm install copy-llms-txt

// ES module import
import CopyLlmsTxt from "copy-llms-txt";

2. Initialize CopyLlmsTxt.js with default options. Now you can use Cmd+C (Mac) or Ctrl+C (Windows/Linux) to copy page content as markdown.

CopyLlmsTxt.init();

3. Change the default shortcut.

CopyLlmsTxt.init({
  // Cmd/Ctrl + T
  key: 't',
  // Force Cmd/Meta key even on Windows/Linux
  meta: true,
});
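To make the two options concrete, here is one way a keydown handler could decide whether an event matches the configured shortcut. It is a sketch under stated assumptions, not the library's code: `matchesShortcut` and the `isMac` flag are hypothetical, and the logic assumes the default modifier is Cmd on macOS and Ctrl elsewhere, with `meta: true` always requiring the Meta key.

```javascript
// Hypothetical helper; the real library may match shortcuts differently.
function matchesShortcut(event, { key = "c", meta = false } = {}, isMac = false) {
  // Assumption: Meta (Cmd) is required on macOS or when `meta` is forced.
  const wantsMeta = meta || isMac;
  const modifierHeld = wantsMeta ? event.metaKey : event.ctrlKey;

  // Compare keys case-insensitively so "T" and "t" both match.
  return Boolean(modifierHeld) && String(event.key).toLowerCase() === key.toLowerCase();
}

// Possible browser usage:
//
//   document.addEventListener("keydown", (event) => {
//     if (matchesShortcut(event, { key: "t", meta: true })) {
//       CopyLlmsTxt.copy();
//     }
//   });
```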

4. Specify the elements to strip from the copied content, as CSS selectors.

CopyLlmsTxt.init({
  selectorsToRemove: [
    "nav",
    "header",
    "footer",
    "aside",
    ".navigation",
    ".nav",
    ".menu",
    ".sidebar",
    ".ad",
    ".ads",
    ".advertisement",
    "script",
    "style",
    "noscript",
    "iframe"
  ]
});
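For a site with extra clutter, you might combine a common list with site-specific selectors. The sketch below assumes nothing about whether `selectorsToRemove` replaces or extends the library's defaults, so it passes a complete list to be safe. `withSiteSelectors`, `.cookie-banner`, and `#comments` are hypothetical names used only for illustration.

```javascript
// A subset of the common selectors listed above.
const COMMON_SELECTORS = [
  "nav", "header", "footer", "aside",
  "script", "style", "noscript", "iframe",
];

// Hypothetical helper: merge site-specific selectors and drop duplicates.
function withSiteSelectors(siteSelectors) {
  return [...new Set([...COMMON_SELECTORS, ...siteSelectors])];
}

const selectorsToRemove = withSiteSelectors([".cookie-banner", "#comments"]);

// In the browser:
//   CopyLlmsTxt.init({ selectorsToRemove });
```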

5. Pass options through to Turndown. Check the Turndown documentation for the complete list of options.

CopyLlmsTxt.init({
  turndownOptions: {
    // Options here
  },
});
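As an illustration, the options below are standard Turndown settings that tend to produce markdown LLMs handle well. The sketch assumes the library hands `turndownOptions` to Turndown unchanged, which the source does not spell out.

```javascript
// Standard Turndown options; the values are one reasonable choice, not a requirement.
const turndownOptions = {
  headingStyle: "atx",       // "# Heading" instead of underlined (setext) headings
  codeBlockStyle: "fenced",  // fenced code blocks instead of indented ones
  bulletListMarker: "-",     // "-" for unordered list items
};

// In the browser:
//   CopyLlmsTxt.init({ turndownOptions });
```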

6. Manually trigger the copy action:

CopyLlmsTxt.copy();

7. Destroy the instance.

CopyLlmsTxt.destroy();
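The manual trigger and teardown calls pair naturally with a "Copy as markdown" button. The helper below is a hypothetical sketch: `attachCopyButton` is not part of the library, and it assumes only that the object passed in exposes the `copy()` and `destroy()` methods shown above.

```javascript
// Hypothetical helper: wire a button to copy(), and return a cleanup function.
function attachCopyButton(button, copier) {
  const onClick = () => copier.copy();
  button.addEventListener("click", onClick);

  // Call the returned function when the button is no longer needed.
  return function detach() {
    button.removeEventListener("click", onClick);
    copier.destroy();
  };
}

// Possible browser usage (assumes a button with id "copy-md" exists):
//
//   CopyLlmsTxt.init();
//   const detach = attachCopyButton(document.getElementById("copy-md"), CopyLlmsTxt);
//   // later, e.g. when a single-page app unmounts the view:
//   detach();
```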

FAQs:

Q: Does CopyLlmsTxt work perfectly on every website?

A: No. Its effectiveness depends heavily on the website’s structure and how cleanly its content is separated from navigation, ads, etc. The default selectors cover common patterns, but complex or poorly structured sites will yield less clean results, and you’ll likely need to tune selectorsToRemove for those sites. It also captures only the DOM state at the moment the shortcut is pressed, so content that loads asynchronously after that point won’t be included.

Q: Can I use this in a Tampermonkey/Greasemonkey userscript?

A: Yes, this should work fine in a userscript. You’d need to include the library’s code (or use @require if hosted on a CDN that allows it) and then call CopyLlmsTxt.init() within your script.

Q: Does it handle content inside iframes?

A: No. It operates on the document.body of the current window context where the script is initialized. Content within iframes belongs to a different document and won’t be included or cleaned. The default selectors also explicitly remove iframe elements.

Q: Will this copy images too?

A: The library converts images to markdown image syntax (![alt text](image url)), but it doesn’t download or embed the actual image files. The URLs to images are preserved in the markdown.
