Universal Data Layer for AI Systems
Transform documentation, GitHub repos, PDFs, and codebases into structured knowledge for any AI system. 16 output formats. One tool for RAG pipelines, AI coding assistants, and Claude skills.
$ pip install skill-seekers
Any Source → Any AI System
One tool unifies all your data preprocessing needs
Input Sources
Documentation
Any doc site, Docusaurus, GitBook, ReadTheDocs
GitHub Repos
Public & private repos, with AST analysis
PDF Files
Scanned docs, manuals, research papers with OCR
Local Codebases
27+ languages, game engines, custom projects
16 Output Formats
RAG / Vectors
AI Platforms
AI Coding
Generic
Universal Preprocessing for AI
Skill Seekers is the universal data layer for AI systems. It transforms documentation websites, GitHub repositories, PDF files, and local codebases into production-ready formats for RAG pipelines, AI coding assistants, Claude skills, and any LLM platform.
The Problem
- ✗ 70% of RAG development time is spent on data preprocessing: scraping, cleaning, chunking
- ✗ AI coding assistants don't know your frameworks without manual context injection
- ✗ Multi-source knowledge (docs + code + PDFs) requires complex integration
- ✗ Different AI systems need different formats: LangChain, LlamaIndex, Cursor, Claude
The Solution
- ✓ One tool for all sources: docs, GitHub repos, PDFs, and local codebases
- ✓ Smart chunking preserves context, code blocks, and hierarchical structure
- ✓ 16 output formats: RAG pipelines, AI coding assistants, Claude skills, vector DBs
- ✓ 15-45 minutes end-to-end: from source to production-ready AI knowledge
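To make the chunking point concrete, here is a minimal sketch of structure-aware chunking. It is not Skill Seekers' actual implementation; the function names `split_blocks` and `chunk_markdown` and the `max_chars` parameter are hypothetical, chosen only for illustration. The idea is to never split a fenced code block and to tag each chunk with the heading it sits under.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def split_blocks(markdown: str) -> list[str]:
    """Split Markdown into headings, paragraphs, and whole fenced code blocks."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False

    def flush() -> None:
        if current:
            blocks.append("\n".join(current))
            current.clear()

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            if in_fence:
                current.append(line)  # closing fence ends the code block
                flush()
            else:
                flush()               # pending prose ends before the code
                current.append(line)
            in_fence = not in_fence
        elif in_fence:
            current.append(line)      # keep code lines verbatim, blanks included
        elif not line.strip():
            flush()                   # a blank line ends a paragraph
        elif line.startswith("#"):
            flush()
            blocks.append(line)       # each heading is its own block
        else:
            current.append(line)
    flush()
    return blocks


def chunk_markdown(markdown: str, max_chars: int = 400) -> list[dict]:
    """Pack blocks into chunks of roughly max_chars without splitting any block."""
    chunks: list[dict] = []
    heading = ""
    buf: list[str] = []

    def emit() -> None:
        if buf:
            chunks.append({"heading": heading, "text": "\n\n".join(buf)})
            buf.clear()

    for block in split_blocks(markdown):
        if block.startswith("#"):
            emit()                    # a new section starts a new chunk
            heading = block.lstrip("#").strip()
        else:
            size = sum(len(b) for b in buf) + len(block)
            if buf and size > max_chars:
                emit()
            buf.append(block)         # an oversized block stays whole, on its own
    emit()
    return chunks
```

Each returned chunk carries its section heading, so a retriever can show where a passage came from, and a code sample longer than `max_chars` is kept intact instead of being cut mid-function.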
10x Faster Development
Stop copying docs manually. Generate comprehensive skills in minutes, not hours.
Framework Expertise
Give AI assistants deep knowledge of any framework with API references and examples.
Always Up-to-Date
Re-run when docs update. Keep your AI knowledge fresh and accurate.
Universal Intelligence Platform
Transform any documentation into structured knowledge for any AI system
Three-Stream Analysis
Split GitHub repos into Code (C3.x), Docs, and Insights streams for comprehensive skills
Multi-Source Scraping
Extract from documentation websites, GitHub repositories, and PDF files
AI Enhancement
Automatically add explanations, examples, and best practices using Claude
16 LLM Platforms
Deploy to LangChain, LlamaIndex, Chroma, Claude, Gemini, OpenAI, Cursor, and more
24 Preset Configs
Ready-to-use configs for popular frameworks (React, Vue, Django, etc.)
26 MCP Tools
AI agents can prepare their own knowledge with 26 MCP tools (v3.0.0)
1852 Tests
Production-ready with 1852 tests across 100 test files (v3.0.0)
Zero Manual Work
Fully automated pipeline from source to production-ready skill
20-40 Minutes
Complete skill generation in under an hour, including AI enhancement
Ready to transform your documentation?
Get Started in 3 Steps
From zero to production-ready skill in 20-40 minutes
1. Install
Install from PyPI in seconds
pip install skill-seekers
2. Scrape Docs
Use preset configs or create your own
skill-seekers scrape --config react
# Or from URL directly
skill-seekers scrape --url https://react.dev --name react
3. Package & Upload
Create .zip and upload to Claude
skill-seekers package output/react/
skill-seekers upload react.zip
# Done! Your skill is ready to use
Multiple Installation Options
PyPI (Recommended)
Easiest
pip install skill-seekers
uv (Modern)
Fast
uv tool install skill-seekers
From Source
Dev
git clone && pip install -e .
MCP Integration
5 Agents
./setup_mcp.sh
Who Uses Skill Seekers?
From solo developers to enterprise teams
For Developers
Create skills from documentation + GitHub repos with automatic conflict detection.
"Build a React skill from official docs + GitHub repo, catch API changes before they surprise you."
For Game Developers
Generate comprehensive skills for game engines like Godot (handles 40K+ pages!).
"Create a complete Godot skill covering all topics with an intelligent router/hub pattern."
For Teams
Combine internal docs + code repositories into a single source of truth.
"Share custom configs via private git repos across teams of 3 to 500+ members."
For Learners
Build comprehensive skills from docs, code examples, and PDF tutorials.
"Combine official docs + GitHub examples + PDF manual into one unified learning resource."
For Open Source
Analyze repos to find documentation gaps and outdated examples automatically.
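As a rough illustration of what a documentation gap check can look like, the sketch below compares documented function signatures against the functions actually defined in a Python source file, using the standard `ast` module. This is an assumption-laden example, not the tool's real analysis: `public_functions`, `find_discrepancies`, and the shape of the `documented` mapping are all hypothetical.

```python
import ast


def public_functions(source: str) -> dict[str, list[str]]:
    """Map each public top-level function name to its positional parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    }


def find_discrepancies(
    documented: dict[str, list[str]], source: str
) -> dict[str, list[str]]:
    """Report where the docs and the code disagree.

    `documented` is assumed to map function names to the parameter lists
    the docs describe (hypothetical input format).
    """
    actual = public_functions(source)
    shared = actual.keys() & documented.keys()
    return {
        # defined in code but never mentioned in the docs
        "undocumented": sorted(actual.keys() - documented.keys()),
        # described in the docs but absent from the code
        "missing_in_code": sorted(documented.keys() - actual.keys()),
        # present in both, but with different parameter lists
        "signature_mismatch": sorted(
            name for name in shared if actual[name] != documented[name]
        ),
    }
```

Given docs that describe `load(path)` and `export(path)` and code that defines `load(path, mode)` and `save(path)`, this reports `save` as undocumented, `export` as missing in code, and `load` as a signature mismatch.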
"Detect discrepancies between documentation and actual code implementation."
Multi-Platform Support
Export your skills to multiple LLM platforms with platform-specific optimizations
By the Numbers
Trusted metrics from a production-ready tool
Open source • MIT Licensed • Active development