Convert your Telegram chat exports to optimized PDF files ready for AI processing and vector databases.
- Clone the repository:
  git clone [email protected]:rldyourmnd/telegram_to_pdfVectorDB.git
  cd telegram_to_pdfVectorDB
- Export your Telegram chats:
  - Open Telegram Desktop
  - Go to Settings → Advanced → Export Telegram data
  - Select "Personal chats" and "Machine-readable JSON"
  - Save the export as result.json in the project folder
- Run the processor:
  - Windows: Double-click launch_windows.bat
  - Linux: Run ./launch_linux.sh
  - First-time setup: Use install_and_run_windows.bat (installs Python automatically)
  - The script will automatically install dependencies and process your chats
project/
├── chats_clean_pdf/ # Generated PDF files
├── metadata/ # Processing metadata
│ └── metadata_summary.json
├── result.json # Your Telegram export
└── launch_windows.bat # Easy launcher
The tool works out-of-the-box, but you can customize settings by creating a .env file:
# Copy the example configuration
cp .env.example .env
# Edit with your settings
notepad .env # Windows
nano .env # Linux/macOS
Key settings to configure:
# IMPORTANT: Replace with your actual Telegram data
USER_NAME=Your Actual Telegram Name
USER_ID=user123456789
# PDF optimization (default values work well)
MAX_FILE_SIZE_KB=200
PDF_FONT_SIZE=10
PDF_LINE_SPACING=12
# Chunking algorithm (recommended defaults)
CHUNK_SIZE_SHORT=25
CHUNK_SIZE_MEDIUM=18
CHUNK_SIZE_LONG=12
# Processing options
MIN_MESSAGE_LENGTH=2
SHORT_MESSAGE_THRESHOLD=50
LONG_MESSAGE_THRESHOLD=150
# Debug options
VERBOSE_LOGGING=true
SHOW_PROGRESS=true
To correctly identify your messages vs received messages:
- Open result.json in any text editor
- Search for a message you wrote (recognize by your writing style)
- Find these fields in your message:
  "from": "Your Actual Name", "from_id": "user123456789"
- Copy exact values to your .env file
- Test: If messages still show as "From [Name]:" instead of "Me:", check your settings
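Scanning a large export by eye can be slow, so here is a minimal sketch that lists the most frequent sender pairs instead. It assumes the export keeps chats under `chats` → `list`, each with a `messages` array; this layout is an assumption, so confirm it against your own file. `count_senders` and `load_export` are hypothetical helper names, not part of this project. In a personal-chats export you take part in every chat, so your own pair is usually the most frequent one.

```python
import json
import os
from collections import Counter


def count_senders(export):
    """Count messages per (from, from_id) pair across all chats."""
    counts = Counter()
    # Assumed layout: export["chats"]["list"] is a list of chats,
    # each carrying a "messages" list.
    for chat in export.get("chats", {}).get("list", []):
        for msg in chat.get("messages", []):
            # Service entries may have no sender fields; skip them.
            if "from_id" in msg:
                counts[(msg.get("from"), msg["from_id"])] += 1
    return counts


def load_export(path="result.json"):
    """Load the export file as a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__" and os.path.exists("result.json"):
    # Print the five most frequent sender pairs.
    for (name, uid), n in count_senders(load_export()).most_common(5):
        print(f"{n:>7}  from={name!r}  from_id={uid!r}")
```

Copy the `from` and `from_id` values of the pair that matches you into USER_NAME and USER_ID.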
If you prefer manual setup:
# Install Python dependencies
pip install -r requirements.txt
# Run the processor
python process_telegram_chats.py
- Optimized for AI: PDFs sized for vector databases (max 200 KB by default)
- Smart chunking: Dynamic chunk sizing based on message length
- Memory efficient: Processes large chats in parts
- Clean formatting: Optimized text format for AI processing
- Metadata tracking: Complete processing information
- Cross-platform: Works on Windows, macOS, and Linux
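To illustrate what "dynamic chunk sizing based on message length" could mean, here is a rough sketch that picks a messages-per-chunk value from the average message length. The defaults mirror the CHUNK_SIZE_* and *_THRESHOLD settings shown earlier. The selection rule is an assumption for illustration only, since the processor's actual algorithm is not documented here, and `pick_chunk_size` and `chunk_messages` are hypothetical names.

```python
def pick_chunk_size(messages, short_threshold=50, long_threshold=150,
                    size_short=25, size_medium=18, size_long=12):
    """Return how many messages go into one chunk, based on average length."""
    if not messages:
        return size_medium
    avg_len = sum(len(m) for m in messages) / len(messages)
    if avg_len < short_threshold:
        return size_short   # short messages: pack more per chunk
    if avg_len > long_threshold:
        return size_long    # long messages: pack fewer per chunk
    return size_medium


def chunk_messages(messages, min_length=2, **kwargs):
    """Drop messages shorter than min_length, then split into chunks."""
    kept = [m for m in messages if len(m) >= min_length]
    size = pick_chunk_size(kept, **kwargs)
    return [kept[i:i + size] for i in range(0, len(kept), size)]
```

Under this rule a chat made of one-line replies would be grouped 25 messages per chunk, while a chat of long paragraphs would be grouped 12 per chunk, which keeps chunks closer in text volume.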
Perfect for n8n workflows:
- Text Splitter settings:
  - Chunk size: 800
  - Overlap: 200
- Batch processing: 5-8 files at a time for optimal memory usage
- Search patterns:
  - Me: for your messages
  - From [NAME]: for contact messages
- Embedding models:
  - OpenAI: text-embedding-ada-002 (1536 dimensions)
  - Local: Any 768-dimension model
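To see what the chunk size and overlap settings do, the sketch below splits text into 800-character windows in which each window repeats the last 200 characters of the previous one. It assumes sizes are counted in characters, whereas a given n8n splitter node may count differently or prefer natural boundaries. `split_with_overlap` is a hypothetical helper, not an n8n API.

```python
def split_with_overlap(text, chunk_size=800, overlap=200):
    """Split text into chunk_size-character windows sharing `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these values each new chunk starts 600 characters after the previous one, so a message cut at a chunk boundary still appears whole in at least one chunk if it is shorter than the overlap.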
The processor provides detailed statistics:
- Total chats processed
- Messages per chat
- PDF files created
- Chunk distribution
- Large chats split into multiple parts
- Python 3.7+
- Windows/macOS/Linux
- Telegram Desktop (for export)
Solution: Configure your user identification in the .env file:
- Copy .env.example to .env
- Open result.json and find a message you sent
- Look for "from": "Your Name" and "from_id": "user123456789"
- Update .env with these exact values:
  USER_NAME=Your Exact Telegram Name
  USER_ID=user123456789
Solutions:
- Windows: Use install_and_run_windows.bat (auto-installs Python)
- Manual: Download Python from python.org and check "Add to PATH"
- Linux: sudo apt install python3 python3-pip
Solution: Export your Telegram data correctly:
- Telegram Desktop → Settings → Advanced → Export Telegram data
- Select "Personal chats" and "Machine-readable JSON"
- Save as result.json in the project folder
The .gitignore protects against committing:
- result.json (your personal chat data)
- Generated PDFs (chats_clean_pdf/)
- Configuration files (.env)
- Check if result.json is valid JSON (not corrupted)
- Try with VERBOSE_LOGGING=true in .env
- Ensure enough disk space (chat exports can be large)
- For very large exports, process in smaller batches
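The first check above can be done with a short standalone script. This sketch only tests whether the file parses as JSON and reports where parsing failed; it is independent of the processor, and `check_export` is a hypothetical helper name. Note that it loads the whole file into memory.

```python
import json
import os


def check_export(path="result.json"):
    """Return (ok, message) describing whether the file parses as JSON."""
    if not os.path.exists(path):
        return False, f"{path} not found"
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as err:
        # lineno/colno point at the position where parsing stopped.
        return False, f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    except UnicodeDecodeError as err:
        return False, f"file is not valid UTF-8: {err}"
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return True, f"valid JSON ({type(data).__name__}, {size_mb:.1f} MB)"
```

A truncated export usually fails near the end of the file, so an error position close to the last line suggests the export was interrupted and should be repeated.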
- Your data stays local - no data is sent anywhere
- Git protection - .gitignore prevents accidental data commits
- Configuration files - Never commit .env files with personal data
- Generated PDFs - Review before sharing (contain your chat history)
MIT License - feel free to use and modify!