Telegram Chat PDF Processor

Convert your Telegram chat exports to optimized PDF files ready for AI processing and vector databases.

🚀 Quick Start (Windows)

Clone the repository:

git clone git@github.com:rldyourmnd/telegram_to_pdfVectorDB.git
cd telegram_to_pdfVectorDB

Export your Telegram chats:
- Open Telegram Desktop
- Go to Settings → Advanced → Export Telegram data
- Select "Personal chats" and "Machine-readable JSON"
- Save the export as result.json in the project folder

Run the processor:
- Windows: Double-click launch_windows.bat
- Linux: Run ./launch_linux.sh
- First-time setup: Use install_and_run_windows.bat (installs Python automatically)
- The script will automatically install dependencies and process your chats

📁 Output Structure

project/
├── chats_clean_pdf/          # Generated PDF files
├── metadata/                 # Processing metadata
│   └── metadata_summary.json
├── result.json              # Your Telegram export
└── launch_windows.bat       # Easy launcher

⚙️ Configuration

The tool works out-of-the-box, but you can customize settings by creating a .env file:

# Copy the example configuration
cp .env.example .env      # Linux/macOS
copy .env.example .env    # Windows (Command Prompt)

# Edit with your settings
notepad .env  # Windows
nano .env     # Linux/macOS

Key settings to configure:

# IMPORTANT: Replace with your actual Telegram data
USER_NAME=Your Actual Telegram Name
USER_ID=user123456789

# PDF optimization (default values work well)
MAX_FILE_SIZE_KB=200
PDF_FONT_SIZE=10
PDF_LINE_SPACING=12

# Chunking algorithm (recommended defaults)
CHUNK_SIZE_SHORT=25
CHUNK_SIZE_MEDIUM=18
CHUNK_SIZE_LONG=12

# Processing options
MIN_MESSAGE_LENGTH=2
SHORT_MESSAGE_THRESHOLD=50
LONG_MESSAGE_THRESHOLD=150

# Debug options
VERBOSE_LOGGING=true
SHOW_PROGRESS=true
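
As a rough illustration of how settings like these can be read with fallbacks, here is a minimal sketch in Python. It is not the code in process_telegram_chats.py: the names `DEFAULTS`, `parse_env_text`, and `load_settings` are hypothetical, and the parsing rules (plain KEY=VALUE lines, `#` comments, no inline comments) are assumptions.

```python
# Defaults mirror the example values listed above; the set of keys is illustrative.
DEFAULTS = {
    "MAX_FILE_SIZE_KB": 200,
    "CHUNK_SIZE_SHORT": 25,
    "CHUNK_SIZE_MEDIUM": 18,
    "CHUNK_SIZE_LONG": 12,
    "MIN_MESSAGE_LENGTH": 2,
    "SHORT_MESSAGE_THRESHOLD": 50,
    "LONG_MESSAGE_THRESHOLD": 150,
    "VERBOSE_LOGGING": True,
}


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env_text=""):
    """Overlay parsed values on DEFAULTS, coercing to the default's type."""
    settings = dict(DEFAULTS)
    for key, value in parse_env_text(env_text).items():
        default = DEFAULTS.get(key)
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(default, bool):
            settings[key] = value.lower() in ("1", "true", "yes")
        elif isinstance(default, int):
            settings[key] = int(value)
        else:
            settings[key] = value  # e.g. USER_NAME, USER_ID stay as strings
    return settings
```

Passing the contents of a .env file to `load_settings` returns a dictionary in which any key missing from the file keeps its default, which matches the "works out-of-the-box" behavior described above.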

🔍 Finding Your User Information

To correctly identify your messages vs received messages:

- Open result.json in any text editor
- Search for a message you wrote (recognizable by your writing style)
- Find these fields in your message:

"from": "Your Actual Name",
"from_id": "user123456789"

- Copy the exact values to your .env file
- Test: if messages still show as "From [Name]:" instead of "Me:", check your settings
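
Searching a large export by hand can be tedious, so the lookup above can also be sketched as a short script. This is an illustrative helper, not part of the project: `count_senders` is a hypothetical name, and the layout it walks (a top-level "chats" object with a "list" of chats, each holding "messages") is an assumption about the full-account export format.

```python
from collections import Counter


def count_senders(export):
    """Count messages per ("from", "from_id") pair across all chats.

    `export` is the parsed content of result.json. Entries without a
    "from_id" (for example service messages) are skipped.
    """
    counts = Counter()
    for chat in export.get("chats", {}).get("list", []):
        for message in chat.get("messages", []):
            sender_id = message.get("from_id")
            if sender_id is not None:
                counts[(message.get("from"), sender_id)] += 1
    return counts
```

To use it, load the file with `json.load(open("result.json", encoding="utf-8"))` and print `count_senders(export).most_common(5)`. Because you take part in every personal chat, the most frequent pair is likely, though not guaranteed, to be your own USER_NAME and USER_ID.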

🔧 Manual Installation

If you prefer manual setup:

# Install Python dependencies
pip install -r requirements.txt

# Run the processor
python process_telegram_chats.py

📊 Features

- Optimized for AI: PDFs sized for vector databases (max 200KB by default)
- Smart chunking: Dynamic chunk sizing based on message length
- Memory efficient: Processes large chats in parts
- Clean formatting: Optimized text format for AI processing
- Metadata tracking: Complete processing information
- Cross-platform: Works on Windows, macOS, and Linux
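
To make "dynamic chunk sizing based on message length" more concrete, here is one possible reading of it as a sketch. The README does not spell out the actual rule, so the logic below is an assumption: it averages message length and picks a chunk size from the CHUNK_SIZE_* and *_THRESHOLD values in the configuration. The function names are hypothetical.

```python
def pick_chunk_size(messages, short_threshold=50, long_threshold=150,
                    size_short=25, size_medium=18, size_long=12):
    """Choose how many messages go into one chunk.

    Short messages are grouped in larger batches and long messages in
    smaller ones, so chunks stay roughly comparable in text volume.
    """
    if not messages:
        return size_short
    average = sum(len(text) for text in messages) / len(messages)
    if average < short_threshold:
        return size_short
    if average > long_threshold:
        return size_long
    return size_medium


def chunk_messages(messages, **thresholds):
    """Split a list of message strings into consecutive chunks."""
    size = pick_chunk_size(messages, **thresholds)
    return [messages[i:i + size] for i in range(0, len(messages), size)]
```

With the default values, a chat made of very short messages is grouped 25 messages per chunk, while a chat of long messages is grouped 12 per chunk.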

🤖 n8n Integration

Suggested settings for n8n workflows:

Text Splitter settings:
- Chunk size: 800
- Overlap: 200
Batch processing: 5-8 files at a time for optimal memory usage
Search patterns:
- Me: for your messages
- From [NAME]: for contact messages
Embedding models:
- OpenAI: text-embedding-ada-002 (1536 dimensions)
- Local: Any 768-dimension model
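
The chunk size and overlap values above can be pictured with a small sliding-window splitter. This sketch only illustrates what the two numbers mean; it is not n8n's implementation, it assumes sizes are measured in characters, and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size=800, overlap=200):
    """Split text into chunks where each chunk repeats the tail of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

With the defaults, each new chunk starts 600 characters after the previous one, so the last 200 characters of a chunk reappear at the start of the next. That overlap helps a message that straddles a boundary remain searchable in at least one chunk.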

📋 Processing Statistics

The processor provides detailed statistics:

Total chats processed
Messages per chat
PDF files created
Chunk distribution
Large chats split into multiple parts

🛠️ Requirements

Python 3.7+
Windows/macOS/Linux
Telegram Desktop (for export)

🚨 Troubleshooting

Problem: Messages show as "From [Your Name]:" instead of "Me:"

Solution: Configure your user identification in the .env file:

- Copy .env.example to .env
- Open result.json and find a message you sent
- Look for "from": "Your Name" and "from_id": "user123456789"
- Update .env with these exact values:

USER_NAME=Your Exact Telegram Name
USER_ID=user123456789

Problem: "Python not found" error

Solutions:

Windows: Use install_and_run_windows.bat (auto-installs Python)
Manual: Download Python from python.org and check "Add to PATH"
Linux: sudo apt install python3 python3-pip

Problem: "No such file 'result.json'"

Solution: Export your Telegram data correctly:

Telegram Desktop → Settings → Advanced → Export Telegram data
Select "Personal chats" and "Machine-readable JSON"
Save as result.json in the project folder

Problem: Large files / GitHub limits

The .gitignore protects against committing:

result.json (your personal chat data)
Generated PDFs (chats_clean_pdf/)
Configuration files (.env)

Problem: Processing fails or crashes

Check if result.json is valid JSON (not corrupted)
Try with VERBOSE_LOGGING=true in .env
Ensure enough disk space (chat exports can be large)
For very large exports, process in smaller batches
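
The first check in the list above can be done with Python's standard json module. The sketch below is a hypothetical helper, not part of the project. The test for a top-level "chats" key is an assumption about full-account exports; an export of a single chat may be laid out differently.

```python
import json


def check_export_text(text):
    """Return (ok, message) for the raw text of an export file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as error:
        # lineno and colno point at where parsing stopped.
        return False, (
            f"Invalid JSON at line {error.lineno}, "
            f"column {error.colno}: {error.msg}"
        )
    if not isinstance(data, dict) or "chats" not in data:
        return False, "Valid JSON, but no top-level 'chats' key was found"
    return True, "Export looks well-formed"


# Example usage (reads the whole file into memory):
# with open("result.json", encoding="utf-8") as handle:
#     print(check_export_text(handle.read())[1])
```

A truncated download typically fails near the end of the file, so a reported line number close to the last line suggests the export should be repeated.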

🔐 Privacy & Security

Your data stays local - no data is sent anywhere
Git protection - .gitignore prevents accidental data commits
Configuration files - Never commit .env files with personal data
Generated PDFs - Review before sharing (contain your chat history)

📝 License

MIT License - feel free to use and modify!
