A Matrix message archival and analysis tool with microservices architecture.
```mermaid
graph TB
    A[Matrix Server] -->|Events| B[Matrix Bot Service]
    B -->|Save Messages| C[PostgreSQL DB]
    B -->|Store Media| H[MinIO Storage]
    D[API Clients] -->|HTTP Requests| E[FastAPI API Service]
    E -->|Query| C
    C -->|Results| E
    E -->|Download Media| H
    E -->|Response| D
    F[Analysis Engine] -->|Process Data| C
    E -->|Request Analysis| F
    F -->|Analysis Results| E
    G[AI Models] -->|Sentiment/Topic Analysis| F
    style A fill:#f9f,stroke:#333
    style B fill:#bbf,stroke:#333
    style C fill:#dfd,stroke:#333
    style E fill:#bfb,stroke:#333
    style F fill:#bff,stroke:#333
    style G fill:#fbf,stroke:#333
    style H fill:#fdb,stroke:#333
```
Matrix Historian now uses a microservices architecture with the following components:
- Bot Service (`services/bot/`): Connects to Matrix, archives messages to PostgreSQL, and downloads and stores media files
- API Service (`services/api/`): FastAPI REST API for querying messages, analytics, and media
- Shared Package (`shared/`): Common code (models, schemas, database, CRUD operations, storage utilities)
- PostgreSQL Database: Centralized data storage for messages and metadata
- MinIO Object Storage: S3-compatible storage for media files (images, videos, audio, files)
- Automatically records Matrix room message history
- Supports message search by room, user, and content
- RESTful API for message browsing and searching
- Docker-based microservices deployment
- PostgreSQL database for scalable storage
- Media storage with MinIO for images, videos, audio, and files
- Automatic media archival: Bot automatically downloads and stores media files from Matrix
- MinIO object storage: S3-compatible storage for efficient media management
- Media metadata tracking: Store filename, MIME type, file size, and image dimensions
- RESTful media API: Query media by room, user, or type
- Presigned URLs: Secure temporary download links for media files
- Media statistics: Track total media count, size, and breakdown by type
- Filtered queries: Search media by MIME type (e.g., images only)
- Activity Overview: Displays message trends and user activity levels
- Word Cloud Analysis: Generates statistics and visualizations of word frequency in chats
- User Interaction: Shows the network and intensity of interactions between users
- Topic Analysis: Tracks the evolution of topics over time
- Sentiment Analysis: AI-based analysis of message sentiment tendencies (requires GROQ_API_KEY)
- Activity Analysis: Displays heatmaps of group activity during different times
All analysis features support filtering by time range and room.
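As an illustration, a client could express such a filter as query parameters. The field names (`room_id`, `start`, `end`) are assumptions for the sketch, not documented API fields; check the Swagger UI for the actual parameters.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

def analytics_params(room_id: str, days: int = 7) -> str:
    """Build a query string asking for the last `days` of activity in one
    room. Parameter names are illustrative, not the project's actual API."""
    end = datetime(2024, 1, 8)  # fixed for the example; use datetime.utcnow() in practice
    start = end - timedelta(days=days)
    return urlencode({
        "room_id": room_id,
        "start": start.isoformat(),
        "end": end.isoformat(),
    })
```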
- Docker and Docker Compose
- A Matrix account for the bot
- (Optional) GROQ API key for AI analysis features
- Clone the repository

```bash
git clone https://github.com/EnsueCollectR/matrix-historian.git
cd matrix-historian
```

- Configure environment variables

```bash
cp .env.example .env
# Edit the .env file to set Matrix bot credentials and other configuration
```

Required environment variables:

- `MATRIX_HOMESERVER`: Your Matrix homeserver URL (e.g., https://matrix.org)
- `MATRIX_USER`: Bot username (e.g., @yourbot:matrix.org)
- `MATRIX_PASSWORD`: Bot password
- `GROQ_API_KEY`: (Optional) For AI sentiment analysis

- Start the services

```bash
docker-compose up -d
```

- Check service status

```bash
docker-compose ps
docker-compose logs -f
```

Services will start on the following ports:

- API service: http://localhost:8500 (configurable via `API_PORT`)
- API documentation: http://localhost:8500/docs (Swagger UI)
- MinIO Console: http://localhost:9001 (web UI for managing media storage)
- MinIO API: http://localhost:9000 (S3-compatible API endpoint)
The application automatically creates database tables on startup. For production deployments, consider using Alembic for migrations (see Development section).
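Table auto-creation typically amounts to a `create_all` call at startup, sketched below. The `Message` model here is illustrative (the real models live in `shared/app/models/`), and an in-memory SQLite engine stands in for PostgreSQL.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Message(Base):
    """Illustrative model; not the project's actual schema."""
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    body = Column(String)

# On startup, create_all issues CREATE TABLE IF NOT EXISTS for every
# registered model. It never alters existing tables, which is why
# Alembic is recommended for production schema changes.
engine = create_engine("sqlite://")  # in-memory stand-in for PostgreSQL
Base.metadata.create_all(engine)
```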
| Variable | Description | Default | Required |
|---|---|---|---|
| `MATRIX_HOMESERVER` | Matrix server address | - | Yes |
| `MATRIX_USER` | Bot username | - | Yes |
| `MATRIX_PASSWORD` | Bot password | - | Yes |
| `DATABASE_URL` | PostgreSQL connection string | `postgresql://historian:historian@db:5432/historian` | No |
| `MINIO_ROOT_USER` | MinIO admin username | `historian` | No |
| `MINIO_ROOT_PASSWORD` | MinIO admin password | `historian123` | No |
| `MINIO_ENDPOINT` | MinIO endpoint | `minio:9000` | No |
| `MINIO_BUCKET` | MinIO bucket name | `matrix-media` | No |
| `MINIO_API_PORT` | MinIO API port | `9000` | No |
| `MINIO_CONSOLE_PORT` | MinIO console port | `9001` | No |
| `API_PORT` | API service port | `8500` | No |
| `GROQ_API_KEY` | API key for AI analysis | - | No |
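The table above can be read in code roughly as follows. `load_config` is a hypothetical helper for illustration, not part of the project; the defaults mirror the table.

```python
import os

def load_config() -> dict:
    """Read Matrix Historian settings from the environment,
    falling back to the documented defaults."""
    return {
        # Required: no sensible default exists, so a missing value raises KeyError
        "matrix_homeserver": os.environ["MATRIX_HOMESERVER"],
        "matrix_user": os.environ["MATRIX_USER"],
        "matrix_password": os.environ["MATRIX_PASSWORD"],
        # Optional: defaults as documented in the table above
        "database_url": os.getenv(
            "DATABASE_URL",
            "postgresql://historian:historian@db:5432/historian",
        ),
        "minio_endpoint": os.getenv("MINIO_ENDPOINT", "minio:9000"),
        "minio_bucket": os.getenv("MINIO_BUCKET", "matrix-media"),
        "api_port": int(os.getenv("API_PORT", "8500")),
        "groq_api_key": os.getenv("GROQ_API_KEY"),  # None disables AI analysis
    }
```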
Matrix Historian uses MinIO for storing media files uploaded to Matrix rooms. MinIO is an S3-compatible object storage system that runs as a Docker container.
The MinIO web console is available at http://localhost:9001
Default credentials:
- Username: `historian`
- Password: `historian123`
From the console, you can:
- Browse stored media files
- Monitor storage usage
- Manage buckets and policies
- View access logs
- Bot receives media event from Matrix (image, video, audio, or file)
- Bot downloads media from the Matrix server using the `mxc://` URL
- Bot uploads the file to MinIO with a unique UUID-based key
- Metadata saved to PostgreSQL including filename, MIME type, size, dimensions
- API provides download URLs via presigned URLs that expire after 1 hour
- Images: `m.image` (JPEG, PNG, GIF, WebP, etc.)
- Videos: `m.video` (MP4, WebM, etc.)
- Audio: `m.audio` (MP3, OGG, etc.)
- Files: `m.file` (any file type)
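As an illustration of how these event types might be routed, a bot could map each `msgtype` to a storage category. The mapping below is a sketch; the real bot's internals may differ.

```python
from typing import Optional

# Matrix msgtype -> storage category; anything else (e.g. m.text)
# is a plain message and carries no media payload.
MEDIA_CATEGORIES = {
    "m.image": "image",
    "m.video": "video",
    "m.audio": "audio",
    "m.file": "file",
}

def classify_event(msgtype: str) -> Optional[str]:
    """Return the media category for an archivable event,
    or None for non-media events."""
    return MEDIA_CATEGORIES.get(msgtype)
```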
Media files are stored in MinIO with the following structure:
```
matrix-media/ (bucket)
├── <uuid>/
│   └── original_filename.ext
```
Each file is stored under a unique UUID path to prevent naming conflicts.
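The key scheme can be sketched as follows. This is illustrative only; beyond "UUID-based", the exact key format is an assumption.

```python
import uuid

def make_object_key(filename: str) -> str:
    """Build a MinIO object key of the form '<uuid>/<original filename>',
    so two uploads both named 'photo.jpg' never collide."""
    return f"{uuid.uuid4()}/{filename}"
```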
The API service provides RESTful endpoints for querying messages, analytics, and media.
Interactive Documentation: http://localhost:8500/docs
- `GET /api/v1/messages` - List messages with pagination
- `GET /api/v1/messages/search` - Search messages by content
- `GET /api/v1/rooms` - List rooms
- `GET /api/v1/users` - List users

- `GET /api/v1/analytics/overview` - Get analytics overview
- `GET /api/v1/analytics/trends` - Get message trends
- `GET /api/v1/analytics/activity-heatmap` - Get activity heatmap

- `GET /api/v1/media/` - List all media with pagination
- `GET /api/v1/media/stats` - Get media statistics (count, size, by type)
- `GET /api/v1/media/room/{room_id}` - List media in a specific room
- `GET /api/v1/media/user/{user_id}` - List media sent by a specific user
- `GET /api/v1/media/{media_id}` - Get media metadata with download URL
- `GET /api/v1/media/{media_id}/download` - Download media file
Media query parameters:
- `skip` - Pagination offset
- `limit` - Number of results (default: 100)
- `mime_type` - Filter by MIME type prefix (e.g., `image/` for images only)
See the API documentation for complete endpoint details.
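For example, a client could page through the image attachments in a room like this. This sketch only builds the request URL from the endpoints and parameters listed above (whether the room endpoint accepts `mime_type` is an assumption); issuing the request is left to your HTTP client of choice.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8500"

def media_query_url(room_id: str, skip: int = 0, limit: int = 100,
                    mime_type: str = "image/") -> str:
    """Build the URL for listing media in a room, filtered to images.
    Note: room_id is inserted as-is; a real client should percent-encode it."""
    params = urlencode({"skip": skip, "limit": limit, "mime_type": mime_type})
    return f"{BASE_URL}/api/v1/media/room/{room_id}?{params}"
```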
```
matrix-historian/
├── docker-compose.yml        # Multi-service orchestration
├── .env.example              # Environment variables template
├── shared/                   # Shared code package
│   ├── app/
│   │   ├── models/           # SQLAlchemy models
│   │   ├── schemas/          # Pydantic schemas
│   │   ├── crud/             # Database operations
│   │   ├── storage/          # Storage utilities (MinIO client)
│   │   ├── db/               # Database configuration
│   │   └── utils/            # Utility functions
│   └── setup.py
├── services/
│   ├── bot/                  # Matrix bot service
│   │   ├── Dockerfile
│   │   ├── requirements.txt
│   │   └── app/
│   │       ├── main.py
│   │       └── bot.py
│   └── api/                  # FastAPI service
│       ├── Dockerfile
│       ├── requirements.txt
│       └── app/
│           ├── main.py
│           ├── api/          # API routes (messages, analytics, media)
│           └── ai/           # AI analysis
└── docs/                     # Documentation
```
For local development without Docker:
- Install PostgreSQL
```bash
# On Ubuntu/Debian
sudo apt-get install postgresql postgresql-contrib

# Create database and user
sudo -u postgres psql
CREATE DATABASE historian;
CREATE USER historian WITH PASSWORD 'historian';
GRANT ALL PRIVILEGES ON DATABASE historian TO historian;
```

- Install Python dependencies
```bash
# Install shared package
cd shared
pip install -e .

# Install bot service dependencies
cd ../services/bot
pip install -r requirements.txt

# Install API service dependencies
cd ../services/api
pip install -r requirements.txt
```

- Set environment variables
```bash
export DATABASE_URL="postgresql://historian:historian@localhost:5432/historian"
export MATRIX_HOMESERVER="https://matrix.org"
export MATRIX_USER="@yourbot:matrix.org"
export MATRIX_PASSWORD="your_password"
```

- Run services
```bash
# Terminal 1: Run bot service
cd services/bot/app
python main.py

# Terminal 2: Run API service
cd services/api/app
uvicorn main:app --reload --port 8000
```

For production deployments, use Alembic for database migrations:
```bash
# Install Alembic
pip install alembic

# Initialize Alembic in the shared package
cd shared
alembic init alembic

# Edit alembic.ini and set sqlalchemy.url
# Edit alembic/env.py to import Base from app.db.database

# Create the initial migration
alembic revision --autogenerate -m "Initial migration"

# Apply migrations
alembic upgrade head
```

To run tests:

```bash
cd tests
pytest
```

This version represents a significant architectural change from the monolithic application:
- SQLite → PostgreSQL: All data must be migrated to PostgreSQL
- Monolith → Microservices: Bot and API now run as separate services
- Frontend Removed: The web UI has been removed; use the API directly or build your own frontend
- Database format changed from SQLite to PostgreSQL
- Configuration now uses environment variables exclusively
- Bot and API run as separate processes
- `func.date_trunc` and `func.extract` queries now work correctly with PostgreSQL
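The kind of query in question looks roughly like the sketch below. The `messages` table and its columns are illustrative, not the project's actual model (which lives in `shared/app/models/`).

```python
from sqlalchemy import Column, DateTime, Integer, MetaData, Table, func, select

# Illustrative table definition
metadata = MetaData()
messages = Table(
    "messages", metadata,
    Column("id", Integer, primary_key=True),
    Column("timestamp", DateTime),
)

# Messages per day: date_trunc is a native PostgreSQL function,
# which is why this works against PostgreSQL but not SQLite.
stmt = (
    select(
        func.date_trunc("day", messages.c.timestamp).label("day"),
        func.count().label("message_count"),
    )
    .group_by(func.date_trunc("day", messages.c.timestamp))
    .order_by("day")
)
```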
- Check `MATRIX_HOMESERVER`, `MATRIX_USER`, and `MATRIX_PASSWORD` in `.env`
- View bot logs: `docker-compose logs bot`
- Ensure PostgreSQL is healthy: `docker-compose ps db`
- Check API logs: `docker-compose logs api`
- Verify `DATABASE_URL` is correct
- Ensure the `db` service is running and healthy
- Check PostgreSQL logs: `docker-compose logs db`
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- ✨ MinIO object storage integration for media files
- ✨ Automatic media archival from Matrix (images, videos, audio, files)
- ✨ Media API endpoints for querying and downloading media
- ✨ Media metadata tracking (filename, MIME type, size, dimensions)
- ✨ Presigned download URLs for secure media access
- ✨ Media statistics and filtering by type
- 🔧 Enhanced bot to handle media events
- 🔧 MinIO health checks and dependencies in docker-compose
- 📝 Updated documentation with media storage guide
- ✨ Migrated to microservices architecture
- ✨ PostgreSQL database support (replaces SQLite)
- ✨ Separate bot and API services
- ✨ Shared package for common code
- ✨ Docker Compose orchestration
- ✨ Fixed `func.date_trunc` PostgreSQL compatibility issues
- ✨ Consolidated bot initialization (removed duplicate patterns)
- 🔧 Improved error handling and logging
- 🔧 Health checks for all services
- 📝 Updated documentation
- Initial monolithic application
- SQLite database
- Combined bot + API in single process
中文文档 (Chinese documentation; note: it may be outdated)