The Multimodal Metadata Hub is an application for managing and searching files (images, PDFs, and text documents) using vector embeddings and semantic search. It processes uploaded files, extracts text and metadata, generates embeddings with machine learning models, and supports semantic search across the content.
The application is built using a three-tier architecture:
- Backend
  - Built with the FastAPI framework for high-performance API endpoints
  - Handles file uploads, processing, and search requests
  - Implements background task processing for file analysis
  - Uses sentence-transformers for generating text and image embeddings
  - Processes multiple file types (images, PDFs, text files)
- Database
  - Uses MariaDB 11.8 with vector operations support
  - Stores file metadata, extracted text, and vector embeddings
  - Implements vector indexing for efficient similarity search
  - Supports multiple embedding types:
    - Content embedding (384-dimensional vectors)
    - Visual embedding (512-dimensional vectors)
    - Metadata embedding (512-dimensional vectors)
- Frontend
  - Clean and intuitive web interface
  - Supports file upload with progress indication
  - Provides search functionality with relevance scores
  - Real-time status updates and result display
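
The similarity search mentioned for the database tier can be illustrated in plain Python. This is a minimal sketch, not the project's implementation: in the application the ranking is delegated to MariaDB's vector index, and `cosine_similarity`, `rank_by_similarity`, and the three-dimensional toy vectors are hypothetical stand-ins for real 384- or 512-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float],
                       stored: dict[str, list[float]],
                       limit: int = 5) -> list[tuple[str, float]]:
    """Return up to `limit` (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy embeddings (hypothetical values, far smaller than real model output).
stored = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "invoice.pdf": [0.0, 0.2, 0.9],
    "kitten.png": [0.8, 0.3, 0.1],
}
results = rank_by_similarity([1.0, 0.0, 0.0], stored, limit=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is linear in the number of stored files; a vector index exists to avoid exactly that cost at larger scale.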
- File Upload & Processing
  - Support for multiple file formats (JPG, JPEG, PNG, PDF, TXT)
  - Automatic file metadata extraction
  - Background processing with status tracking
  - Secure file storage with UUID-based naming
- Text & Image Processing
  - Text extraction from PDFs and text files
  - EXIF metadata extraction from images
  - Vector embedding generation using the sentence-transformers models
  - Support for both CPU and GPU processing
- Search Capabilities
  - Semantic search using vector embeddings
  - Configurable search limits
  - Relevance scoring
  - Support for both text and image-based queries
- Data Management
  - Automatic metadata organization
  - File tagging support
  - Processing status tracking
  - Error handling and reporting
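
As an illustration of UUID-based naming and status tracking, the sketch below stores an upload under a generated name and returns a metadata record. It is a rough sketch under stated assumptions rather than the project's code: `store_upload` is a hypothetical helper, the record keys simply mirror column names from the schema, and SHA-256 is assumed for the integrity hash because its hex digest is 64 characters long.

```python
import hashlib
import uuid
from pathlib import Path

# Formats listed as supported in the feature list.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".txt"}


def store_upload(original_filename: str, data: bytes, upload_dir: Path) -> dict:
    """Save `data` under a UUID-based name and return a metadata record."""
    extension = Path(original_filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")

    # The stored name never contains user-supplied text, which avoids
    # collisions and path-traversal tricks in the original filename.
    file_uuid = str(uuid.uuid4())
    upload_dir.mkdir(parents=True, exist_ok=True)
    stored_path = upload_dir / f"{file_uuid}{extension}"
    stored_path.write_bytes(data)

    return {
        "uuid": file_uuid,
        "original_filename": original_filename,
        "file_path": str(stored_path),
        "file_size": len(data),
        "file_extension": extension,
        "file_hash": hashlib.sha256(data).hexdigest(),  # assumed hash algorithm
        "processing_status": "pending",  # a background task would update this
    }
```

A background worker would then move `processing_status` from `pending` to `processing` and finally to `completed` or `failed`.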
- Backend Framework: FastAPI 0.104.1
- Database: MariaDB 11.8 with vector operations
- Frontend: Streamlit 1.28.1
- ML Models:
  - Text: all-MiniLM-L6-v2 (Sentence Transformers)
  - Image: clip-ViT-B-32 (CLIP)
- Python Dependencies:
  - sentence-transformers ≥ 2.2.2
  - PyTorch
  - Pillow 10.1.0
  - PyMuPDF 1.22.5
  - Other utilities (see requirements.txt)
- Docker and Docker Compose
- Git
- At least 4GB RAM
- (Optional) NVIDIA GPU with CUDA support
- Clone the repository and navigate to the project directory
- Copy the environment template and configure: `cp .env.example .env`
- Update the following variables in `.env`:
  - `MARIADB_ROOT_PASSWORD=<your-root-password>`
  - `DB_USER=<your-db-user>`
  - `DB_PASSWORD=<your-db-password>`
- Build and start the containers: `docker-compose up --build`
- Access the components:
  - Frontend UI: http://localhost:8501
  - API Documentation: http://localhost:8000/docs
  - Database: localhost:3306
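
Before starting the containers, it can help to confirm that the `.env` placeholders were actually replaced. The script below is an optional sketch and not part of the repository: `parse_env` and `missing_settings` are hypothetical helpers, and the parser handles only simple `KEY=VALUE` lines.

```python
from pathlib import Path

# The three variables the setup steps ask you to fill in.
REQUIRED_KEYS = ("MARIADB_ROOT_PASSWORD", "DB_USER", "DB_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(text: str) -> list[str]:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    values = parse_env(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print("No .env file found; run: cp .env.example .env")
    else:
        missing = missing_settings(env_file.read_text())
        print("Missing or placeholder values:", missing or "none")
```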
- GET /health: System health check
- POST /upload: File upload endpoint
- POST /search: Search endpoint with query and limit parameters
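
A client call to the search endpoint might look like the sketch below, which uses only the standard library. The request shape is an assumption: the source states only that `/search` accepts query and limit parameters, so the JSON body with `query` and `limit` keys, and the helper names `build_search_request` and `search`, are hypothetical. The interactive documentation at http://localhost:8000/docs shows the actual schema.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000"  # API address from the setup steps


def build_search_request(query: str, limit: int = 10,
                         base_url: str = API_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /search request.

    Assumption: the endpoint takes a JSON body with `query` and `limit` keys.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, limit: int = 10) -> dict:
    """Send the request and decode the JSON response (needs a running API)."""
    request = build_search_request(query, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating request construction from sending keeps the first function usable in tests without a running server.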
Key tables:
- media_files: Stores file metadata and embeddings
- file_tags: Manages file tagging and categorization

media_files:

| Field | Type | Description |
|---|---|---|
| id | bigint(20) | Primary key, auto-increment |
| uuid | varchar(36) | Unique identifier for each file |
| filename | varchar(500) | Original filename |
| original_filename | varchar(500) | Preserved original name |
| file_path | varchar(1000) | Storage path location |
| file_size | bigint(20) | File size in bytes |
| mime_type | varchar(200) | MIME type classification |
| file_hash | varchar(64) | File integrity hash |
| file_extension | varchar(20) | File extension |
| upload_timestamp | timestamp | Upload time with auto-update |
| last_modified | timestamp | Last modification timestamp |
| extracted_text | longtext | Extracted text content |
| document_title | varchar(500) | Document title metadata |
| document_author | varchar(300) | Author information |
| document_pages | int(11) | Page count for documents |
| image_width | int(11) | Image width in pixels |
| image_height | int(11) | Image height in pixels |
| gps_latitude | decimal(10,8) | GPS latitude coordinates |
| gps_longitude | decimal(11,8) | GPS longitude coordinates |
| ai_description | text | AI-generated content description |
| ai_tags | varchar(1000) | AI-generated tags |
| content_embedding | vector(1536) | Content vector embedding |
| visual_embedding | vector(1536) | Visual vector embedding |
| metadata_embedding | vector(512) | Metadata vector embedding |
| processing_status | enum | Status: pending, processing, completed, failed |
| processing_error | text | Error messages if processing fails |

file_tags:

| Field | Type | Description |
|---|---|---|
| id | bigint(20) | Primary key, auto-increment |
| file_id | bigint(20) | Foreign key to media_files |
| tag_name | varchar(200) | Tag name identifier |
| tag_value | text | Tag value content |
| tag_category | varchar(100) | Tag categorization |
| tag_type | enum | Type: auto, manual, ai_generated, extracted |
| confidence_score | decimal(4,3) | Confidence level (0-1.000) |
| created_at | timestamp | Tag creation timestamp |
| updated_at | timestamp | Tag update timestamp |
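
Given these tables, a similarity lookup against `media_files` could be expressed with MariaDB's vector functions. The sketch below only builds the SQL text and its parameters; it is an assumption about how such a query might be written, not the project's actual query. `build_similarity_query` and `vector_literal` are hypothetical helpers, and the `?` placeholders assume a driver that uses question-mark parameters, such as MariaDB Connector/Python.

```python
import json


def vector_literal(embedding: list[float]) -> str:
    """Serialize an embedding as the JSON-array text that VEC_FromText() parses."""
    return json.dumps([float(x) for x in embedding])


def build_similarity_query(embedding: list[float],
                           limit: int = 10) -> tuple[str, tuple[str, int]]:
    """Return (sql, params) selecting the files closest to `embedding`."""
    if not embedding:
        raise ValueError("embedding must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Ordering by the distance function with a LIMIT is the query shape
    # that a vector index is designed to serve.
    sql = (
        "SELECT uuid, original_filename "
        "FROM media_files "
        "WHERE processing_status = 'completed' "
        "ORDER BY VEC_DISTANCE_COSINE(content_embedding, VEC_FromText(?)) "
        "LIMIT ?"
    )
    return sql, (vector_literal(embedding), limit)


sql, params = build_similarity_query([0.1, 0.2], limit=5)
print(sql)
print(params)
```

The query vector must have the same dimension as the `content_embedding` column for the comparison to be valid.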
- TEXT_EMBEDDING_MODEL: Model for text embeddings
- IMAGE_EMBEDDING_MODEL: Model for image embeddings
- USE_GPU: Enable/disable GPU acceleration
- BATCH_SIZE: Processing batch size
- Database connection parameters
- API and upload directory settings
- Adjust BATCH_SIZE for optimal processing
- Configure MariaDB vector index parameters
- Optimize worker processes for background tasks
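
One way to read the configuration variables listed above is a small settings object populated from the environment. This is a sketch under stated assumptions, not the project's configuration module: the `Settings` class and `batched` helper are hypothetical, the model defaults are taken from the technology list, and the default `BATCH_SIZE` of 32 and the accepted truthy strings for `USE_GPU` are guesses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    text_embedding_model: str
    image_embedding_model: str
    use_gpu: bool
    batch_size: int

    @classmethod
    def from_env(cls, env=None) -> "Settings":
        """Build settings from a mapping (defaults to os.environ)."""
        env = os.environ if env is None else env
        batch_size = int(env.get("BATCH_SIZE", "32"))  # default is an assumption
        if batch_size < 1:
            raise ValueError("BATCH_SIZE must be a positive integer")
        use_gpu = env.get("USE_GPU", "false").strip().lower() in {"1", "true", "yes", "on"}
        return cls(
            text_embedding_model=env.get("TEXT_EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
            image_embedding_model=env.get("IMAGE_EMBEDDING_MODEL", "clip-ViT-B-32"),
            use_gpu=use_gpu,
            batch_size=batch_size,
        )


def batched(items: list, size: int):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


settings = Settings.from_env({"USE_GPU": "True", "BATCH_SIZE": "2"})
for chunk in batched(["a.txt", "b.txt", "c.txt"], settings.batch_size):
    print(chunk)
```

Larger batches generally improve throughput at the cost of memory, which is why `BATCH_SIZE` is worth tuning per machine.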
src/
├── api/ # FastAPI application
├── database/ # Database connections
├── processing/ # File processing logic
├── ui/ # Streamlit interface
└── utils/ # Shared utilities
- Use environment variables for configuration
- Implement proper error handling
- Follow the provided coding style
- Add tests for new features
For production deployment:
- Implement authentication
- Use HTTPS
- Secure database credentials
- Validate file uploads
- Implement rate limiting
- Add proper logging
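
For the upload validation item, one possible check compares a file's leading bytes against the signature expected for its extension, so that a renamed file is rejected. This is an illustrative sketch, not the project's validator: `validate_upload` is a hypothetical helper, the 50 MiB cap is an arbitrary example, and requiring `.txt` files to be valid UTF-8 is an assumption.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # hypothetical 50 MiB cap

# Leading bytes that identify each supported binary format.
MAGIC_BYTES = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
}


def validate_upload(filename: str, data: bytes) -> None:
    """Raise ValueError if the upload fails a check; return None otherwise."""
    extension = Path(filename).suffix.lower()
    if extension not in MAGIC_BYTES and extension != ".txt":
        raise ValueError(f"unsupported extension: {extension or '(none)'}")
    if not data:
        raise ValueError("file is empty")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the upload size limit")
    if extension == ".txt":
        try:
            data.decode("utf-8")  # assumed encoding for text uploads
        except UnicodeDecodeError:
            raise ValueError("text file is not valid UTF-8") from None
    elif not data.startswith(MAGIC_BYTES[extension]):
        raise ValueError("file content does not match its extension")
```

Signature checks catch mislabeled files but do not prove a file is safe to parse, so they complement rather than replace the other items in this list.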
Common issues:
- Database connection failures
  - Check credentials in .env
  - Verify MariaDB container is running
- Model loading errors
  - Ensure sufficient memory
  - Check GPU configuration if enabled
- File processing issues
  - Verify file permissions
  - Check upload directory path
  - Monitor processing logs
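
Some of the checks above can be automated with a short diagnostic script. The following is a sketch, not a tool shipped with the project: the ports come from the setup steps, while the `UPLOAD_DIR` environment variable name and its `uploads` default are hypothetical.

```python
import os
import socket
import tempfile
from pathlib import Path


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def directory_is_writable(path: Path) -> bool:
    """Return True if `path` is a directory in which a file can be created."""
    if not path.is_dir():
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    upload_dir = Path(os.environ.get("UPLOAD_DIR", "uploads"))  # hypothetical name
    checks = {
        "MariaDB reachable on localhost:3306": port_is_open("localhost", 3306),
        "API reachable on localhost:8000": port_is_open("localhost", 8000),
        f"Upload directory writable ({upload_dir})": directory_is_writable(upload_dir),
    }
    for label, ok in checks.items():
        print(f"{'OK  ' if ok else 'FAIL'} {label}")
```

An open port only shows that something is listening; credential problems still need to be confirmed against the values in `.env`.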
The Multimodal Metadata Hub is a foundation for content search over unstructured data such as images, PDFs, and documents, and it could evolve into a scalable, AI-powered Search-as-a-Service solution. Future enhancements could include distributed vector processing, enterprise-grade authentication, multilingual model support, and real-time ingestion pipelines. Businesses, SaaS platforms, content creators, and developers could integrate the API to add content discovery, automated knowledge extraction, and context-aware search to their products and workflows. With further development, this prototype could mature into a commercial platform for secure, multimodal enterprise search.
- Fork the repository
- Create a feature branch
- Commit changes
- Push to the branch
- Create a Pull Request
Team Singleton: Sunny Kumar, Anuj Gupta, Abhijeet Dhanotiya, Anand Vyas, and Tushan Kumar Sinha - Initial work and maintenance