Multimodal Metadata Hub

Overview

The Multimodal Metadata Hub is an application for managing and searching files (images, PDFs, and text documents) using vector embeddings and semantic search. It processes uploaded files, extracts text and metadata, generates embeddings with machine learning models, and supports semantic search across the extracted content.

Architecture

The application is built using a modern three-tier architecture:

1. Backend (FastAPI)

Built with FastAPI framework for high-performance API endpoints
Handles file uploads, processing, and search requests
Implements background task processing for file analysis
Uses sentence-transformers for generating text and image embeddings
Processes multiple file types (images, PDFs, text files)

2. Database (MariaDB)

Uses MariaDB 11.8 with vector operations support
Stores file metadata, extracted text, and vector embeddings
Implements vector indexing for efficient similarity search
Supports multiple embedding types:
- Content embedding (384-dimensional vectors)
- Visual embedding (512-dimensional vectors)
- Metadata embedding (512-dimensional vectors)
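
To illustrate what similarity search over these embeddings means, here is a minimal sketch in plain Python that ranks stored vectors by cosine similarity to a query vector. The three-dimensional toy vectors, the file names, and the helpers `cosine_similarity` and `rank_by_similarity` are assumptions made for illustration only; the application itself stores much larger vectors and relies on MariaDB for the comparison.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, stored, limit=5):
    """Return up to `limit` (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
stored_embeddings = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "beach.jpg": [0.0, 0.2, 0.9],
    "notes.txt": [0.7, 0.3, 0.1],
}
results = rank_by_similarity([1.0, 0.0, 0.0], stored_embeddings, limit=2)
```

Here `results` lists `invoice.pdf` first and `notes.txt` second, because their vectors point in nearly the same direction as the query.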

3. Frontend (Streamlit)

Clean and intuitive web interface
Supports file upload with progress indication
Provides search functionality with relevance scores
Real-time status updates and result display

Features

File Upload & Processing
- Support for multiple file formats (JPG, JPEG, PNG, PDF, TXT)
- Automatic file metadata extraction
- Background processing with status tracking
- Secure file storage with UUID-based naming

Text & Image Processing
- Text extraction from PDFs and text files
- EXIF metadata extraction from images
- Vector embedding generation with all-MiniLM-L6-v2 (text) and clip-ViT-B-32 (images)
- Support for both CPU and GPU processing

Search Capabilities
- Semantic search using vector embeddings
- Configurable search limits
- Relevance scoring
- Support for both text and image-based queries

Data Management
- Automatic metadata organization
- File tagging support
- Processing status tracking
- Error handling and reporting
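
As an illustration of UUID-based naming, the sketch below derives a storage name that discards the user-supplied file name and keeps only a whitelisted extension, and computes a content hash. The helper names are hypothetical, and the choice of SHA-256 is an assumption (it happens to produce the 64 hex characters that fit the `file_hash` column); the project's actual implementation may differ.

```python
import hashlib
import uuid
from pathlib import Path

# Extensions taken from the supported formats listed above.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".txt"}


def make_storage_name(original_filename):
    """Return a UUID-based filename that keeps only the original extension."""
    extension = Path(original_filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    return f"{uuid.uuid4()}{extension}"


def sha256_hex(data):
    """Return the SHA-256 digest of `data` as 64 hex characters (assumed hash)."""
    return hashlib.sha256(data).hexdigest()


stored_name = make_storage_name("Holiday Photo.JPG")
digest = sha256_hex(b"example file contents")
```

Because the stored name never contains user input apart from the checked extension, path separators or other unusual characters in the original name cannot influence where the file is written.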

Technical Stack

Backend Framework: FastAPI 0.104.1
Database: MariaDB 11.8 with vector operations
Frontend: Streamlit 1.28.1
ML Models:
- Text: all-MiniLM-L6-v2 (Sentence Transformers)
- Image: clip-ViT-B-32 (CLIP)
Python Dependencies:
- sentence-transformers ≥ 2.2.2
- PyTorch
- Pillow 10.1.0
- PyMuPDF 1.22.5
- Other utilities (see requirements.txt)

Setup and Installation

Prerequisites

Docker and Docker Compose
Git
At least 4GB RAM
(Optional) NVIDIA GPU with CUDA support

Environment Setup

Clone the repository and navigate to the project directory
Copy the environment template and configure:
```
cp .env.example .env
```

Update the following variables in .env:

MARIADB_ROOT_PASSWORD=<your-root-password>
DB_USER=<your-db-user>
DB_PASSWORD=<your-db-password>

Running the Application

Build and start the containers:
```
docker-compose up --build
```
Access the components:
- Frontend UI: http://localhost:8501
- API Documentation: http://localhost:8000/docs
- Database: localhost:3306

API Endpoints

GET /health: System health check
POST /upload: File upload endpoint
POST /search: Search endpoint with query and limit parameters
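
A minimal client sketch for the search endpoint, using only the Python standard library. The `query` and `limit` names come from the endpoint description above; sending them as a JSON body is an assumption, as is the helper name `build_search_request`. The interactive API documentation at /docs shows the schema the server actually expects.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000"  # API host and port from "Running the Application"


def build_search_request(query, limit=10):
    """Build (but do not send) a POST /search request with a JSON body."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("sunset over the ocean", limit=5)
# To send it against a running instance:
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```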

Database Schema

Key tables:

media_files: Stores file metadata and embeddings
file_tags: Manages file tagging and categorization

Media Files Table

| Field | Type | Description |
| --- | --- | --- |
| id | bigint(20) | Primary key, auto-increment |
| uuid | varchar(36) | Unique identifier for each file |
| filename | varchar(500) | Original filename |
| original_filename | varchar(500) | Preserved original name |
| file_path | varchar(1000) | Storage path location |
| file_size | bigint(20) | File size in bytes |
| mime_type | varchar(200) | MIME type classification |
| file_hash | varchar(64) | File integrity hash |
| file_extension | varchar(20) | File extension |
| upload_timestamp | timestamp | Upload time with auto-update |
| last_modified | timestamp | Last modification timestamp |
| extracted_text | longtext | Extracted text content |
| document_title | varchar(500) | Document title metadata |
| document_author | varchar(300) | Author information |
| document_pages | int(11) | Page count for documents |
| image_width | int(11) | Image width in pixels |
| image_height | int(11) | Image height in pixels |
| gps_latitude | decimal(10,8) | GPS latitude coordinates |
| gps_longitude | decimal(11,8) | GPS longitude coordinates |
| ai_description | text | AI-generated content description |
| ai_tags | varchar(1000) | AI-generated tags |
| content_embedding | vector(1536) | Content vector embedding |
| visual_embedding | vector(1536) | Visual vector embedding |
| metadata_embedding | vector(512) | Metadata vector embedding |
| processing_status | enum | Status: pending, processing, completed, failed |
| processing_error | text | Error messages if processing fails |
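
The sketch below shows one way a nearest-neighbour query over `content_embedding` could be assembled from Python. `VEC_DISTANCE_COSINE` and `VEC_FromText` are MariaDB vector functions; the query shape, the helper names, and the `%s` placeholder style (used by drivers such as PyMySQL) are assumptions, since the source does not say which driver or query the project uses. The query vector must have the same number of dimensions as the column declared in schema.sql.

```python
def to_vector_literal(embedding):
    """Format a list of floats as the JSON-array text that VEC_FromText accepts."""
    return "[" + ",".join(f"{value:.6f}" for value in embedding) + "]"


# Column and table names come from the Media Files table above.
SEARCH_SQL = """
SELECT uuid, original_filename,
       VEC_DISTANCE_COSINE(content_embedding, VEC_FromText(%s)) AS distance
FROM media_files
WHERE processing_status = 'completed'
ORDER BY distance
LIMIT %s
"""


def build_search_query(embedding, limit=10):
    """Return the SQL text and its parameter tuple for a nearest-neighbour search."""
    return SEARCH_SQL, (to_vector_literal(embedding), int(limit))


# A 3-dimensional toy vector; a real query would use the full embedding.
sql, params = build_search_query([0.25, -0.5, 1.0], limit=3)
```

Passing the vector and the limit as parameters, rather than formatting them into the SQL string, keeps user-influenced values out of the query text.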

File Tags Table

| Field | Type | Description |
| --- | --- | --- |
| id | bigint(20) | Primary key, auto-increment |
| file_id | bigint(20) | Foreign key to media_files |
| tag_name | varchar(200) | Tag name identifier |
| tag_value | text | Tag value content |
| tag_category | varchar(100) | Tag categorization |
| tag_type | enum | Type: auto, manual, ai_generated, extracted |
| confidence_score | decimal(4,3) | Confidence level (0-1.000) |
| created_at | timestamp | Tag creation timestamp |
| updated_at | timestamp | Tag update timestamp |

Configuration Options

Environment Variables

TEXT_EMBEDDING_MODEL: Model for text embeddings
IMAGE_EMBEDDING_MODEL: Model for image embeddings
USE_GPU: Enable/disable GPU acceleration
BATCH_SIZE: Processing batch size
Database connection parameters
API and upload directory settings
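
A minimal sketch of reading these variables at startup. The variable names are the ones listed above and the model defaults are taken from the Technical Stack section; the `Settings` class, the `load_settings` helper, the default batch size of 32, and the accepted spellings for `USE_GPU` are assumptions for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    text_embedding_model: str
    image_embedding_model: str
    use_gpu: bool
    batch_size: int


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    batch_size = int(env.get("BATCH_SIZE", "32"))  # default of 32 is an assumption
    if batch_size < 1:
        raise ValueError("BATCH_SIZE must be a positive integer")
    return Settings(
        text_embedding_model=env.get("TEXT_EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        image_embedding_model=env.get("IMAGE_EMBEDDING_MODEL", "clip-ViT-B-32"),
        # Accept a few common truthy spellings; anything else disables the GPU.
        use_gpu=env.get("USE_GPU", "false").strip().lower() in {"1", "true", "yes"},
        batch_size=batch_size,
    )


settings = load_settings({"USE_GPU": "True", "BATCH_SIZE": "16"})
```

Accepting an explicit mapping makes the loader easy to test without touching the real process environment.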

Performance Tuning

Adjust BATCH_SIZE for optimal processing
Configure MariaDB vector index parameters
Optimize worker processes for background tasks
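
To make the effect of BATCH_SIZE concrete, here is a small sketch of splitting a workload into fixed-size batches. The helper `iter_batches` is hypothetical, and how the application actually applies BATCH_SIZE is not stated in this document. Larger batches generally mean fewer model calls at the cost of higher peak memory.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"document {i}" for i in range(7)]
batches = list(iter_batches(texts, batch_size=3))
batch_sizes = [len(batch) for batch in batches]  # [3, 3, 1]
```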

Development Notes

Code Organization

src/
├── api/          # FastAPI application
├── database/     # Database connections
├── processing/   # File processing logic
├── ui/           # Streamlit interface
└── utils/        # Shared utilities

Best Practices

Use environment variables for configuration
Implement proper error handling
Follow the provided coding style
Add tests for new features

Security Considerations

For production deployment:

Implement authentication
Use HTTPS
Secure database credentials
Validate file uploads
Implement rate limiting
Add proper logging
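
As one example of validating file uploads, the sketch below checks that a file's leading bytes match its extension before it is accepted. The signatures for JPEG, PNG, and PDF are the standard ones; the 50 MiB size cap, the requirement that text files be valid UTF-8, and the helper name `validate_upload` are assumptions, not part of the project.

```python
import os

# Leading bytes ("magic numbers") for the binary formats the hub accepts.
MAGIC_BYTES = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MiB cap; the value is an assumption


def validate_upload(filename, data):
    """Return the normalized extension, or raise ValueError if the upload is rejected."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the upload size limit")
    extension = os.path.splitext(filename)[1].lower()
    if extension == ".txt":
        try:
            data.decode("utf-8")  # assumption: text uploads must be valid UTF-8
        except UnicodeDecodeError as exc:
            raise ValueError("text file is not valid UTF-8") from exc
        return extension
    signature = MAGIC_BYTES.get(extension)
    if signature is None:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if not data.startswith(signature):
        raise ValueError("file content does not match its extension")
    return extension


checked = validate_upload("report.PDF", b"%PDF-1.7 example")
```

A signature check only confirms the file starts like the claimed format; it does not replace parsing the file safely during processing.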

Troubleshooting

Common issues:

Database connection failures
- Check credentials in .env
- Verify MariaDB container is running

Model loading errors
- Ensure sufficient memory
- Check GPU configuration if enabled

File processing issues
- Verify file permissions
- Check upload directory path
- Monitor processing logs

Future Potential

The Multimodal Metadata Hub is a foundation for content search and knowledge extraction over unstructured data such as images, PDFs, and documents, and it could evolve into a scalable, AI-powered search-as-a-service platform. Possible enhancements include distributed vector processing, enterprise-grade authentication, multilingual model support, and real-time ingestion pipelines. Businesses, SaaS platforms, content creators, and developers could integrate the API to add content discovery, automated knowledge extraction, and context-aware search to their products and workflows. As vector databases and generative AI mature, this prototype could grow into a commercial platform for secure, multimodal enterprise search.

Contributing

Fork the repository
Create a feature branch
Commit changes
Push to the branch
Create a Pull Request

Authors

Team Singleton: Sunny Kumar, Anuj Gupta, Abhijeet Dhanotiya, Anand Vyas, and Tushan Kumar Sinha - Initial work and maintenance
