abhijeetd05/Multimodal-Metadata-Hub

Multimodal Metadata Hub

Overview

The Multimodal Metadata Hub manages and searches images, PDFs, and text documents using vector embeddings and semantic search. It processes uploaded files, extracts text and metadata, generates embeddings with machine learning models, and supports semantic search across the extracted content.

Architecture

The application uses a three-tier architecture:

1. Backend (FastAPI)

  • Built with FastAPI framework for high-performance API endpoints
  • Handles file uploads, processing, and search requests
  • Implements background task processing for file analysis
  • Uses sentence-transformers for generating text and image embeddings
  • Processes multiple file types (images, PDFs, text files)
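
The background processing described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: a thread pool stands in for FastAPI background tasks, the `status` dictionary stands in for the `processing_status` column, and the names `submit`, `process_file`, `_ok`, and `_boom` are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# In-memory stand-in for the processing_status column (hypothetical store).
status: Dict[str, str] = {}


def process_file(file_id: str, analyze: Callable[[str], None]) -> None:
    """Run one analysis job and record its status transitions."""
    status[file_id] = "processing"
    try:
        analyze(file_id)
    except Exception:
        # Any failure during analysis marks the file as failed.
        status[file_id] = "failed"
    else:
        status[file_id] = "completed"


def submit(pool: ThreadPoolExecutor, file_id: str,
           analyze: Callable[[str], None]) -> Future:
    """Mark the file as pending, then hand it to a worker thread."""
    status[file_id] = "pending"
    return pool.submit(process_file, file_id, analyze)


def _ok(file_id: str) -> None:
    """Simulated analysis that succeeds."""


def _boom(file_id: str) -> None:
    """Simulated analysis that fails."""
    raise RuntimeError("simulated extraction error")


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [submit(pool, "a", _ok), submit(pool, "b", _boom)]
    for future in futures:
        future.result()  # wait for each job to finish
```

The upload handler returns as soon as the job is queued, and clients can poll the recorded status until it reaches `completed` or `failed`.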

2. Database (MariaDB)

  • Uses MariaDB 11.8 with vector operations support

  • Stores file metadata, extracted text, and vector embeddings

  • Implements vector indexing for efficient similarity search

  • Supports multiple embedding types:

    • Content embedding (384-dimensional vectors)
    • Visual embedding (512-dimensional vectors)
    • Metadata embedding (512-dimensional vectors)
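
To make the embedding types above concrete, the following sketch checks a vector's length against the listed dimensions and computes cosine similarity in plain Python. The dictionary keys and function names are illustrative assumptions; in the application itself, similarity is computed by the database, not by code like this.

```python
import math
from typing import Sequence

# Dimensions as listed above; the keys are illustrative names.
EXPECTED_DIMS = {"content": 384, "visual": 512, "metadata": 512}


def check_dim(kind: str, vector: Sequence[float]) -> None:
    """Raise ValueError if the vector length does not match its embedding type."""
    expected = EXPECTED_DIMS[kind]
    if len(vector) != expected:
        raise ValueError(
            f"{kind} embedding has {len(vector)} dimensions, expected {expected}"
        )


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors of different types are not comparable: a 384-dimensional content embedding cannot be scored against a 512-dimensional visual embedding.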

3. Frontend (Streamlit)

  • Clean and intuitive web interface
  • Supports file upload with progress indication
  • Provides search functionality with relevance scores
  • Real-time status updates and result display

Features

  • File Upload & Processing

    • Support for multiple file formats (JPG, JPEG, PNG, PDF, TXT)
    • Automatic file metadata extraction
    • Background processing with status tracking
    • Secure file storage with UUID-based naming
  • Text & Image Processing

    • Text extraction from PDFs and text files
    • EXIF metadata extraction from images
    • Vector embedding generation with all-MiniLM-L6-v2 (text) and clip-ViT-B-32 (images)
    • Support for both CPU and GPU processing
  • Search Capabilities

    • Semantic search using vector embeddings
    • Configurable search limits
    • Relevance scoring
    • Support for both text and image-based queries
  • Data Management

    • Automatic metadata organization
    • File tagging support
    • Processing status tracking
    • Error handling and reporting
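
As an illustration of UUID-based naming and file hashing from the list above, here is a minimal sketch. It assumes the stored name is a random UUID plus the lowercased original extension, and that the 64-character `file_hash` is a SHA-256 hex digest; neither detail is confirmed by this README, and the function names are hypothetical.

```python
import hashlib
import uuid
from pathlib import Path


def stored_name(original_filename: str) -> str:
    """Build a storage name that ignores any directory parts of the upload name."""
    suffix = Path(original_filename).suffix.lower()
    return f"{uuid.uuid4()}{suffix}"


def sha256_hex(data: bytes) -> str:
    """Return a 64-character hex digest suitable for an integrity check."""
    return hashlib.sha256(data).hexdigest()
```

Because the stored name never includes user-supplied path segments, an upload named `../../etc/Report.PDF` cannot escape the upload directory.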

Technical Stack

  • Backend Framework: FastAPI 0.104.1

  • Database: MariaDB 11.8 with vector operations

  • Frontend: Streamlit 1.28.1

  • ML Models:

    • Text: all-MiniLM-L6-v2 (Sentence Transformers)
    • Image: clip-ViT-B-32 (CLIP)
  • Python Dependencies:

    • sentence-transformers ≥ 2.2.2
    • PyTorch
    • Pillow 10.1.0
    • PyMuPDF 1.22.5
    • Other utilities (see requirements.txt)

Setup and Installation

Prerequisites

  • Docker and Docker Compose
  • Git
  • At least 4GB RAM
  • (Optional) NVIDIA GPU with CUDA support

Environment Setup

  1. Clone the repository and navigate to the project directory

  2. Copy the environment template and configure:

    cp .env.example .env
  3. Update the following variables in .env:

    MARIADB_ROOT_PASSWORD=<your-root-password>
    DB_USER=<your-db-user>
    DB_PASSWORD=<your-db-password>
    

Running the Application

  1. Build and start the containers:

    docker-compose up --build
  2. Access the components:

API Endpoints

  • GET /health: System health check
  • POST /upload: File upload endpoint
  • POST /search: Search endpoint with query and limit parameters
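
A client call to the search endpoint might look like the sketch below. The base URL and the JSON body shape (`query` and `limit` keys) are assumptions, since this README only names the parameters; check the API's generated documentation for the actual request schema.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: host and port are not stated here


def build_search_request(query: str, limit: int = 10) -> request.Request:
    """Build (but do not send) a POST /search request with a JSON body."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("sunset over mountains", limit=5)
# To send it against a running instance:
#     with request.urlopen(req, timeout=10) as resp:
#         results = json.load(resp)
```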

Database Schema

Key tables:

  • media_files: Stores file metadata and embeddings
  • file_tags: Manages file tagging and categorization

Media Files Table

| Field | Type | Description |
|---|---|---|
| id | bigint(20) | Primary key, auto-increment |
| uuid | varchar(36) | Unique identifier for each file |
| filename | varchar(500) | Original filename |
| original_filename | varchar(500) | Preserved original name |
| file_path | varchar(1000) | Storage path location |
| file_size | bigint(20) | File size in bytes |
| mime_type | varchar(200) | MIME type classification |
| file_hash | varchar(64) | File integrity hash |
| file_extension | varchar(20) | File extension |
| upload_timestamp | timestamp | Upload time with auto-update |
| last_modified | timestamp | Last modification timestamp |
| extracted_text | longtext | Extracted text content |
| document_title | varchar(500) | Document title metadata |
| document_author | varchar(300) | Author information |
| document_pages | int(11) | Page count for documents |
| image_width | int(11) | Image width in pixels |
| image_height | int(11) | Image height in pixels |
| gps_latitude | decimal(10,8) | GPS latitude coordinates |
| gps_longitude | decimal(11,8) | GPS longitude coordinates |
| ai_description | text | AI-generated content description |
| ai_tags | varchar(1000) | AI-generated tags |
| content_embedding | vector(1536) | Content vector embedding |
| visual_embedding | vector(1536) | Visual vector embedding |
| metadata_embedding | vector(512) | Metadata vector embedding |
| processing_status | enum | Status: pending, processing, completed, failed |
| processing_error | text | Error messages if processing fails |
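
A similarity query against this table could be assembled as in the sketch below. It uses MariaDB's `VEC_DISTANCE_COSINE` and `VEC_FromText` functions with column names from the table above. The `?` placeholder style, the filter on completed files, and the choice of cosine distance are assumptions, not the project's confirmed query. The query vector generally needs the same number of dimensions as the column it is compared with.

```python
import json
from typing import Sequence, Tuple

# Parameterized query; placeholders are filled by the database driver.
SEARCH_SQL = (
    "SELECT uuid, original_filename, "
    "VEC_DISTANCE_COSINE(content_embedding, VEC_FromText(?)) AS distance "
    "FROM media_files "
    "WHERE processing_status = 'completed' "
    "ORDER BY distance "
    "LIMIT ?"
)


def search_params(query_vector: Sequence[float], limit: int) -> Tuple[str, int]:
    """Serialize the query vector as the JSON array text that VEC_FromText parses."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return json.dumps([float(x) for x in query_vector]), int(limit)


# With a DB-API cursor (hypothetical variable `cursor`):
#     cursor.execute(SEARCH_SQL, search_params(embedding, 10))
```

Smaller distances mean closer matches, so a relevance score shown to users could be derived as `1 - distance`.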

File Tags Table

| Field | Type | Description |
|---|---|---|
| id | bigint(20) | Primary key, auto-increment |
| file_id | bigint(20) | Foreign key to media_files |
| tag_name | varchar(200) | Tag name identifier |
| tag_value | text | Tag value content |
| tag_category | varchar(100) | Tag categorization |
| tag_type | enum | Type: auto, manual, ai_generated, extracted |
| confidence_score | decimal(4,3) | Confidence score from 0.000 to 1.000 |
| created_at | timestamp | Tag creation timestamp |
| updated_at | timestamp | Tag update timestamp |
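
The constraints in this table can be mirrored in application code before a row is written. The sketch below is an assumption about how such validation might look; the `FileTag` class and its defaults (`manual`, `1.000`) are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from decimal import Decimal

# Allowed values of the tag_type enum, as listed in the table above.
TAG_TYPES = {"auto", "manual", "ai_generated", "extracted"}


@dataclass(frozen=True)
class FileTag:
    file_id: int
    tag_name: str
    tag_value: str
    tag_type: str = "manual"                      # default is an assumption
    confidence_score: Decimal = Decimal("1.000")  # default is an assumption

    def __post_init__(self) -> None:
        if self.tag_type not in TAG_TYPES:
            raise ValueError(f"unknown tag_type: {self.tag_type!r}")
        if len(self.tag_name) > 200:
            raise ValueError("tag_name exceeds varchar(200)")
        if not Decimal("0") <= self.confidence_score <= Decimal("1"):
            raise ValueError("confidence_score must be between 0 and 1")
        # Round to three decimal places to match decimal(4,3).
        rounded = self.confidence_score.quantize(Decimal("0.001"))
        object.__setattr__(self, "confidence_score", rounded)


tag = FileTag(1, "camera", "Nikon", "extracted", Decimal("0.87"))
```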

Configuration Options

Environment Variables

  • TEXT_EMBEDDING_MODEL: Model for text embeddings
  • IMAGE_EMBEDDING_MODEL: Model for image embeddings
  • USE_GPU: Enable/disable GPU acceleration
  • BATCH_SIZE: Processing batch size
  • Database connection parameters
  • API and upload directory settings
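
One way to read these variables is a small settings loader like the sketch below. The default model names come from the Technical Stack section; the default batch size of 32, the accepted truthy strings for `USE_GPU`, and the `Settings` and `load_settings` names are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    text_embedding_model: str
    image_embedding_model: str
    use_gpu: bool
    batch_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables, with fallbacks."""
    batch_size = int(env.get("BATCH_SIZE", "32"))  # default is an assumption
    if batch_size < 1:
        raise ValueError("BATCH_SIZE must be at least 1")
    use_gpu = env.get("USE_GPU", "false").strip().lower() in {"1", "true", "yes"}
    return Settings(
        text_embedding_model=env.get("TEXT_EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        image_embedding_model=env.get("IMAGE_EMBEDDING_MODEL", "clip-ViT-B-32"),
        use_gpu=use_gpu,
        batch_size=batch_size,
    )


settings = load_settings({"USE_GPU": "True", "BATCH_SIZE": "8"})
```

Passing a mapping instead of reading `os.environ` directly keeps the loader easy to test.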

Performance Tuning

  • Adjust BATCH_SIZE for optimal processing
  • Configure MariaDB vector index parameters
  • Optimize worker processes for background tasks
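
To show what `BATCH_SIZE` controls, the sketch below splits a list of inputs into fixed-size batches, which is how texts or images are typically grouped before being passed to an embedding model. The `batched` helper is hypothetical and not part of the project.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


batches = list(batched(["a", "b", "c", "d", "e"], 2))
```

Larger batches usually improve throughput on a GPU but need more memory, so the best value depends on the hardware.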

Development Notes

Code Organization

src/
├── api/          # FastAPI application
├── database/     # Database connections
├── processing/   # File processing logic
├── ui/           # Streamlit interface
└── utils/        # Shared utilities

Best Practices

  • Use environment variables for configuration
  • Implement proper error handling
  • Follow the provided coding style
  • Add tests for new features

Security Considerations

For production deployment:

  1. Implement authentication
  2. Use HTTPS
  3. Secure database credentials
  4. Validate file uploads
  5. Implement rate limiting
  6. Add proper logging

Troubleshooting

Common issues:

  1. Database connection failures

    • Check credentials in .env
    • Verify MariaDB container is running
  2. Model loading errors

    • Ensure sufficient memory
    • Check GPU configuration if enabled
  3. File processing issues

    • Verify file permissions
    • Check upload directory path
    • Monitor processing logs
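
For file processing issues, a quick diagnostic like the sketch below can confirm that the upload directory exists and is readable and writable by the current process. The `check_upload_dir` helper is hypothetical; run it inside the backend container with the configured upload path.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def check_upload_dir(path: str) -> List[str]:
    """Return a list of problems found; an empty list means the directory looks usable."""
    problems: List[str] = []
    directory = Path(path)
    if not directory.exists():
        problems.append(f"{directory} does not exist")
    elif not directory.is_dir():
        problems.append(f"{directory} is not a directory")
    else:
        if not os.access(directory, os.R_OK):
            problems.append(f"{directory} is not readable")
        if not os.access(directory, os.W_OK):
            problems.append(f"{directory} is not writable")
    return problems


# Demonstration against a fresh temporary directory.
demo_dir = tempfile.mkdtemp()
demo_problems = check_upload_dir(demo_dir)
missing_problems = check_upload_dir(os.path.join(demo_dir, "missing"))
```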

Future Potential

The Multimodal Metadata Hub is a foundation for content search over unstructured data such as images, PDFs, and documents, and it could evolve into a scalable, AI-powered search-as-a-service platform. Future enhancements could include distributed vector processing, enterprise-grade authentication, multilingual model support, and real-time ingestion pipelines. Businesses, SaaS platforms, content creators, and developers could integrate the API to add content discovery, automated knowledge extraction, and context-aware search to their products and workflows. As vector databases and generative AI mature, this prototype could grow into a commercial platform for secure, multimodal enterprise search.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit changes
  4. Push to the branch
  5. Create a Pull Request

Authors

Team Singleton: Sunny Kumar, Anuj Gupta, Abhijeet Dhanotiya, Anand Vyas, and Tushan Kumar Sinha - Initial work and maintenance
