nikJ13/Inference-Engine-for-Heterogeneous-LLM-Workloads

Inference Services

This repository contains the source code for three final project implementations. Each project is designed to be deployed as a serverless inference service using Modal.

Project Structure

  • Final_Project_1/: First release/implementation of the inference service.
  • Final_Project_2/: Second iteration with potential optimizations.
  • Final_Project_3/: Final distinct implementation.

Each directory contains a self-contained Modal application with the following key files:

  • main_system.py: The entry point for the Modal deployment.
  • deploy.sh: Helper script to simplify deployment.
  • config.py: Configuration for models and system settings.
  • unified_worker.py: The core worker logic handling inference tasks.
  • image_setup.py: Definition of the Docker environment and dependencies.
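
As an illustration, `config.py` might expose settings along these lines. This is a minimal sketch; every field name and value below is an assumption for illustration, not taken from the repository:

```python
# Hypothetical sketch of config.py; the actual names and values may differ.
APP_NAME = "inference-project-final"  # must match the name passed to `modal app stop`
GPU_TYPE = "A10G"                     # GPU requested for the Modal worker (assumed)
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # model served by unified_worker.py (assumed)
MAX_BATCH_SIZE = 8                    # upper bound on batched inference requests (assumed)
```

Keeping these in one module lets `main_system.py` and `unified_worker.py` share a single source of truth for deployment settings.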

Prerequisites

Before running any of the projects, ensure you have the following:

  1. Modal Account: Sign up at modal.com.
  2. Modal Client: Install the Modal Python client locally.
    pip install modal
  3. Authentication: Authenticate your local client with your Modal account.
    modal token new
    (Alternatively, if you run modal token set, follow the prompts to paste an existing token.)

How to Run

You can deploy or serve any of the three projects using the same commands.

1. Select a Project

Navigate to the desired project directory:

cd Final_Project_1
# OR
cd Final_Project_2
# OR
cd Final_Project_3

2. Deploy (Production)

To deploy the application as a persistent web endpoint:

modal deploy main_system.py

Alternatively, you can use the provided helper script:

./deploy.sh

Upon successful deployment, Modal will output a URL (e.g., https://<your-username>--inference-project-final-entrypoint.modal.run) that you can use to send requests.
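
Any HTTP client can then call the endpoint. The sketch below uses only Python's standard library; the route and the JSON payload shape (`prompt`, `max_tokens`) are assumptions and will depend on how the endpoint is defined in `main_system.py`:

```python
import json
import urllib.request

def build_payload(prompt: str, max_tokens: int = 128) -> bytes:
    """Encode a request body; the field names here are assumed, not from the repo."""
    return json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")

def query_endpoint(url: str, prompt: str) -> dict:
    """POST a prompt to the deployed Modal endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example call (substitute the URL printed by `modal deploy`):
# query_endpoint("https://<your-username>--inference-project-final-entrypoint.modal.run",
#                "Hello, world")
```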

3. Serve (Development/Testing)

To run the application ephemerally with hot-reloading (useful for testing):

modal serve main_system.py

Stopping the App

To stop a running deployed application, you can use the Modal CLI:

modal app stop inference-project-final

(Note: if the app name differs, check APP_NAME in config.py.)

Testing

Each project contains a tests/ directory with scripts to validate the deployment. For example:

# Run a specific test script (ensure you are in the project directory)
python tests/test_system.py
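
A script like `tests/test_system.py` might be structured as a simple smoke test. The sketch below is an assumption about its shape, not its actual contents; the endpoint URL placeholder and the expected `output` field in the response are both hypothetical:

```python
# Hypothetical smoke-test sketch; URL and response fields are assumptions.
import json
import time
import urllib.request

ENDPOINT = "https://<your-username>--inference-project-final-entrypoint.modal.run"

def check_response(body: dict) -> bool:
    """Validate the assumed response shape: a non-empty 'output' string."""
    return isinstance(body.get("output"), str) and len(body["output"]) > 0

def run_smoke_test(url: str = ENDPOINT) -> None:
    """Send one request and assert the endpoint answers with a valid body."""
    start = time.monotonic()
    req = urllib.request.Request(
        url,
        data=json.dumps({"prompt": "ping"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    assert check_response(body), f"unexpected response: {body}"
    print(f"OK in {time.monotonic() - start:.1f}s")

if __name__ == "__main__":
    run_smoke_test()
```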

Environment Details

All projects run in a custom Docker environment defined in image_setup.py, which includes:

  • Python 3
  • PyTorch 2.4.1 (CUDA 12.1)
  • Flash Attention 2
  • Transformers, Accelerate, and BitsAndBytes
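
An image definition like `image_setup.py` could be sketched with Modal's image builder as follows. The package list mirrors the environment described above, but the Python version, version pins, and build steps are assumptions (this is a config fragment, not the repository's actual file):

```python
# Hypothetical sketch of image_setup.py; pins and build steps are assumed.
import modal

image = (
    modal.Image.debian_slim(python_version="3.11")  # Python version assumed
    .pip_install(
        "torch==2.4.1",   # PyTorch build with CUDA 12.1 support
        "transformers",
        "accelerate",
        "bitsandbytes",
    )
    # flash-attn typically requires torch at build time, hence a separate step
    .run_commands("pip install flash-attn --no-build-isolation")
)
```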
