nikJ13/Inference-Engine-for-Heterogeneous-LLM-Workloads

Inference Services

This repository contains the source code for three final project implementations. Each project is designed to be deployed as a serverless inference service using Modal.

Project Structure

  • Final_Project_1/: First release/implementation of the inference service.
  • Final_Project_2/: Second iteration with potential optimizations.
  • Final_Project_3/: Final distinct implementation.

Each directory contains a self-contained Modal application with the following key files:

  • main_system.py: The entry point for the Modal deployment.
  • deploy.sh: Helper script to simplify deployment.
  • config.py: Configuration for models and system settings.
  • unified_worker.py: The core worker logic handling inference tasks.
  • image_setup.py: Definition of the Docker environment and dependencies.
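
As an illustration, `config.py` might expose settings along these lines. This is a minimal sketch; every field name and value below is an assumption for illustration, not taken from the repository:

```python
# Hypothetical sketch of config.py; the actual names and values may differ.
APP_NAME = "inference-project-final"  # must match the name passed to `modal app stop`
GPU_TYPE = "A10G"                     # GPU requested for the Modal worker (assumed)
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # model served by unified_worker.py (assumed)
MAX_BATCH_SIZE = 8                    # upper bound on batched inference requests (assumed)
```

Keeping these in one module lets `main_system.py` and `unified_worker.py` share a single source of truth for deployment settings.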

Prerequisites

Before running any of the projects, ensure you have the following:

  1. Modal Account: Sign up at modal.com.
  2. Modal Client: Install the Modal Python client locally.
    pip install modal
  3. Authentication: Authenticate your local client with your Modal account.
    modal token new
    (Alternatively, if you run modal token set, follow the prompts to paste an existing token.)

How to Run

You can deploy or serve any of the three projects using the same commands.

1. Select a Project

Navigate to the desired project directory:

cd Final_Project_1
# OR
cd Final_Project_2
# OR
cd Final_Project_3

2. Deploy (Production)

To deploy the application as a persistent web endpoint:

modal deploy main_system.py

Alternatively, you can use the provided helper script:

./deploy.sh

Upon successful deployment, Modal will output a URL (e.g., https://<your-username>--inference-project-final-entrypoint.modal.run) that you can use to send requests.
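
Any HTTP client can then call the endpoint. The sketch below uses only Python's standard library; the route and the JSON payload shape (`prompt`, `max_tokens`) are assumptions and will depend on how the endpoint is defined in `main_system.py`:

```python
import json
import urllib.request

def build_payload(prompt: str, max_tokens: int = 128) -> bytes:
    """Encode a request body; the field names here are assumed, not from the repo."""
    return json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")

def query_endpoint(url: str, prompt: str) -> dict:
    """POST a prompt to the deployed Modal endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example call (substitute the URL printed by `modal deploy`):
# query_endpoint("https://<your-username>--inference-project-final-entrypoint.modal.run",
#                "Hello, world")
```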

3. Serve (Development/Testing)

To run the application ephemerally with hot-reloading (useful for testing):

modal serve main_system.py

Stopping the App

To stop a running deployed application, you can use the Modal CLI:

modal app stop inference-project-final

(Note: if the app name differs, check APP_NAME in config.py.)

Testing

Each project contains a tests/ directory with scripts to validate the deployment. For example:

# Run a specific test script (ensure you are in the project directory)
python tests/test_system.py
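
A script like `tests/test_system.py` might be structured as a simple smoke test. The sketch below is an assumption about its shape, not its actual contents; the endpoint URL placeholder and the expected `output` field in the response are both hypothetical:

```python
# Hypothetical smoke-test sketch; URL and response fields are assumptions.
import json
import time
import urllib.request

ENDPOINT = "https://<your-username>--inference-project-final-entrypoint.modal.run"

def check_response(body: dict) -> bool:
    """Validate the assumed response shape: a non-empty 'output' string."""
    return isinstance(body.get("output"), str) and len(body["output"]) > 0

def run_smoke_test(url: str = ENDPOINT) -> None:
    """Send one request and assert the endpoint answers with a valid body."""
    start = time.monotonic()
    req = urllib.request.Request(
        url,
        data=json.dumps({"prompt": "ping"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    assert check_response(body), f"unexpected response: {body}"
    print(f"OK in {time.monotonic() - start:.1f}s")

if __name__ == "__main__":
    run_smoke_test()
```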

Environment Details

All projects run in a custom Docker environment defined in image_setup.py, which includes:

  • Python 3
  • PyTorch 2.4.1 (CUDA 12.1)
  • Flash Attention 2
  • Transformers, Accelerate, and BitsAndBytes
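
An image definition like `image_setup.py` could be sketched with Modal's image builder as follows. The package list mirrors the environment described above, but the Python version, version pins, and build steps are assumptions (this is a config fragment, not the repository's actual file):

```python
# Hypothetical sketch of image_setup.py; pins and build steps are assumed.
import modal

image = (
    modal.Image.debian_slim(python_version="3.11")  # Python version assumed
    .pip_install(
        "torch==2.4.1",   # PyTorch build with CUDA 12.1 support
        "transformers",
        "accelerate",
        "bitsandbytes",
    )
    # flash-attn typically requires torch at build time, hence a separate step
    .run_commands("pip install flash-attn --no-build-isolation")
)
```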
