This repository contains the source code for three final project implementations. Each project is designed to be deployed as a serverless inference service using Modal.
- Final_Project_1/: First release/implementation of the inference service.
- Final_Project_2/: Second iteration with potential optimizations.
- Final_Project_3/: Final distinct implementation.
Each directory contains a self-contained Modal application with the following key files:
- main_system.py: The entry point for the Modal deployment.
- deploy.sh: Helper script to simplify deployment.
- config.py: Configuration for models and system settings.
- unified_worker.py: The core worker logic handling inference tasks.
- image_setup.py: Definition of the Docker environment and dependencies.
Before running any of the projects, ensure you have the following:
- Modal Account: Sign up at modal.com.
- Modal Client: Install the Modal Python client locally.
pip install modal
- Authentication: Authenticate your local client with your Modal account.
modal token new
(Or follow the instructions to paste the token if you run modal token set.)
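The prerequisites above can be checked before deploying. The following is a minimal sketch, not part of the repository: the helper names `has_modal_token` and `preflight` are hypothetical, and it assumes the Modal client reads credentials either from the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables or from the default token file at `~/.modal.toml`.

```python
import os
import shutil
from pathlib import Path


def has_modal_token(env, config_path):
    """Return True if Modal credentials appear to be configured.

    Checks the MODAL_TOKEN_ID / MODAL_TOKEN_SECRET environment variables
    first, then falls back to the presence of the token file.
    """
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return True
    return Path(config_path).is_file()


def preflight(env=None, config_path=None):
    """Collect human-readable problems; an empty list means ready to deploy."""
    env = os.environ if env is None else env
    # Assumption: the token file lives at the default location.
    if config_path is None:
        config_path = Path.home() / ".modal.toml"
    problems = []
    if shutil.which("modal") is None:
        problems.append("modal CLI not found on PATH (run: pip install modal)")
    if not has_modal_token(env, config_path):
        problems.append("no Modal token found (run: modal token new)")
    return problems
```

Calling `preflight()` with no arguments inspects the current environment; printing each returned string gives a quick checklist of what is still missing.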
You can deploy or serve any of the three projects using the same commands.
Navigate to the desired project directory:
cd Final_Project_1
# OR
cd Final_Project_2
# OR
cd Final_Project_3
To deploy the application as a persistent web endpoint:
modal deploy main_system.py
Alternatively, you can use the provided helper script:
./deploy.sh
Upon successful deployment, Modal will output a URL (e.g., https://<your-username>--inference-project-final-entrypoint.modal.run) that you can use to send requests.
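As an illustration of sending a request to the deployed URL, here is a small standard-library client sketch. The request schema is not documented here, so the `prompt` field, the use of POST with a JSON body, and the JSON response are all assumptions; the helper names `endpoint_url`, `build_request`, and `send` are hypothetical. Check main_system.py for the fields the endpoint actually expects.

```python
import json
import urllib.request


def endpoint_url(workspace, app_name="inference-project-final", function_name="entrypoint"):
    """Build a URL following the example pattern shown above."""
    return f"https://{workspace}--{app_name}-{function_name}.modal.run"


def build_request(url, payload):
    """Prepare a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and decode a JSON response.

    The generous default timeout allows for a cold start of the container.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `send(build_request(endpoint_url("your-username"), {"prompt": "Hello"}))`, with the placeholder username replaced by the one in the URL that Modal prints.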
To run the application ephemerally with hot-reloading (useful for testing):
modal serve main_system.py
To stop a running deployed application, you can use the Modal CLI:
modal app stop inference-project-final
(Note: Check the APP_NAME in config.py if it differs.)
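To avoid stopping the wrong app when the name differs between projects, the name can be read from config.py rather than typed by hand. This is a hedged sketch: it assumes `APP_NAME` is assigned a plain string literal at module level in config.py, and the helpers `read_app_name` and `stop_command` are hypothetical names, not part of the repository.

```python
import ast


def read_app_name(config_source, variable="APP_NAME"):
    """Return the string assigned to `variable` at module level, or None.

    The source is parsed rather than imported, so no project dependencies
    need to be installed locally.
    """
    tree = ast.parse(config_source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names = [node.target.id]
        else:
            continue
        value = node.value
        if (
            variable in names
            and isinstance(value, ast.Constant)
            and isinstance(value.value, str)
        ):
            return value.value
    return None


def stop_command(app_name):
    """Format the stop command shown above for a given app name."""
    return ["modal", "app", "stop", app_name]
```

From the project directory, the contents of config.py can be passed to `read_app_name`, and the resulting list handed to `subprocess.run(..., check=True)`. If the name is computed dynamically in config.py, this parser returns `None` and the name has to be looked up manually.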
Each project contains a tests/ directory with scripts to validate the deployment. For example:
# Run a specific test script (ensure you are in the project directory)
python tests/test_system.py
All projects run in a custom Docker environment defined in image_setup.py, which includes:
- Python 3
- PyTorch 2.4.1 (CUDA 12.1)
- Flash Attention 2
- Transformers, Accelerate, and BitsAndBytes
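To confirm that a container actually has the packages listed above, a short version report can be run inside it. This sketch uses only the standard library; the distribution names in `EXPECTED_PACKAGES` are assumed to match what image_setup.py installs (for example, Flash Attention 2 is assumed to be published as `flash-attn`), and both helper names are hypothetical.

```python
from importlib import metadata

# Assumed PyPI distribution names for the dependencies listed above.
EXPECTED_PACKAGES = ["torch", "flash-attn", "transformers", "accelerate", "bitsandbytes"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(versions):
    """Render one line per package, flagging anything that is missing."""
    lines = []
    for name, version in versions.items():
        shown = version if version is not None else "NOT INSTALLED"
        lines.append(f"{name}: {shown}")
    return lines
```

Running `print("\n".join(report(installed_versions(EXPECTED_PACKAGES))))` inside the image lists what is present. On a local machine without the GPU stack, most entries are expected to show as not installed.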