Fine-tuning e2e integration test by tiffzhao5 · Pull Request #372 · scaleapi/llm-engine

tiffzhao5 · 2023-11-13T19:35:15Z

Pull Request Summary

Add an e2e integration test for fine-tuning. Also changed the base Docker image used for loading the integration tests onto minikube to python:3.8-slim, since that step originally took 10 minutes and now takes only 4. Let me know if this breaks anything else.

yunfeng-scale · 2023-11-14T22:19:14Z

.circleci/config.yml

          command: |
            docker build -f model-engine/model_engine_server/inference/pytorch_or_tf.base.Dockerfile \
-            --build-arg BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime \
+            --build-arg BASE_IMAGE=python:3.8-slim \


interesting, does it mean we could use this for torch images in docker-compose.yml?

hm I'm not sure actually
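
For readers following the base-image change, here is a minimal sketch of parameterizing the build through `--build-arg BASE_IMAGE`. It assumes the Dockerfile consumes `BASE_IMAGE` in its `FROM` line, which this thread does not show. The function name, the image tag, and the `.` build context are hypothetical. Note that `python:3.8-slim` does not ship PyTorch, so an image that needs torch would have to install it separately.

```shell
# Sketch only: build the shared base Dockerfile on top of a chosen base image.
# $1 = base image (e.g. python:3.8-slim), $2 = tag for the result (hypothetical).
build_base_image() {
  docker build \
    -f model-engine/model_engine_server/inference/pytorch_or_tf.base.Dockerfile \
    --build-arg BASE_IMAGE="$1" \
    -t "$2" \
    .
}

# Example (commented out so that sourcing this file does not start a build):
# build_base_image python:3.8-slim integration-test-base:latest
```

Calling the function once per base image and timing each build is one way to compare the slim image against the PyTorch runtime image.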

yunfeng-scale · 2023-11-14T22:21:13Z

.circleci/config.yml

          command: |
            sudo apt-get update && sudo apt-get install -y expect
            pushd $HOME/project/.circleci/resources
+            kubectl create namespace model-engine


is integration test model-engine actually deployed in this namespace?

seems like from this (llm-engine/.circleci/config.yml, line 164 in 4e2ea6c):

helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug

that the helm chart is installed in the default namespace, but from here (llm-engine/charts/model-engine/values_circleci.yaml, line 97 in 7ef9723):

endpoint_namespace: model-engine

the endpoints are deployed in the model-engine namespace

helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug does not specify which namespace to install the chart into. it installs the chart defined in the ./model-engine folder, under the release name model-engine

endpoint and model-engine don't need to be in the same namespace

can you try to remove these if this is actually not used?

actually the namespace model-engine is used for endpoints and batch jobs, which is why I needed to create a new postgres secret there

is this specific for finetune integration test? i don't think this blocks any other integration tests

the postgres secret is specific because we're actually checking if the batch job gets created in the fine-tune integration test. there's one batch job integration test but it doesn't actually check for a successful creation which is why we didn't need this secret before.
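
The namespace setup discussed in this thread could be sketched roughly as below. This is an illustration under stated assumptions, not the CI's actual script: the function names, the secret name, and the secret key are hypothetical, and only `kubectl create namespace model-engine` appears in the diff itself.

```shell
# Sketch only. The secret name and key passed in are hypothetical; the names
# actually used by the CI config are not shown in this thread.

# Create a namespace only if it is missing, so rerunning the step does not fail.
# $1 = namespace
ensure_namespace() {
  kubectl get namespace "$1" >/dev/null 2>&1 || kubectl create namespace "$1"
}

# Create a generic secret with a single key/value pair in the given namespace.
# $1 = namespace, $2 = secret name, $3 = key, $4 = value
create_db_secret() {
  kubectl create secret generic "$2" --namespace "$1" --from-literal="$3=$4"
}

# List batch jobs in the namespace where endpoints and batch jobs run.
# $1 = namespace
list_jobs() {
  kubectl get jobs --namespace "$1"
}

# Example (commented out so that sourcing this file needs no cluster):
# ensure_namespace model-engine
# create_db_secret model-engine demo-postgres-secret database_url "$DATABASE_URL"
# list_jobs model-engine
```

To see which namespace the chart itself landed in, `helm list --all-namespaces` shows every release together with its namespace.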

yunfeng-scale

thanks a bunch for adding this important integration test!

yixu34

Thanks for adding this!

tiffzhao5 added 7 commits November 13, 2023 17:12

make test work

f71f441

add status checking

998ce60

fix

7382423

test

bb94ae0

wget fix

e780b88

final fixes

e33c29a

move namespace

7ef9723

tiffzhao5 requested review from yixu34 and yunfeng-scale November 14, 2023 18:10

yunfeng-scale reviewed Nov 14, 2023

yixu34 approved these changes Nov 15, 2023

Merge branch 'main' into tiffany/fine-tune-e2e

9bae3aa

tiffzhao5 enabled auto-merge (squash) November 15, 2023 18:22

tiffzhao5 merged commit 5e4d662 into main Nov 15, 2023

tiffzhao5 deleted the tiffany/fine-tune-e2e branch November 15, 2023 18:41

tiffzhao5 merged 8 commits into main from tiffany/fine-tune-e2e
