Skip to content

Fine-tuning e2e integration test#372

Merged
tiffzhao5 merged 8 commits intomainfrom
tiffany/fine-tune-e2e
Nov 15, 2023
Merged

Fine-tuning e2e integration test#372
tiffzhao5 merged 8 commits intomainfrom
tiffany/fine-tune-e2e

Conversation

@tiffzhao5
Copy link
Copy Markdown
Contributor

@tiffzhao5 tiffzhao5 commented Nov 13, 2023

Pull Request Summary

Add e2e integration test for fine-tuning. Also changed the base docker image for loading integration tests onto minikube to python3.8:slim instead since it originally took 10 minutes -- now it only takes 4 min! Let me know if this breaks anything else.

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

command: |
docker build -f model-engine/model_engine_server/inference/pytorch_or_tf.base.Dockerfile \
--build-arg BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime \
--build-arg BASE_IMAGE=python:3.8-slim \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, does it mean we could use this for torch images in docker-compose.yml?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm I'm not sure actually

command: |
sudo apt-get update && sudo apt-get install -y expect
pushd $HOME/project/.circleci/resources
kubectl create namespace model-engine
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is integration test model-engine actually deployed in this namespace?

Copy link
Copy Markdown
Contributor Author

@tiffzhao5 tiffzhao5 Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like from this

helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug
that the helm chart is installed in the default namespace but from here
endpoint_namespace: model-engine
the endpoints are deployed in the model-engine namespace

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug does not specify which namespace to install the chart. it installs a chart defined in ./model-engine folder, with name model-engine

endpoint and model-engine don't need to be in the same namespace

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try to remove these if this is actually not used?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the namespace model-engine is used for endpoints and batch jobs which was why I needed to create a new postgres secret there

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this specific for finetune integration test? i don't think this blocks any other integration tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the postgres secret is specific because we're actually checking if the batch job gets created in the fine-tune integration test. there's one batch job integration test but it doesn't actually check for a successful creation which is why we didn't need this secret before.

Copy link
Copy Markdown
Contributor

@yunfeng-scale yunfeng-scale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a bunch for adding this important integration test!

Copy link
Copy Markdown
Member

@yixu34 yixu34 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!

@tiffzhao5 tiffzhao5 enabled auto-merge (squash) November 15, 2023 18:22
@tiffzhao5 tiffzhao5 merged commit 5e4d662 into main Nov 15, 2023
@tiffzhao5 tiffzhao5 deleted the tiffany/fine-tune-e2e branch November 15, 2023 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants