command: |
  docker build -f model-engine/model_engine_server/inference/pytorch_or_tf.base.Dockerfile \
-   --build-arg BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime \
+   --build-arg BASE_IMAGE=python:3.8-slim \
interesting, does it mean we could use this for torch images in docker-compose.yml?
hm I'm not sure actually
command: |
  sudo apt-get update && sudo apt-get install -y expect
  pushd $HOME/project/.circleci/resources
  kubectl create namespace model-engine
is the integration-test model-engine actually deployed in this namespace?
seems like it, from this: llm-engine/.circleci/config.yml, line 164 in 4e2ea6c
This command does not specify which namespace to install the chart into:
helm install model-engine model-engine --values model-engine/values_circleci_subst.yaml --set tag=$CIRCLE_SHA1 --atomic --debug
It installs the chart defined in the ./model-engine folder as a release named model-engine.
Endpoints and model-engine don't need to be in the same namespace.
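A minimal sketch of a CircleCI step for checking where things actually land, assuming helm and kubectl are already pointed at the CI minikube cluster (the step name is hypothetical). Because the helm install above passes no --namespace, Helm uses the namespace of the current kubeconfig context. Listing releases across all namespaces shows which one that was.

```yaml
- run:
    name: Show release and workload namespaces (hypothetical debugging step)
    command: |
      # NAMESPACE column shows where the model-engine release was installed
      helm list --all-namespaces
      # Endpoint pods, if endpoints are created in the model-engine namespace
      kubectl get pods --namespace model-engine
      # Batch jobs, assuming they run as Kubernetes Jobs in the same namespace
      kubectl get jobs --namespace model-engine
```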
can you try to remove these if this is actually not used?
actually, the model-engine namespace is used for endpoints and batch jobs, which is why I needed to create a new postgres secret there
is this specific to the fine-tune integration test? I don't think this blocks any other integration tests
the postgres secret is specific to it because the fine-tune integration test actually checks whether the batch job gets created. There's one batch job integration test, but it doesn't check for successful creation, which is why we didn't need this secret before.
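A sketch of what creating that secret in the endpoint and batch-job namespace could look like as a CircleCI step. The secret name model-engine-postgres-credentials, the key database_url, and the POSTGRES_URL environment variable are hypothetical placeholders; the real names must match whatever the batch job manifests reference.

```yaml
- run:
    name: Create Postgres secret for batch jobs (hypothetical names)
    command: |
      # Namespace used for endpoints and batch jobs, per the discussion above
      kubectl create namespace model-engine
      # Hypothetical secret name and key; POSTGRES_URL is assumed to be set in the CI environment
      kubectl create secret generic model-engine-postgres-credentials \
        --namespace model-engine \
        --from-literal=database_url="$POSTGRES_URL"
```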
yunfeng-scale left a comment
thanks a bunch for adding this important integration test!
Pull Request Summary
Add an e2e integration test for fine-tuning. Also changed the base Docker image for loading integration tests onto minikube to python:3.8-slim, since loading originally took 10 minutes; now it only takes 4 minutes! Let me know if this breaks anything else.
Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.