docker-compose config not always considered #166

@updiversity

Description

When running an embedding model via Docker Model Runner, the runtime configuration from docker-compose.yaml (--ubatch-size, context_size, etc.) is applied inconsistently depending on how the request is sent.

Using Docker Desktop GUI:
The model starts with the custom configuration (e.g., ubatch-size=2048).

Using curl from inside a container:
The model falls back to default values (ubatch-size=512, n_ctx=4096).

This makes it impossible to reliably control the physical batch size (n_ubatch) from the Compose file.

Steps to Reproduce

  1. docker-compose.yaml
models:
  embedding:
    model: ai/embeddinggemma:300M-Q8_0
    context_size: 2048
    runtime_flags:
      - "--ubatch-size"
      - "2048"

services:
  curl-tester:
    image: curlimages/curl:8.11.1
    command: ["sh", "-lc", "sleep 1000000"]
    models:
      embedding:
        endpoint_var: EMBEDDING_ENDPOINT
        model_var: EMBEDDING_MODEL

  2. Trigger a request via Docker Desktop GUI
    Observe logs:
[2025-09-23T17:19:35.420131000Z] llama_context: constructing llama_context
[2025-09-23T17:19:35.420164000Z] llama_context: n_seq_max     = 1
[2025-09-23T17:19:35.420178000Z] llama_context: n_ctx         = 2048
[2025-09-23T17:19:35.420190000Z] llama_context: n_ctx_per_seq = 2048
[2025-09-23T17:19:35.420200000Z] llama_context: n_batch       = 2048
[2025-09-23T17:19:35.420210000Z] llama_context: n_ubatch      = 2048
  3. Trigger a request via curl inside the container
docker exec -it tailscale-curl-tester-1 sh -lc '
echo "EMBED:" $EMBEDDING_ENDPOINT $EMBEDDING_MODEL

curl -sS "$EMBEDDING_ENDPOINT/embeddings" \
  -H "Authorization: Bearer dummy" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$EMBEDDING_MODEL\",\"input\":\"hello world\"}" \
  | head -c 300; echo
'

Observe logs:

[2025-09-23T17:21:57.905751000Z] llama_context: constructing llama_context
[2025-09-23T17:21:57.905780000Z] llama_context: n_ctx         = 4096
[2025-09-23T17:21:57.905798000Z] llama_context: n_ctx_per_seq = 4096
[2025-09-23T17:21:57.905807000Z] llama_context: n_batch       = 2048
[2025-09-23T17:21:57.905815000Z] llama_context: n_ubatch      = 512
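
To make the comparison between the two runs less error-prone than reading the logs by eye, something like the following could pull the values out of the llama_context lines. This is only a sketch: ctx_param is a hypothetical helper name, it assumes the log line format shown above, and the sample lines are copied from the curl run rather than read from a live log.

```shell
# Hypothetical helper: print the last value logged for one llama_context
# parameter. Reads log text on stdin; assumes lines shaped like
#   [timestamp] llama_context: n_ubatch      = 512
ctx_param() {
  awk -v name="$1" '
    /llama_context:/ {
      for (i = 1; i <= NF; i++) {
        # Exact field match, so n_ctx does not also match n_ctx_per_seq.
        if ($i == name && $(i + 1) == "=") value = $(i + 2)
      }
    }
    END { if (value != "") print value }
  '
}

# Sample lines copied from the curl run above.
logs='[2025-09-23T17:21:57.905780000Z] llama_context: n_ctx         = 4096
[2025-09-23T17:21:57.905815000Z] llama_context: n_ubatch      = 512'

# Value configured through runtime_flags in docker-compose.yaml.
expected_ubatch=2048
actual_ubatch=$(printf '%s\n' "$logs" | ctx_param n_ubatch)

if [ "$actual_ubatch" = "$expected_ubatch" ]; then
  echo "n_ubatch OK ($actual_ubatch)"
else
  echo "n_ubatch mismatch: expected $expected_ubatch, got $actual_ubatch"
fi
```

With the sample lines from the curl run, this reports a mismatch (expected 2048, got 512). The same helper can be given n_ctx to check the context size.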
