
Codebase indexing doesn't work with local Ollama #5517

@vshvedov

Description

App Version

3.23.3

API Provider

Ollama

Model Used

nomic-embed-text:latest

Roo Code Task Links (Optional)

No response

🔁 Steps to Reproduce

  1. Set up codebase indexing via local Ollama (http://localhost:11434) and dockerized Qdrant (http://localhost:6333) using nomic-embed-text:latest (a minimal command-line equivalent is sketched after this list)
  2. Start indexing
  3. Roo crashes after 3-5 restarts
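
For reference, the same embedding path can be exercised outside Roo from the shell. This is a minimal sketch: the ports, endpoint, and model name come from this report, while the request payload and the Qdrant sanity check are illustrative assumptions.

  # Dockerized Qdrant as in step 1 (default REST port 6333)
  docker run -p 6333:6333 qdrant/qdrant
  curl http://localhost:6333/collections

  # Pull the embedding model and hit the same endpoint the indexer uses
  ollama pull nomic-embed-text:latest
  curl http://localhost:11434/api/embed \
    -d '{"model": "nomic-embed-text:latest", "input": ["function hello() {}", "const x = 1;"]}'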

💥 Outcome Summary

Expected: Codebase indexing to work with local Ollama (macOS, M1 Max, Ollama 0.9.6).

Actual: indexing fails and Roo crashes. The Ollama log below fills with "decode: cannot decode batches with this context (use llama_encode() instead)" messages, and the batched /api/embed requests return 500 after the client closes the connection.

📄 Relevant Logs or Errors (Optional)

~ OLLAMA_FLASH_ATTENTION="1" OLLAMA_KV_CACHE_TYPE="q8_0" /opt/homebrew/opt/ollama/bin/ollama serve
time=2025-07-09T09:55:54.750-07:00 level=INFO source=routes.go:1235 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/vsh/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-07-09T09:55:54.751-07:00 level=INFO source=images.go:476 msg="total blobs: 7"
time=2025-07-09T09:55:54.751-07:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-09T09:55:54.752-07:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.6)"
time=2025-07-09T09:55:54.795-07:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="48.0 GiB" available="48.0 GiB"
[GIN] 2025/07/09 - 09:56:12 | 200 |    2.395167ms |       127.0.0.1 | GET      "/api/tags"
time=2025-07-09T09:56:12.659-07:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/Users/vsh/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 gpu=0 parallel=1 available=51539607552 required="864.9 MiB"
time=2025-07-09T09:56:12.659-07:00 level=INFO source=server.go:135 msg="system memory" total="64.0 GiB" free="32.9 GiB" free_swap="0 B"
time=2025-07-09T09:56:12.659-07:00 level=WARN source=server.go:145 msg="requested context size too large for model" num_ctx=8192 num_parallel=1 n_ctx_train=2048
time=2025-07-09T09:56:12.660-07:00 level=INFO source=server.go:175 msg=offload library=metal layers.requested=-1 layers.model=13 layers.offload=13 layers.split="" memory.available="[48.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="809.4 MiB" memory.required.partial="809.4 MiB" memory.required.kv="6.0 MiB" memory.required.allocations="[809.4 MiB]" memory.weights.total="260.9 MiB" memory.weights.repeating="216.1 MiB" memory.weights.nonrepeating="44.7 MiB" memory.graph.full="12.0 MiB" memory.graph.partial="12.0 MiB"
time=2025-07-09T09:56:12.660-07:00 level=WARN source=server.go:211 msg="flash attention enabled but not supported by model"
time=2025-07-09T09:56:12.660-07:00 level=WARN source=server.go:229 msg="quantized kv cache requested but flash attention disabled" type=q8_0
llama_model_load_from_file_impl: using device Metal (Apple M1 Max) - 49151 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/vsh/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
time=2025-07-09T09:56:12.684-07:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/opt/homebrew/Cellar/ollama/0.9.6/bin/ollama runner --model /Users/vsh/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 --ctx-size 2048 --batch-size 512 --n-gpu-layers 13 --threads 8 --parallel 1 --port 58487"
time=2025-07-09T09:56:12.687-07:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-09T09:56:12.687-07:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
time=2025-07-09T09:56:12.687-07:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-09T09:56:12.700-07:00 level=INFO source=runner.go:815 msg="starting go runner"
time=2025-07-09T09:56:12.700-07:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-07-09T09:56:12.701-07:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:58487"
llama_model_load_from_file_impl: using device Metal (Apple M1 Max) - 49151 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 112 tensors from /Users/vsh/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = nomic-bert
llama_model_loader: - kv   1:                               general.name str              = nomic-embed-text-v1.5
llama_model_loader: - kv   2:                     nomic-bert.block_count u32              = 12
llama_model_loader: - kv   3:                  nomic-bert.context_length u32              = 2048
llama_model_loader: - kv   4:                nomic-bert.embedding_length u32              = 768
llama_model_loader: - kv   5:             nomic-bert.feed_forward_length u32              = 3072
llama_model_loader: - kv   6:            nomic-bert.attention.head_count u32              = 12
llama_model_loader: - kv   7:    nomic-bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                nomic-bert.attention.causal bool             = false
llama_model_loader: - kv  10:                    nomic-bert.pooling_type u32              = 1
llama_model_loader: - kv  11:                  nomic-bert.rope.freq_base f32              = 1000.000000
llama_model_loader: - kv  12:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  20:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  22:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  23:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:   51 tensors
llama_model_loader: - type  f16:   61 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 260.86 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = nomic-bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 768
print_info: n_layer          = 12
print_info: n_head           = 12
print_info: n_head_kv        = 12
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: n_swa_pattern    = 1
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 768
print_info: n_embd_v_gqa     = 768
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 137M
print_info: model params     = 136.73 M
print_info: general.name     = nomic-embed-text-v1.5
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 12 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 13/13 layers to GPU
load_tensors:   CPU_Mapped model buffer size =    44.72 MiB
load_tensors: Metal_Mapped model buffer size =   216.15 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = 0
llama_context: freq_base     = 1000.0
llama_context: freq_scale    = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_load_library: using embedded metal library
ggml_metal_init: GPU name:   Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction   = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has residency sets    = true
ggml_metal_init: has bfloat            = true
ggml_metal_init: use bfloat            = false
ggml_metal_init: hasUnifiedMemory      = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 51539.61 MB
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f16                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96           (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h192          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk192_hv128   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256          (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_hk576_hv512   (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h96       (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h192      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk192_hv128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256      (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_hk576_hv512 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32                      (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16                     (not supported)
llama_context:        CPU  output buffer size =     0.00 MiB
time=2025-07-09T09:56:12.938-07:00 level=INFO source=server.go:637 msg="llama runner started in 0.25 seconds"
decode: cannot decode batches with this context (use llama_encode() instead)   [repeated 39×, trimmed]
[GIN] 2025/07/09 - 09:56:12 | 200 |   339.41875ms |       127.0.0.1 | POST     "/api/embed"
time=2025-07-09T09:56:35.626-07:00 level=INFO source=server.go:907 msg="aborting embedding request due to client closing the connection"   [repeated 77×, trimmed]
[GIN] 2025/07/09 - 09:56:35 | 500 |  967.271667ms |       127.0.0.1 | POST     "/api/embed"
[GIN] 2025/07/09 - 09:56:35 | 500 |  764.654042ms |       127.0.0.1 | POST     "/api/embed"
[GIN] 2025/07/09 - 09:56:37 | 200 |     832.208µs |       127.0.0.1 | GET      "/api/tags"
decode: cannot decode batches with this context (use llama_encode() instead)
[GIN] 2025/07/09 - 09:56:37 | 200 |   42.733583ms |       127.0.0.1 | POST     "/api/embed"
[GIN] 2025/07/09 - 09:57:01 | 200 |    4.899958ms |       127.0.0.1 | GET      "/api/tags"
decode: cannot decode batches with this context (use llama_encode() instead)
[GIN] 2025/07/09 - 09:57:01 | 200 |   24.733375ms |       127.0.0.1 | POST     "/api/embed"
[GIN] 2025/07/09 - 09:57:52 | 200 |     2.16325ms |       127.0.0.1 | GET      "/api/tags"
decode: cannot decode batches with this context (use llama_encode() instead)
[GIN] 2025/07/09 - 09:57:52 | 200 |   35.151709ms |       127.0.0.1 | POST     "/api/embed"
decode: cannot decode batches with this context (use llama_encode() instead)   [repeated 27×, trimmed]
time=2025-07-09T09:58:12.517-07:00 level=INFO source=server.go:907 msg="aborting embedding request due to client closing the connection"   [repeated 33×, trimmed]
[GIN] 2025/07/09 - 09:58:12 | 500 |  328.227667ms |       127.0.0.1 | POST     "/api/embed"
[GIN] 2025/07/09 - 09:58:14 | 200 |    1.244583ms |       127.0.0.1 | GET      "/api/tags"
decode: cannot decode batches with this context (use llama_encode() instead)
[GIN] 2025/07/09 - 09:58:14 | 200 |   35.989667ms |       127.0.0.1 | POST     "/api/embed"
[GIN] 2025/07/09 - 09:58:36 | 200 |    2.048125ms |       127.0.0.1 | GET      "/api/tags"
decode: cannot decode batches with this context (use llama_encode() instead)
[GIN] 2025/07/09 - 09:58:36 | 200 |   27.631417ms |       127.0.0.1 | POST     "/api/embed"
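
Two startup warnings above may be relevant context, though neither is a confirmed root cause: the embed request asks for num_ctx=8192 while the model's GGUF metadata reports n_ctx_train=2048 ("requested context size too large for model"), and the OLLAMA_FLASH_ATTENTION=1 / OLLAMA_KV_CACHE_TYPE=q8_0 settings are rejected for this model ("flash attention enabled but not supported by model"). A quick way to double-check both, assuming the Homebrew install path from the log:

  # Confirm the model's native context window (the log reports n_ctx_train=2048)
  ollama show nomic-embed-text:latest

  # Retry with default server settings, since flash attention and the q8_0
  # KV cache are ignored for this model anyway (see the WARN lines above)
  /opt/homebrew/opt/ollama/bin/ollama serve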

Metadata

Labels

Issue - In Progress (Someone is actively working on this. Should link to a PR soon.), bug (Something isn't working)

Status

Done
