Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
In WebUI/Conversations/Export, some useful data is missing, such as context size and usage. For each prompt sent, it would be nice to have two timestamps for each visible event, for example:
- For the prompt: one timestamp showing when it was sent and another showing when everything arrived/finished.
- Inside the answer: one timestamp when thinking started and one when thinking finished, plus one when the answer started and one when it finished.
- Model parameters used in the conversation.
The main idea is to be able to analyze the whole conversation and see how long each piece of it took.
In one concrete case, I'm testing the same prompt with several models and want to compare them on: total time spent, timestamp when the prompt was sent, timestamp/total time to start responding, timestamp/total time thinking, and timestamp/total time answering. It would also be nice to have the actual parameters used (see below for the parameters sent when selecting the model; a hypothetical sketch of these extra fields follows the parameter list).
Parameters sent when selecting the model:
"reasoning_format": "auto",
"temperature": 0.3,
"max_tokens": -1,
"dynatemp_range": 0,
"dynatemp_exponent": 1,
"top_k": 30,
"top_p": 0.95,
"min_p": 0.05,
"xtc_probability": 0,
"xtc_threshold": 0.1,
"typ_p": 1,
"repeat_last_n": 64,
"repeat_penalty": 1.1,
"presence_penalty": 0,
"frequency_penalty": 0,
"dry_multiplier": 0,
"dry_base": 1.75,
"dry_allowed_length": 2,
"dry_penalty_last_n": 8192,
This is an actual exported conversation that doesn't contain enough information to allow the analysis described above: it lacks per-event timestamps (thinking and content start/finish), context size/usage info, and the model parameters:
```json
{
"conv": {
"id": "d11be1cc-3689-49dc-9373-1fb89668c873",
"name": "who are you",
"lastModified": 1765704878064,
"currNode": "d75f1f84-acb4-4fe7-ae50-29cc9003a476"
},
"messages": [
{
"convId": "d11be1cc-3689-49dc-9373-1fb89668c873",
"role": "user",
"content": "who are you",
"type": "text",
"timestamp": 1765704878337,
"thinking": "",
"toolCalls": "",
"children": [
"e7e48e3e-b135-4795-b462-39105fb52197"
],
"extra": [],
"id": "303c5353-dcff-41a1-a995-3617126a313c",
"parent": "f96bb39c-b5e7-4c87-bb0f-43cbf736c3bd"
},
{
"convId": "d11be1cc-3689-49dc-9373-1fb89668c873",
"type": "text",
"timestamp": 1765704919210,
"role": "assistant",
"content": "Hello! I'm Qwen, a large language model developed by Alibaba Cloud. My primary functions include answering questions, creating text, and engaging in conversations. I can assist with a wide range of tasks such as writing stories, providing explanations, coding, and more. I support multiple languages, including Chinese, English, and many others. How can I assist you today? 😊",
"thinking": "Okay, the user asked \"who are you.\" I need to explain my identity clearly. First, I should mention that I'm Qwen, a large language model developed by Alibaba Cloud. Then, highlight my capabilities like answering questions, creating text, and having conversations. Also, note that I can support multiple languages, including Chinese, English, etc. Make sure the response is friendly and open-ended to invite further interaction.",
"toolCalls": "",
"children": [],
"model": "Qwen3-14B-Q4_K_M-BIGC",
"id": "e7e48e3e-b135-4795-b462-39105fb52197",
"parent": "303c5353-dcff-41a1-a995-3617126a313c",
"timings": {
"cache_n": 0,
"prompt_n": 11,
"prompt_ms": 9.131,
"prompt_per_token_ms": 0.8300909090909091,
"prompt_per_second": 1204.6873288796407,
"predicted_n": 166,
"predicted_ms": 14409.374,
"predicted_per_token_ms": 86.8034578313253,
"predicted_per_second": 11.520278396549358
}
},
{
"convId": "d11be1cc-3689-49dc-9373-1fb89668c873",
"role": "user",
"content": "I'm using this conversation as test to show on a feature request to llama.cpp",
"type": "text",
"timestamp": 1765705012240,
"thinking": "",
"toolCalls": "",
"children": [],
"extra": [],
"id": "2cb2595b-f394-4780-bdbc-b189e6a38a6d",
"parent": "e7e48e3e-b135-4795-b462-39105fb52197"
},
{
"convId": "d11be1cc-3689-49dc-9373-1fb89668c873",
"type": "text",
"role": "assistant",
"content": "Thank you for clarifying! If you're testing this conversation as part of a feature request for **llama.cpp**, I’d be happy to assist. Could you share more details about the specific feature you’re proposing? For example:\n\n- Are you testing how Qwen interacts with llama.cpp (e.g., quantization, GGUF format, or inference speed)?\n- Is there a specific use case or limitation you're trying to address?\n- Do you need help formatting this conversation as part of your request?\n\nLet me know, and I’ll do my best to support your testing or feature proposal! 😊",
"timestamp": 1765705012276,
"thinking": "Okay, the user mentioned they're using this conversation as a test for a feature request related to llama.cpp. First, I need to understand what llama.cpp is. It's an open-source project that runs large language models on CPUs with optimizations like GGUF format and quantization. The user might be testing how Qwen interacts with their system or wants to highlight certain features.\n\nThe user's previous message was \"who are you,\" which I answered by introducing myself as Qwen. Now, they're indicating this conversation is for a feature request in llama.cpp. Maybe they want to show how Qwen handles specific inputs or outputs that could be useful for llama.cpp's development. They might need help with integrating Qwen into their setup or have issues with performance, compatibility, or features.\n\nI should check if there are common use cases where users test models with llama.cpp. Perhaps they're evaluating different models' responses, testing quantization effects, or comparing inference speeds. The user could be looking for support in using Qwen with llama.cpp, but since I'm not part of llama.cpp, I need to clarify that.\n\nI should ask them to elaborate on their feature request. Are they facing specific issues? Do they want to suggest a new feature like better quantization support or improved multilingual handling? Also, confirming if they need assistance in using Qwen with their setup or if there's something else they need help with would be helpful.",
"toolCalls": "",
"children": [],
"model": "Qwen3-14B-Q4_K_M-BIGC",
"id": "d75f1f84-acb4-4fe7-ae50-29cc9003a476",
"parent": "2cb2595b-f394-4780-bdbc-b189e6a38a6d",
"timings": {
"cache_n": 0,
"prompt_n": 113,
"prompt_ms": 120.711,
"prompt_per_token_ms": 1.0682389380530974,
"prompt_per_second": 936.1201547497742,
"predicted_n": 415,
"predicted_ms": 36900.193,
"predicted_per_token_ms": 88.91612771084337,
"predicted_per_second": 11.24655364268691
}
}
]
}
```
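The existing `timings` block does allow deriving the prompt-processing and generation durations (`prompt_ms`, `predicted_ms`), but not wall-clock event times or the thinking/answering split. As a sketch of the analysis this request would enable, assuming the hypothetical timestamp fields from the earlier sketch were present:

```ts
// Sketch only: computes the per-turn metrics described above from the
// hypothetical timestamp fields. All durations are in milliseconds.
function analyzeTurn(ts: {
  promptSentAt: number;
  firstTokenAt: number;
  thinkingStartedAt?: number;
  thinkingFinishedAt?: number;
  contentStartedAt: number;
  contentFinishedAt: number;
}) {
  return {
    totalTime: ts.contentFinishedAt - ts.promptSentAt,
    timeToFirstToken: ts.firstTokenAt - ts.promptSentAt,
    thinkingTime:
      ts.thinkingStartedAt !== undefined && ts.thinkingFinishedAt !== undefined
        ? ts.thinkingFinishedAt - ts.thinkingStartedAt
        : 0,
    answeringTime: ts.contentFinishedAt - ts.contentStartedAt,
  };
}
```

Comparing several models on the same prompt would then just be a matter of tabulating these results side by side.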
Also, when more than one model has answered the same prompt, the alternative answers are not exported; only the visible one is. Ideally, all of them would be exported at once (see the sketch below).
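Since each exported message already carries `children` and `parent` IDs, the export could walk the whole message tree instead of only the path to `currNode`. A minimal sketch, assuming a flat message map keyed by ID:

```ts
// Sketch only: collect every message reachable from a root, covering
// all sibling branches (alternative model answers), not just the
// currently visible one.
interface TreeMessage {
  id: string;
  children: string[];
}

function collectAllMessages(
  byId: Map<string, TreeMessage>,
  rootId: string
): TreeMessage[] {
  const out: TreeMessage[] = [];
  const stack = [rootId];
  while (stack.length > 0) {
    const msg = byId.get(stack.pop()!);
    if (!msg) continue;
    out.push(msg);
    stack.push(...msg.children); // descend into every branch
  }
  return out;
}
```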
Motivation
Allow a better understanding of model performance on a concrete task.
Possible Implementation
No response