guide : using the new WebUI of llama.cpp #16938
Replies: 48 comments 154 replies
-
Does anyone have a neat example to share for constrained output using the custom JSON option of the WebUI? Something that would be suitable for demonstration purposes.
-
I tried this one. Inside Developer / Custom JSON, with the prompt "you feel good ?", the model answered (on the SvelteUI):
-
Not sure if this is a neat example, but something easy you can do with vision LLMs is to extract data from images in a structured way. Add this to Developer / Custom JSON:
{
"json_schema": {
"$defs": {
"Address": {
"properties": {
"street": {
"title": "Street",
"type": "string"
},
"city": {
"title": "City",
"type": "string"
},
"state": {
"title": "State",
"type": "string"
},
"zip_code": {
"title": "Zip Code",
"type": "string"
}
},
"required": [
"street",
"city",
"state",
"zip_code"
],
"title": "Address",
"type": "object"
},
"BillTo": {
"properties": {
"company_name": {
"title": "Company Name",
"type": "string"
},
"address": {
"$ref": "#/$defs/Address"
},
"attention": {
"title": "Attention",
"type": "string"
}
},
"required": [
"company_name",
"address",
"attention"
],
"title": "BillTo",
"type": "object"
},
"Company": {
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"address": {
"$ref": "#/$defs/Address"
},
"phone": {
"title": "Phone",
"type": "string"
},
"email": {
"title": "Email",
"type": "string"
}
},
"required": [
"name",
"address",
"phone",
"email"
],
"title": "Company",
"type": "object"
},
"InvoiceLine": {
"properties": {
"description": {
"title": "Description",
"type": "string"
},
"quantity": {
"title": "Quantity",
"type": "integer"
},
"rate": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
],
"title": "Rate"
},
"amount": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
],
"title": "Amount"
}
},
"required": [
"description",
"quantity",
"rate",
"amount"
],
"title": "InvoiceLine",
"type": "object"
},
"PaymentMethods": {
"properties": {
"bank_account": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Bank Account"
},
"routing_number": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Routing Number"
},
"check_payable_to": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Check Payable To"
}
},
"title": "PaymentMethods",
"type": "object"
}
},
"properties": {
"invoice_number": {
"title": "Invoice Number",
"type": "string"
},
"invoice_date": {
"format": "date",
"title": "Invoice Date",
"type": "string"
},
"due_date": {
"format": "date",
"title": "Due Date",
"type": "string"
},
"company": {
"$ref": "#/$defs/Company"
},
"bill_to": {
"$ref": "#/$defs/BillTo"
},
"lines": {
"items": {
"$ref": "#/$defs/InvoiceLine"
},
"title": "Lines",
"type": "array"
},
"subtotal": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
],
"title": "Subtotal"
},
"tax_rate": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
],
"title": "Tax Rate"
},
"tax_amount": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
],
"title": "Tax Amount"
},
"total": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
],
"title": "Total"
},
"payment_terms": {
"title": "Payment Terms",
"type": "string"
},
"payment_methods": {
"$ref": "#/$defs/PaymentMethods"
},
"notes": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Notes"
}
},
"required": [
"invoice_number",
"invoice_date",
"due_date",
"company",
"bill_to",
"lines",
"subtotal",
"tax_rate",
"tax_amount",
"total",
"payment_terms",
"payment_methods"
],
"title": "Invoice",
"type": "object"
}
}
Then, with a model that supports vision (Qwen3-VL-8B should work), paste this image, and it will output the invoice data without requiring any instructions:
{
"invoice_number": "INV-2024-0847",
"invoice_date": "2025-07-29",
"due_date": "2025-08-28",
"company": {
"name": "Acme Corporation",
"address": {
"street": "123 Business Street",
"city": "New York",
"state": "NY",
"zip_code": "10001"
},
"phone": "(555) 123-4567",
"email": "[email protected]"
},
"bill_to": {
"company_name": "Tech Solutions Inc.",
"address": {
"street": "456 Innovation Drive",
"city": "San Francisco",
"state": "CA",
"zip_code": "94105"
},
"attention": "John Smith"
},
"lines": [
{
"description": "Web Development Services",
"quantity": 40,
"rate": 150.00,
"amount": 6000.00
},
{
"description": "UI/UX Design",
"quantity": 20,
"rate": 125.00,
"amount": 2500.00
},
{
"description": "Database Setup",
"quantity": 8,
"rate": 100.00,
"amount": 800.00
},
{
"description": "Monthly Hosting",
"quantity": 1,
"rate": 250.00,
"amount": 250.00
}
],
"subtotal": 9550.00,
"tax_rate": 8.5,
"tax_amount": 811.75,
"total": 10361.75,
"payment_terms": "Net 30 days. 1.5% late fee per month on overdue balances.",
"payment_methods": {
"bank_account": "Account #123456789, Routing #987654321",
"check_payable_to": "Acme Corporation"
},
"notes": "Thank you for your business!"
}
One problem with this is that the output is not wrapped in json-fenced markdown blocks, so you get no syntax highlighting. This could be improved if the WebUI had native support for passing a JSON schema and, when enabled, displayed the output in a specialized JSON viewer, such as this one.
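As an aside, the same schema can also be exercised outside the WebUI through the server's OpenAI-compatible API. The sketch below builds such a request in Python; the `/v1/chat/completions` path and the `response_format`/`json_schema` field layout are assumptions to verify against your build of llama-server, and the cut-down schema is hypothetical:

```python
import json
import urllib.request

# Hypothetical minimal schema (a cut-down version of the invoice example above).
SCHEMA = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}


def build_payload(prompt: str, schema: dict) -> dict:
    """Build an OpenAI-style chat request that constrains output to `schema`."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema.get("title", "output"), "schema": schema},
        },
    }


def post(payload: dict, url: str = "http://127.0.0.1:8033/v1/chat/completions") -> dict:
    """POST the payload to a running llama-server (requires a live server)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the request body; call post() only against a running server.
    print(json.dumps(build_payload("Extract the invoice data.", SCHEMA), indent=2))
```

The constrained decoding itself happens server-side, so any OpenAI-compatible client should work the same way.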
-
I love the look of this. Could you add a "Continue Assistant Response" kind of button? It would help to steer the AI toward a specific formatting at the beginning of a conversation if you could edit its response and then have it continue the output.
-
How do I enable parallel conversations? Do I need to use a specific parameter when launching the server?
-
Congratulations guys this looks absolutely amazing!! :D Can't wait to use it
-
🚀🚀🚀
-
Excellent work! It strikes the right balance between functionality, a simple user experience, and performance. Admittedly, this is outside the scope of the project, but I would appreciate the option of deploying this interface in standalone mode, separate from llama.cpp, with third-party OpenAI API support.
-
Implement more agents for the GUI, like mini-swe-agent, and/or make a GUI for trae: https://github.com/bytedance/trae-agent
-
Is there an option to add a search URL or something to search the web?
-
Kudos guys this rocks!
-
I created a step-by-step installation and testing video for this llama.cpp WebUI: https://youtu.be/1H1gx2A9cww?si=bJwf8-QcVSCutelf Thanks.
-
Error: "the request exceeds the available context size, try increasing it". So I can only use a chat as long as it fits in the context size? Context window shifting would be really nice, so that e.g. with a 16k context one can write on and on and the AI always knows the newest context (the earliest messages beyond context_size - max_output (e.g. 2048) are dropped from the KV cache). I tried using
Btw, I like that llama.cpp now has its own UI for chats (switching from koboldcpp). I really like llama.cpp for its VRAM efficiency (using CUDA on a consumer NVIDIA card).
-
Where can I get the whole list of commands?
-
Off the scale - thank you for all you do!
-
|
Still think we need more 'defaults' automagically determined from the
model itself, eg .gguf model type.
Of course, llama-server shares cli options with other tools, and it's
already getting too large of an ever changing CLI options..
Rather than thinking of this as an .ini, rather we should consider this
a .loc override, for what the community agrees the recommended values
are, or the model maintainers, via the information in the .gguf headers.
Of course, certain systems may be 'constrained', so need lower values,
or for experimental purposes, or because of LoRa layers at startup etc..
Suggest that we consider a more 'standard' location for config's that
override .gguf header information.
…On 2025-12-10 08:35, Pascal wrote:

./llama-server --port 8082 -ngl 999 -ctk q8_0 -ctv q8_0 -fa on --mlock -np 4 -kvu --jinja --models-max 1 --models-preset config.ini

Easy .ini format:

[MoE-Qwen3-Coder-30B-A3B-Instruct]
m = /path/to/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-Q6_K.gguf
temp = 0.7
top-p = 0.8
top-k = 20
min-p = 0
ctx-size = 131072

[MyModel]
m = /my/other.gguf
...

The command-line arguments are inherited, and you're overriding them with custom configuration!
-
Could there be an option to adjust the width of the main chat DIV on desktop? On a large monitor there's a lot of wasted space to the left and right of the chat DIV. Thanks!
-
I just packaged llama.cpp with this repo: https://github.com/oliverbob/ginto.ai for agentic UI workflows. Built everything ready for
It is not perfect yet, but the community might find it useful. I wish I had more time to create more features.
Tool-calling works best, especially for GPT-OSS / Qwen / Deepseek variants. It works very well with the Groq and Cerebras APIs.
I'll be focusing on refining the MCP client/server, todo listing, and model-planning enhancements very soon.
-
Any i18n support for the WebUI in the future?
-
How can we use a TTS GGUF model with the WebUI?
-
Is it possible to allow the model to see the real filename of an uploaded file? For example, in image identification, we could refer to a specific image by its filename.
-
Wondering if there are any near-term plans to add some native tools in the WebUI, e.g. web_search? It would be nice to have my LLM access URLs in the local UI interface :)
-
Hi. Is there any basic authentication support in the WebUI, e.g. user name and password?
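Not full username/password authentication, but llama-server can require a bearer token via its --api-key flag, which API clients (and the WebUI, in its settings) must then supply. A sketch, assuming the flag behaves as in recent builds; the key shown is a placeholder:

```shell
# Require a bearer token for all API requests (placeholder secret shown).
# Clients that omit the key receive an authorization error.
llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 0 \
  --host 0.0.0.0 --port 8033 \
  --api-key "change-me-long-random-string"
```

For real multi-user login you would still want a reverse proxy (e.g. basic auth at the proxy layer) in front of the server.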
-
What are the best tools for integrating conversational memory into the WebUI? Basically, I want something that will be able to remember past conversations, user preferences, etc. Are there any memory systems that have good integrations with the WebUI?
-
Hi, I added a prefix on my proxy server, but the WebUI still accesses the path without the prefix. I have tried
-
Hello, I am trying to use the , though I do not see my system prompt in the general settings of the llama.cpp WebUI. Am I configuring this wrongly?
-
Hello, will you support multiple system-prompt presets and folder-based session management?
-
Is it possible to pass the current date via the system prompt in the web UI? I have seen something like "{CURRENT_DATE}", but it does not work. Using an MCP tool just to get the date seems a bit excessive...
-
What specific version(s) is this tested on? I assume you are setting this in the llama-server web GUI (Settings -> System Message)? Are there any other caveats, e.g. specific LLMs or startup options? Trying in 'router' mode, the LLMs that respect the prompt explicitly report: "Today's date is {{current_date}}." ;)
…On 2026-04-12 02:23, Oliver Bob Lagumen wrote:

The llama.cpp server web UI does support date/time template variables, but the syntax is different from what you tried. The correct variables are:

* current_date: replaced with YYYY-MM-DD (UTC)
* current_time: replaced with HH:MM:SS (UTC)
* current_timestamp: replaced with a full UTC timestamp like 2025-05-08 22:19:33

So in your system prompt, use something like:

Today's date is {{current_date}}. You are a helpful assistant...

or just the bare variable without braces; the exact delimiter syntax can vary by build, so try both current_date and {{current_date}} if one doesn't work. The {CURRENT_DATE} you saw is likely the Open WebUI convention (which uses {{CURRENT_DATE}}), not llama.cpp's native web UI.

One caveat: these use UTC, so if your local timezone matters, you may want to hardcode the offset or just manually note your timezone in the system prompt. (Source: https://simonw.substack.com/p/trying-out-llamacpps-new-vision-support)










Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Overview
This guide highlights the key features of the new SvelteKit-based WebUI of llama.cpp. The new WebUI, in combination with the advanced backend capabilities of llama-server, delivers the ultimate local AI chat experience. A few characteristics that set this project ahead of the alternatives:
Getting started
Get llama.cpp: Install | Download | Build
Start the llama-server tool:

# sample server running gpt-oss-20b at http://127.0.0.1:8033
llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 0 --host 127.0.0.1 --port 8033

Open http://127.0.0.1:8033 and start using the WebUI in your browser:
Tip
For a simple, GUI-based setup of llama.cpp on Mac, try the new LlamaBarn application.
Features
The new WebUI is packed with many useful features to enhance your local AI experience. Following are a few examples.
Text document processing
Add multiple text files from disk or from the clipboard to the context of your conversation:
PDF document processing
Attach one or multiple PDFs to your conversation. By default, the contents of the PDFs will be converted to RAW text, excluding any visuals.
Optionally, the WebUI can process the PDFs as images when the AI model supports it.
Image inputs
When the selected AI model has vision input capabilities, the WebUI allows you to insert images into your conversation:
Images can be inserted in addition to a textual context:
Conversation branching
Branch from previous points of the conversation by editing or regenerating a message:
webui-edits-0-thumb-small.mp4
Parallel conversations
Run multiple chat conversations at the same time:
webui-parallel-0-thumb-small.mp4
Parallel image processing is also supported:
webui-parallel-1-thumb-small.mp4
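The parallel conversations shown above are handled server-side. A sketch of a launch command, assuming the -np (--parallel) flag of llama-server sets the number of simultaneous sequences, with each slot receiving a share of the total context:

```shell
# serve up to 4 simultaneous conversations from one model instance
llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 0 --port 8033 -np 4
```

With this, several browser tabs (or users) can chat at the same time without queuing behind each other.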
Override default sampling parameters
Start the
llama-serverusing a set of default sampling parameters:# set the default Top-K to be 5 and the default Temperature to be 0.80 llama-server -hf ggml-org/gpt-oss-120b-GGUF --jinja -c 0 --port 8033 --alias gpt-oss-120b --top-k 5 --temp 0.80These parameters will now become the default values in the WebUI settings:
webui-parameters-0-thumb-small.mp4
More info: #16515
Render math expressions
The WebUI can render mathematical expressions:
Input via URL parameters
The WebUI supports passing input through the URL parameters:
webui-url-input-0-thumb-small.mp4
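The exact query-parameter name is not spelled out above, so treat the following as a sketch: assuming the WebUI reads an initial prompt from a parameter such as q (an assumption to verify against your build), a prefilled link can be built like this:

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8033/"


def prefill_url(prompt: str, base: str = BASE) -> str:
    """Build a WebUI link passing `prompt` as a URL parameter (param name assumed)."""
    return base + "?" + urlencode({"q": prompt})


print(prefill_url("Explain KV cache quantization"))
```

Such links are handy for bookmarklets or for launching the WebUI from other tools with a prompt already filled in.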
HTML/JS preview
The WebUI supports inline rendering of generated HTML/JS code:
webui-js-0-thumb-small.mp4
More info: #16757
Constrained generation
Specify a custom JSON schema to constrain the generated output to a specific format. As an example, here is generic invoice data extraction from multiple documents:
webui-constrained-0-thumb-small.mp4
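For a smaller demonstration than the invoice extraction shown above, a minimal Custom JSON entry might look like the sketch below. The json_schema top-level key follows the invoice example in the comments of this thread; the schema itself is a made-up sentiment-classification example:

```json
{
  "json_schema": {
    "title": "Sentiment",
    "type": "object",
    "properties": {
      "sentiment": { "enum": ["positive", "neutral", "negative"] },
      "confidence": { "type": "number" }
    },
    "required": ["sentiment", "confidence"]
  }
}
```

With this in place, any reply the model produces is forced into that object shape, regardless of the prompt.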
Import/Export
Use the Import/Export options to manage your private conversations directly through the WebUI:
Efficient SSM context management
The context management and prefix caching of State Space Models (SSMs, e.g. Mamba) can be tricky.
llama-server solves this problem efficiently for one or multiple users with minimal reprocessing. Here is an example of context branching using a hybrid LLM:
webui-ssm-0-thumb-small.mp4
Mobile compatibility
The new WebUI is mobile friendly:
Sample commands
A few llama-server commands used for the examples above:
Acknowledgements