A minimalistic AI-powered search assistant that combines SearXNG (local web search) with Ollama (local LLM) to answer your questions using real web results. It is privacy-focused, with chat history saved locally in the browser. SearXNG can be configured to route through a VPN and to aggregate search results in various ways, and since everything runs locally on your privately hosted instance, there is no tracking of your data and none of it is used for training.
- Node.js ≥ 18
- SearXNG running on `http://localhost:8080`
- If SearXNG cannot be reached, try adding the following to nftables on Arch Linux:
```
chain forward {
    type filter hook forward priority filter; policy drop;
    ip saddr 172.18.0.0/16 accept
    ip daddr 172.18.0.0/16 ct state established,related accept
}
```

If you are using a VPN (e.g. Mullvad VPN or the Mullvad Browser), set the chain priority higher, such as -1:

```
chain forward {
    type filter hook forward priority -1; policy drop;
    ip saddr 172.18.0.0/16 accept
    ip daddr 172.18.0.0/16 ct state established,related accept
}
```

- Sometimes your firewall blocks Docker network access, so also check ufw if applicable.
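To confirm SearXNG is actually reachable before touching firewall rules, a quick curl check helps. The second command assumes the JSON output format has been enabled in SearXNG's `settings.yml`; adjust to your setup:

```shell
# Should print 200 if SearXNG is answering on the expected port
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/

# Query it directly (requires the json format enabled in settings.yml)
curl -s "http://localhost:8080/search?q=test&format=json" | head -c 200
```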
- Ollama running on `http://localhost:11434` with model `mistral:7b` pulled
`cd` into the `searxng` folder, set up the `.env` file based on `.env.example`, and then run `docker compose up -d`.
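In shell terms, the SearXNG setup amounts to something like this:

```shell
cd searxng
cp .env.example .env   # then fill in values for your setup
docker compose up -d
docker compose ps      # confirm the container is running
```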
Download Ollama and pull your favorite model, then run `ollama serve` if Ollama isn't already running as a service on your system; you can check with `sudo systemctl status ollama`.
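For the model used here, the steps above look like this on a systemd-based system:

```shell
ollama pull mistral:7b          # download the model
sudo systemctl status ollama    # check whether the service is already running
ollama serve                    # only needed if it is not
```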
- On a 16GB laptop with a GPU like an RTX 2060, you can comfortably run `mistral:7b`; it is very lightweight and efficient, while anything above 7B will struggle on a 16GB machine. With 96GB RAM and an RTX 4090, you can run `gpt-oss:20b`, `qwen3.5:35b`, or `deepseek-r1:32b`. Consider running quantized versions in resource-constrained environments. `llama3.2:1b` can even run on CPU instead of GPU, so a server without a GPU is also an option.
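For example, a quantized variant can be pulled instead of the default tag. The exact tag name below is an assumption; check the tag list on the Ollama model page for what is currently published:

```shell
ollama pull mistral:7b-instruct-q4_0   # 4-bit quantized variant, smaller memory footprint
```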
```
npm install
npm run dev
```

- You type a question in the input box and hit Search.
- The app queries your local SearXNG instance with search queries formulated by a lightweight AI model, then retrieves and merges the top 8 results per query (max 3 queries).
- The results are bundled into a prompt and sent to Ollama (`mistral:7b`) via streaming.
- The answer streams in token by token, with sources listed below it.
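The pipeline above can be sketched in TypeScript. This is a hypothetical illustration, not the app's actual code: names like `searchWeb` and `buildPrompt` and the prompt wording are assumptions, while the `/searxng` and `/ollama` proxy paths and Ollama's NDJSON streaming format follow the description in this README.

```typescript
// Hypothetical sketch of the search -> prompt -> stream pipeline.
interface SearchResult {
  title: string;
  url: string;
  content: string;
  engine: string; // which search engine produced the result
}

// Fetch results from the local SearXNG instance via the dev proxy.
// (Requires the JSON output format to be enabled in SearXNG.)
async function searchWeb(query: string): Promise<SearchResult[]> {
  const res = await fetch(`/searxng/search?q=${encodeURIComponent(query)}&format=json`);
  const data = await res.json();
  return (data.results ?? []).slice(0, 8); // top 8 results per query
}

// Bundle the question and numbered sources into one prompt.
function buildPrompt(question: string, results: SearchResult[]): string {
  const sources = results
    .map((r, i) => `[${i + 1}] ${r.title} (${r.url})\n${r.content}`)
    .join("\n\n");
  return `Answer using the web results below and cite sources as [n].\n\n${sources}\n\nQuestion: ${question}`;
}

// Stream the answer token by token from Ollama (NDJSON lines with a `response` field).
async function streamAnswer(prompt: string, onToken: (t: string) => void): Promise<void> {
  const res = await fetch("/ollama/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "mistral:7b", prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    const lines = buf.split("\n");
    buf = lines.pop()!; // keep any incomplete trailing line
    for (const line of lines) {
      if (line.trim()) {
        const chunk = JSON.parse(line);
        if (chunk.response) onToken(chunk.response);
      }
    }
  }
}
```

A caller would run `searchWeb` for each generated query, merge the results, build the prompt, and then stream with `streamAnswer(prompt, token => ...)`, appending each token to the UI.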
```
browser → Vite dev-server proxy → SearXNG :8080
                                → Ollama :11434
```

Vite's proxy avoids CORS issues: all requests go through `/searxng/*` and `/ollama/*`.
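A Vite proxy for these two prefixes might look like the sketch below. This is a guess at the shape, not the repo's actual `vite.config.ts`; the targets match the ports listed above.

```typescript
// Hypothetical vite.config.ts sketch; the real config may differ.
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    proxy: {
      "/searxng": {
        target: "http://localhost:8080",
        changeOrigin: true,
        rewrite: (path) => path.replace(/^\/searxng/, ""),
      },
      "/ollama": {
        target: "http://localhost:11434",
        changeOrigin: true,
        rewrite: (path) => path.replace(/^\/ollama/, ""),
      },
    },
  },
});
```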
```
npm run build     # outputs to dist/
npm run preview
```

For production you'll need a reverse proxy (nginx / Caddy) to forward the `/searxng` and `/ollama` paths.
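One possible nginx shape, as a sketch under assumptions (ports taken from the dev setup above, `dist/` served as static files); adapt paths to your deployment:

```nginx
server {
    listen 80;
    root /var/www/app/dist;          # wherever you deploy dist/

    # forward the same prefixes the Vite dev proxy used
    location /searxng/ {
        proxy_pass http://127.0.0.1:8080/;
    }
    location /ollama/ {
        proxy_pass http://127.0.0.1:11434/;
        proxy_buffering off;         # keep token-by-token streaming responsive
    }
    location / {
        try_files $uri /index.html;  # SPA fallback
    }
}
```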
Main search input and chat interface while a reply streams in, showing the model name and a timer for answer generation.
The answer includes numbered web citations under the streamed response, each linking to its source and noting which search engine it came from.


