This document outlines my current AI stack and the tools I use regularly. Given how rapidly the AI landscape evolves, this stack changes often. I'm always tinkering and experimenting, but these are the components I've found particularly valuable for enhancing my productivity. You can find out more about my AI projects and thoughts on my homepage.
My AI Stack
These are the foundational elements of my AI stack. They represent the key technologies and infrastructure that support most of my AI-related activities.
I heavily rely on LLMs via API, preferring them over self-hosted options for ease of use and resource management. Using APIs allows me to avoid hardware stress and simplifies deployment.
- I use OpenRouter to consolidate billing and access a wide variety of APIs, enabling me to select models best suited for specific tasks.
- Google's Gemini 2.0 Flash is my primary go-to model due to its fast inference, large context window, and reasonable pricing. While not the best for complex reasoning, its versatility makes it suitable as the backing model for all my Assistant configurations.
- For code generation that isn't agentic, or for debugging purposes, I often turn to Qwen's models. I find Qwen's coder models particularly underrated.
- Cohere is useful for instructional text-based tasks.
- While I sometimes use Anthropic's Claude, especially Sonnet 3.7 for agentic code generation, I've found its recent performance to be inconsistent.
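OpenRouter exposes an OpenAI-compatible chat completions endpoint, which is what makes swapping models per task a one-string change. A minimal stdlib-only sketch of how I call it (the helper names and the exact model slug are illustrative assumptions; check OpenRouter's model list for current slugs):

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, system_prompt: str, user_prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload for OpenRouter."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

def ask(model: str, system_prompt: str, user_prompt: str) -> str:
    """Send one chat turn to OpenRouter and return the assistant's reply."""
    payload = build_request(model, system_prompt, user_prompt)
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (model slug is an example, not a guarantee it still exists):
#   ask("google/gemini-2.0-flash-001", "You are concise.", "Say hello.")
```

Because the payload shape is the standard OpenAI one, the same `ask` helper works for Qwen, Cohere, or Claude models routed through OpenRouter by changing only the model slug.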
I've tested numerous AI tool frontends, and Open Web UI stands out as the most impressive; I share some of my configurations with its community. For long-term use, I recommend starting with a PostgreSQL database rather than SQLite.
While the container defaults to Chroma DB, you can configure it to use Milvus, Qdrant, or other options. I initially self-hosted purely for experimentation, but once the setup became robust enough to replace commercial tools, I re-architected it for long-term stability and began choosing components more carefully.
Transitioning to speech-to-text has been transformative! After unsatisfactory experiences with dictation software a decade ago, I've found that Whisper finally makes it reliable enough for everyday use.
I use a Whisper-based Chrome extension for speech-to-text, often for many hours per day.
For Android, the open source Futo Keyboard project shows promise, but its performance depends on local hardware.
While I recognize the use case, I prefer not to run speech-to-text or most AI models locally. On my Linux desktop, I've used generative AI tools to build simple notepads of my own that send audio to Whisper via the API.
I am developing a personal managed context data store for creating personalized AI experiences. This is a long-term project and my approach is likely to change over time. I'm using a multi-agent workflow to proactively generate contextual data. You can see some of my related projects here:
- Agentic-Context-Development-Interview-Demo
- Personal-RAG-Agent-Workflow
- My-LLM-Context-Repo-Public
- Personal-Context-Repo-Idea
The project involves creating markdown files based on interviews detailing aspects of my life. I've also used the inverse approach of putting non-contextual data through an LLM pipeline to isolate context data. These workflows can be implemented with complex agent systems like Crew AI or by creating assistants using system prompts.
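The markdown-generation step of this workflow can be sketched in a few lines. This is a simplified illustration, not the actual project code; the function names and file layout are assumptions:

```python
from pathlib import Path

def qa_to_markdown(topic: str, qa_pairs: list[tuple[str, str]]) -> str:
    """Render one interview session as a markdown context file."""
    lines = [f"# Context: {topic}", ""]
    for question, answer in qa_pairs:
        lines.append(f"## {question}")
        lines.append(answer)
        lines.append("")
    return "\n".join(lines)

def save_context(out_dir: str, topic: str, qa_pairs: list[tuple[str, str]]) -> Path:
    """Write the rendered session into a context repo directory."""
    path = Path(out_dir) / f"{topic.lower().replace(' ', '-')}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(qa_to_markdown(topic, qa_pairs))
    return path
```

The interview agent supplies the question/answer pairs; keeping the output as plain markdown files is what makes the context repo portable across RAG pipelines and frontends.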
For vector storage, I avoid OpenAI assistants to prevent vendor lock-in and instead use Qdrant to decouple my personal context data from other parts of the project.
Storing AI outputs more robustly doesn't require specialized solutions; regular databases suffice.
MongoDB and PostgreSQL are my preferred databases. PostgreSQL is especially beneficial, as it can easily be extended with PGVector.
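The PGVector extension adds a `vector` column type and distance operators directly to PostgreSQL, so relational data and embeddings live in one database. A sketch of the relevant SQL, held as Python constants (table name and embedding dimension are assumptions for illustration):

```python
# Enable the extension once per database (requires pgvector installed on the server).
CREATE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

# An ordinary relational table with an embedding column alongside regular fields.
# The dimension (768 here) must match whatever embedding model you use.
CREATE_TABLE = """
CREATE TABLE notes (
    id SERIAL PRIMARY KEY,
    body TEXT NOT NULL,
    embedding vector(768)
);
"""

# Nearest-neighbour search: `<->` is pgvector's L2 distance operator
# (`<=>` gives cosine distance, `<#>` negative inner product).
SEARCH = """
SELECT body
FROM notes
ORDER BY embedding <-> %(query_embedding)s
LIMIT 5;
"""
```

Running these through any PostgreSQL driver (e.g. psycopg) turns the same database that stores application data into the semantic-search layer, which is exactly why it avoids needing a specialized solution.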
These tools enhance and extend the capabilities of the core components, enabling more complex workflows and creative applications.
I've explored the field of AI agents and assistants, noting that many interesting projects lack well-developed frontends. You can explore some of my AI assistants in my AI Assistants Library.
I'm a strong advocate for simple system prompt-based agents and have open-sourced over 600 system prompts since discovering AI in 2024. I currently use these in Open Web UI, sharing my library with the Open Web UI community. I've also tested Dify AI, but found it less effective with such a large agent network.
While having over 600 assistants may seem excessive, it's manageable when each assistant is highly focused on a small, distinct task. For example, I have assistants for changing the persona of text, formalizing it, informalizing it, and other common writing tasks. My current focus is on orchestration and tool usage.
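The reason hundreds of assistants stay manageable is that each one reduces to a name plus a tightly scoped system prompt. A minimal sketch of the pattern (the prompts and names here are invented examples, not entries from my actual library):

```python
# Each assistant is nothing more than a tightly scoped system prompt.
ASSISTANTS = {
    "formalizer": "Rewrite the user's text in a formal register. Change nothing else.",
    "informalizer": "Rewrite the user's text in a casual register. Change nothing else.",
    "persona-pirate": "Rewrite the user's text as if spoken by a pirate.",
}

def build_messages(assistant: str, text: str) -> list[dict]:
    """Pair a piece of text with the chosen assistant's system prompt."""
    return [
        {"role": "system", "content": ASSISTANTS[assistant]},
        {"role": "user", "content": text},
    ]
```

Orchestration then becomes a routing problem: decide which assistant a task needs, build the messages, and send them to whatever backing model the frontend is configured with.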
Here's a look at some other generative AI tools I've been playing with:
I use Leonardo AI for text-to-image generation. I appreciate the diversity of models and their configurable parameters.
While I haven't explored text-to-video as extensively, I use Runway ML for creating animations from frames.
My main interest in AI systems lies in making this rapidly growing technology more effective and versatile through tool use, workflow management, and orchestration.
I use N8N to provision and orchestrate agents. I am trying out different stack combinations, prioritizing fewer components. I also like the idea of pipelines and tools within Open Web UI to enable actions on external services. I believe that we will see stack consolidation this year.
Langflow provides a user-friendly interface for visually building complex workflows with language models, making it easier to prototype and experiment with different LLM configurations.
This section covers the tools and workflows I use that leverage agentic AI principles, including my choice of IDEs and how I integrate AI with my computer usage.
I currently subscribe to Windsurf, valuing its integrated experience for agent-driven code generation, despite some recent performance issues.
I also use Aider, especially for single-script projects where precise context specification is advantageous.
I use OpenSUSE Linux as my daily desktop, which influences my choice of tools.
I've found Open Interpreter impressive for running LLMs directly within the terminal and see significant potential in this project. It requires careful provisioning before you let it debug and act on your computer, but it's worth exploring.
This repository includes a docker-compose.yaml file that encapsulates my AI stack. This setup allows for easy deployment and management of the various components.
Key Components:
- OpenWebUI: My primary frontend for interacting with LLMs.
- PostgreSQL: The main database for storing application data.
- Qdrant: A vector database essential for semantic search and RAG applications.
- Redis: Used for caching and performance optimization.
- Langflow: Facilitates workflow management for language models.
- Linkwarden: A bookmark and web content manager for research and reference.
- N8N: My chosen workflow automation platform.
- Unstructured: For extracting content from a variety of file formats.
In addition to these core services, the Docker Compose configuration includes:
- Monitoring and Backup: Glances for system monitoring and Duplicati for backups, ensuring a robust and maintainable system.
This implementation demonstrates a practical deployment of the tools and services, including necessary environment variables, volume mounts, networking configurations, and health checks.
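As a hedged illustration of how two of these services wire together, here is a minimal compose fragment in the same spirit (this is not the repository's actual file; image tags are the projects' published defaults, and the Open Web UI environment variables follow its documentation at the time of writing, so verify them against current docs):

```yaml
# Illustrative fragment only -- see the repository's docker-compose.yaml for the real setup.
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - DATABASE_URL=postgresql://openwebui:${DB_PASSWORD}@postgres:5432/openwebui
      - VECTOR_DB=qdrant
      - QDRANT_URI=http://qdrant:6333
    depends_on:
      postgres:
        condition: service_healthy
  postgres:
    image: postgres:16
    environment:
      - POSTGRES_USER=openwebui
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=openwebui
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U openwebui"]
      interval: 10s
      retries: 5
  qdrant:
    image: qdrant/qdrant
```

The health check on PostgreSQL matters here: it keeps Open Web UI from starting before the database can accept connections.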
I leverage specialized APIs in conjunction with LLMs to enhance specific tasks.
- Tavily: This search API provides relevant, up-to-date information, making it ideal for RAG applications and ensuring LLMs have access to current knowledge.
- Sonar by Perplexity: Perplexity's API delivers powerful search capabilities with built-in summarization and information synthesis, particularly effective for research and gathering comprehensive information on specific topics.
These APIs complement the LLM capabilities, enabling more robust AI applications with access to real-time data.
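Wiring a search API into an LLM workflow is mostly a matter of shaping one HTTP request. A stdlib-only sketch for Tavily (the request-body fields follow Tavily's docs, but the authentication style has varied across API versions, so treat the header here as an assumption to verify):

```python
import json
import os
import urllib.request

TAVILY_URL = "https://api.tavily.com/search"

def build_search(query: str, max_results: int = 5) -> dict:
    """Build a Tavily search request body."""
    return {"query": query, "max_results": max_results}

def search(query: str) -> list[dict]:
    """Run a Tavily search and return the result entries."""
    req = urllib.request.Request(
        TAVILY_URL,
        data=json.dumps(build_search(query)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['TAVILY_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]
```

In a RAG setup, the returned snippets get concatenated into the LLM prompt as fresh context, which is what gives the model access to post-training-cutoff information.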
