bandarra.me

bandarra.me 2026-06-08T07:05:33Z https://bandarrame.me André Cipriani Bandarra andreban@gmail.com
Bringing the Agent Loop to the Web 2026-05-06T16:19:30Z https://bandarra.me/posts/bringing-the-agent-loop-to-the-web
<img src="/images/demystifying-ai-agents-the-loop-in-the-hero-1.jpg" alt="" /> In <a href="https://bandarra.me/posts/demystifying-ai-agents">Demystifying AI Agents: Learning the Mechanics with Rust</a>, we saw that an AI agent is just a <code>while</code> loop wrapping a stateless LLM. It asks the model to act, runs a tool, updates history, and repeats. No magic, just plumbing. <h2>Introduction</h2> My PM colleague mentioned that most of his day-to-day work had moved from the browser into an IDE, with AI supporting a lot of it. I noticed the same thing when writing for this blog. I'd built a custom admin interface for a better writing experience, but once I started using AI agents to help with posts, I found myself moving back to an IDE. That got me thinking. IDEs have deep integration with the device. They can read files, run shell commands, and know what you're looking at. Web agents don't have any of that today. But for something like writing a blog post, I don't see a fundamental reason why a web agent couldn't do the same. The gap isn't capability; it's where the agent loop runs. Most agent frameworks assume agents belong on the server, treating the browser as a "dumb terminal" for sending prompts and displaying text. The result is often a chat panel bolted onto a product, not built into it. It can answer questions, but it can't interact with the application or react to what's on screen. If the agent's loop runs on a remote server, it lacks awareness of the user's browser environment. Reading the text the user selected, checking local storage, or reading UI state requires cumbersome piping through WebSockets or polling, fighting the environment rather than using what's already there. What if the browser is the agent?
Consider a <a href="https://bandarra.me/apps/agent-text-editor/">browser-based text editor agent</a>. It reads highlighted text, renders surgical edits as diffs, and pauses for user approval. <img src="/images/agent-text-editor.png" alt="" /> That kind of integration is far more natural when the loop runs directly in the client. The tool executes in the browser and the loop pauses until the user responds. <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 750 430" width="100%" height="430"> <defs> <filter id="shadow-seq" x="-5%" y="-5%" width="110%" height="110%"> <feDropShadow dx="0" dy="2" stdDeviation="3" flood-opacity="0.1" /> </filter> <linearGradient id="bg-seq" x1="0" y1="0" x2="0" y2="1"> <stop offset="0%" stop-color="#f8fafc" /> <stop offset="100%" stop-color="#f1f5f9" /> </linearGradient> <marker id="arrow-seq" viewBox="0 0 10 10" refX="8" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M 0 0 L 10 5 L 0 10 z" fill="#94a3b8" /> </marker> <marker id="arrow-seq-green" viewBox="0 0 10 10" refX="8" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M 0 0 L 10 5 L 0 10 z" fill="#10b981" /> </marker> </defs>  <rect x="0" y="0" width="750" height="430" rx="12" fill="url(#bg-seq)" stroke="#cbd5e1" stroke-width="2" />  <rect x="20" y="15" width="160" height="58" rx="8" fill="#ffffff" stroke="#94a3b8" stroke-width="2" filter="url(#shadow-seq)" /> <text x="100" y="42" font-family="system-ui, -apple-system, sans-serif" font-size="13" font-weight="bold" fill="#475569" text-anchor="middle">UI / Application</text> <text x="100" y="60" font-family="system-ui, -apple-system, sans-serif" font-size="11" fill="#94a3b8" text-anchor="middle">in the browser</text> <rect x="285" y="15" width="180" height="58" rx="8" fill="#ffffff" stroke="#1a73e8" stroke-width="2" filter="url(#shadow-seq)" /> <text x="375" y="42" font-family="system-ui, -apple-system, sans-serif" font-size="13" font-weight="bold" fill="#1e293b" text-anchor="middle">Agent 
Loop</text> <text x="375" y="60" font-family="system-ui, -apple-system, sans-serif" font-size="11" fill="#94a3b8" text-anchor="middle">in the browser</text> <rect x="555" y="15" width="160" height="58" rx="8" fill="#ffffff" stroke="#8b5cf6" stroke-width="2" filter="url(#shadow-seq)" /> <text x="635" y="42" font-family="system-ui, -apple-system, sans-serif" font-size="13" font-weight="bold" fill="#1e293b" text-anchor="middle">Cloud LLM</text> <text x="635" y="60" font-family="system-ui, -apple-system, sans-serif" font-size="11" fill="#94a3b8" text-anchor="middle">remote</text>  <line x1="100" y1="73" x2="100" y2="415" stroke="#cbd5e1" stroke-width="1.5" stroke-dasharray="4,4" /> <line x1="375" y1="73" x2="375" y2="415" stroke="#cbd5e1" stroke-width="1.5" stroke-dasharray="4,4" /> <line x1="635" y1="73" x2="635" y2="415" stroke="#cbd5e1" stroke-width="1.5" stroke-dasharray="4,4" />  <path d="M 163 110 L 278 110" stroke="#10b981" stroke-width="2" fill="none" marker-end="url(#arrow-seq-green)" /> <text x="220" y="102" font-family="system-ui, -apple-system, sans-serif" font-size="11" font-weight="bold" fill="#059669" text-anchor="middle">1. reads selection</text> <text x="220" y="124" font-family="system-ui, -apple-system, sans-serif" font-size="10" fill="#059669" text-anchor="middle">direct — no network</text>  <path d="M 465 155 L 548 155" stroke="#94a3b8" stroke-width="2" stroke-dasharray="5,4" fill="none" marker-end="url(#arrow-seq)" /> <text x="506" y="147" font-family="system-ui, -apple-system, sans-serif" font-size="11" fill="#64748b" text-anchor="middle">2. send history</text>  <path d="M 548 188 L 465 188" stroke="#94a3b8" stroke-width="2" stroke-dasharray="5,4" fill="none" marker-end="url(#arrow-seq)" /> <text x="506" y="180" font-family="system-ui, -apple-system, sans-serif" font-size="11" fill="#64748b" text-anchor="middle">3. 
call render_diff tool</text>  <path d="M 278 230 L 163 230" stroke="#10b981" stroke-width="2" fill="none" marker-end="url(#arrow-seq-green)" /> <text x="220" y="222" font-family="system-ui, -apple-system, sans-serif" font-size="11" font-weight="bold" fill="#059669" text-anchor="middle">4. execute tool → renders diff</text> <text x="220" y="244" font-family="system-ui, -apple-system, sans-serif" font-size="10" fill="#059669" text-anchor="middle">direct — no network</text>  <rect x="330" y="258" width="90" height="32" rx="6" fill="#ffffff" stroke="#f97316" stroke-width="2" filter="url(#shadow-seq)" /> <text x="375" y="272" font-family="system-ui, -apple-system, sans-serif" font-size="11" font-weight="bold" fill="#ea580c" text-anchor="middle">5. paused</text> <text x="375" y="285" font-family="system-ui, -apple-system, sans-serif" font-size="10" fill="#ea580c" text-anchor="middle">awaiting input</text>  <path d="M 163 325 L 322 325" stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrow-seq)" /> <text x="220" y="317" font-family="system-ui, -apple-system, sans-serif" font-size="11" fill="#64748b" text-anchor="middle">6. accept / reject</text> </svg> <h2>The case for client-side agents</h2> Server-bound agents have two related limitations: they can't see client-side state, and syncing the results of tool calls back to the UI is cumbersome. A server-side agent has to wait for the client to send it whatever it needs from the page. To interact with the app, it has to predict an action, send it to the client, and wait for a callback that runs the JavaScript. A client-side agent lives inside the application. It reads client-side state directly: it can check the value of a React state hook, inspect local storage, update the interface, or prompt the user, all without network overhead.
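A minimal sketch of what reading state directly looks like: two local tools written as plain closures over in-page state. Everything here is hypothetical and illustrative, not code from mast-ai or the text editor. <code>EditorState</code> stands in for React state or the DOM (in a real page, <code>get_selection</code> might call <code>window.getSelection()</code> instead), and <code>ApprovalGate</code> is an assumed helper that keeps a tool call pending until the user clicks Accept or Reject.

```typescript
// Illustrative sketch; all names are hypothetical.
// Stand-in for in-page state (React state or the DOM in a real app).
interface EditorState {
  text: string;
  selectionStart: number;
  selectionEnd: number;
}

// Keeps a tool call pending until the UI reports the user's decision.
class ApprovalGate {
  private resolvePending: ((accepted: boolean) => void) | undefined = undefined;

  // Returns a promise that stays pending until respond() is called.
  request(): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
      this.resolvePending = resolve;
    });
  }

  // Wire this to the Accept / Reject buttons.
  respond(accepted: boolean): void {
    const resolve = this.resolvePending;
    this.resolvePending = undefined;
    resolve?.(accepted);
  }
}

type LocalTool = (args: Record<string, unknown>) => Promise<string>;

// Tools are plain closures over local state: no request/response round trip.
function createEditorTools(
  state: EditorState,
  gate: ApprovalGate,
): Record<string, LocalTool> {
  return {
    // Reads the selection straight from application state.
    get_selection: async () =>
      state.text.slice(state.selectionStart, state.selectionEnd),

    // Proposes a replacement for the selection and waits for the user.
    propose_edit: async (args) => {
      const replacement = String(args['text'] ?? '');
      // A real UI would render a diff of the proposed change at this point.
      const accepted = await gate.request();
      if (!accepted) return 'rejected by user';
      state.text =
        state.text.slice(0, state.selectionStart) +
        replacement +
        state.text.slice(state.selectionEnd);
      state.selectionEnd = state.selectionStart + replacement.length;
      return 'applied';
    },
  };
}
```

Because <code>propose_edit</code> just awaits a promise, the agent loop that called it is suspended too, with no polling or WebSocket round trip; the Accept and Reject buttons resume it by calling <code>gate.respond(true)</code> or <code>gate.respond(false)</code>.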
<blockquote class="markdown-alert-note"> CLI agent tools like <a href="https://claude.ai/code">Claude Code</a> and <a href="https://geminicli.com/">Gemini CLI</a> already use this pattern. The loop runs in a local process, tools touch the file system and shell, and the LLM is still a stateless remote endpoint. The browser is the same idea in a different runtime, with different local resources: the DOM, browser storage, and the user's active sessions. </blockquote> <h2>The browser as orchestrator</h2> The architecture is simpler than it sounds. Move the loop to the browser, and the browser becomes the orchestrator. <h3>Hybrid orchestration map</h3> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 450" width="100%" height="450"> <defs> <filter id="shadow-hybrid" x="-5%" y="-5%" width="110%" height="110%"> <feDropShadow dx="0" dy="2" stdDeviation="3" flood-opacity="0.1" /> </filter> <linearGradient id="bg-hybrid" x1="0" y1="0" x2="0" y2="1"> <stop offset="0%" stop-color="#f8fafc" /> <stop offset="100%" stop-color="#f1f5f9" /> </linearGradient> <marker id="arrow-hybrid" viewBox="0 0 10 10" refX="8" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M 0 0 L 10 5 L 0 10 z" fill="#94a3b8" /> </marker> </defs>  <rect x="0" y="0" width="800" height="450" rx="12" fill="url(#bg-hybrid)" stroke="#cbd5e1" stroke-width="2" />  <rect x="20" y="40" width="360" height="385" rx="8" fill="none" stroke="#cbd5e1" stroke-width="1.5" stroke-dasharray="6,4" /> <text x="200" y="65" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#94a3b8" text-anchor="middle">Browser Environment</text>  <rect x="420" y="40" width="360" height="385" rx="8" fill="none" stroke="#cbd5e1" stroke-width="1.5" stroke-dasharray="6,4" /> <text x="600" y="65" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#94a3b8" text-anchor="middle">Server Environment</text>  <rect x="100" y="130" width="200" height="80" 
rx="8" fill="#ffffff" stroke="#1a73e8" stroke-width="2" filter="url(#shadow-hybrid)" /> <text x="200" y="165" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">Agent Loop</text> <text x="200" y="185" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">(Orchestrator & State)</text>  <rect x="100" y="330" width="200" height="60" rx="8" fill="#ffffff" stroke="#10b981" stroke-width="2" filter="url(#shadow-hybrid)" /> <text x="200" y="360" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">Local Tools</text> <text x="200" y="378" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">(DOM, Local Storage)</text>  <rect x="500" y="100" width="200" height="60" rx="8" fill="#ffffff" stroke="#8b5cf6" stroke-width="2" filter="url(#shadow-hybrid)" /> <text x="600" y="130" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">Remote Brain</text> <text x="600" y="148" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">(Cloud LLM + Prompt)</text>  <rect x="500" y="330" width="200" height="60" rx="8" fill="#ffffff" stroke="#ef4444" stroke-width="2" filter="url(#shadow-hybrid)" /> <text x="600" y="360" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">Server Tools</text> <text x="600" y="378" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">(API, DB, Compute)</text>  <g stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrow-hybrid)">  <path d="M 300 155 L 490 125" />  <path d="M 500 145 L 310 175" />  <path d="M 200 210 L 200 320" />  <path d="M 285 210 L 500 345" /> </g>  <g 
font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle"> <text x="390" y="128">1. History</text> <text x="415" y="178">2. Decision</text> <text x="158" y="272">3a. Execute</text> <text x="385" y="295">3b. Delegate</text> </g> </svg> <h3>Shifting the source of truth</h3> In a traditional server-centric agent, the backend runs everything. It holds the conversation history, calls the LLM in the loop, and executes the tool calls. The frontend is just a display layer, and deferring tool calls or sub-agents to the client is architecturally complex. When the loop runs on the client, the web application owns the conversation state and invokes the LLM (which can still live in the cloud) on every iteration. Tool calls can be handled in the browser or, when required, delegated to the server through REST API calls. Sub-agents, likewise, can live on either side. <h3>Protecting your system prompts</h3> A concern I often hear from developers is how to protect their prompts when the agent runs on the client. Because a client-side loop still calls a cloud LLM, the "secret sauce" (the application's system prompt) can be stored on the server and injected into each request there, so it never ships to the browser. <h2>Choosing your architecture</h2> To recap, the fundamental difference is where the agent loop (the orchestrator) lives. That doesn't mean every agent should run on the client. If an agent primarily interacts with backend systems and needs no integration with the user interface beyond displaying results, a server-side loop may be a great choice, and it also lets the same agent run across other surfaces. But if you want your agent tightly integrated with the user interface (pulling client-side data, showing confirmation dialogs for tool calls, reading and updating UI state), a client-side agent will likely give you more flexibility.
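Here is a minimal server-side sketch of the prompt-protection idea, assuming the browser loop sends its history to a hypothetical proxy endpoint on your server rather than calling the model provider directly. The proxy discards any <code>system</code> messages the client sent and prepends a system prompt that only exists on the server. The <code>ChatMessage</code> shape, <code>SYSTEM_PROMPT</code> constant, and <code>buildUpstreamMessages</code> function are assumptions for illustration, not part of any provider SDK.

```typescript
// Hypothetical message shape shared by the browser loop and the server proxy.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Lives only on the server (in practice, loaded from config or a secret store).
const SYSTEM_PROMPT =
  'You are a writing assistant. Propose edits as minimal diffs.';

// Builds the messages forwarded to the model provider. Any 'system' entries
// sent by the browser are dropped so the client can't replace the
// server-side prompt; the server's prompt is always placed first.
function buildUpstreamMessages(clientHistory: ChatMessage[]): ChatMessage[] {
  const conversation = clientHistory.filter((m) => m.role !== 'system');
  return [{ role: 'system', content: SYSTEM_PROMPT }, ...conversation];
}
```

The proxy would then forward these messages, together with the API key it also keeps server-side, to the model provider. The browser still decides when to call the model and what to do with the response; it just never receives the prompt or the key.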
<h2>Building the loop in TypeScript</h2> The architecture sounds sophisticated. The code is not. The loop has four steps: <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 700 340" width="100%" height="340"> <defs> <filter id="shadow-loop" x="-5%" y="-5%" width="110%" height="110%"> <feDropShadow dx="0" dy="2" stdDeviation="3" flood-opacity="0.1" /> </filter> <linearGradient id="bg-loop" x1="0" y1="0" x2="0" y2="1"> <stop offset="0%" stop-color="#f8fafc" /> <stop offset="100%" stop-color="#f1f5f9" /> </linearGradient> <marker id="arrow-loop" viewBox="0 0 10 10" refX="8" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M 0 0 L 10 5 L 0 10 z" fill="#94a3b8" /> </marker> </defs>  <rect x="0" y="0" width="700" height="340" rx="12" fill="url(#bg-loop)" stroke="#cbd5e1" stroke-width="2" />  <rect x="260" y="20" width="180" height="55" rx="8" fill="#ffffff" stroke="#94a3b8" stroke-width="2" filter="url(#shadow-loop)" /> <text x="350" y="48" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#475569" text-anchor="middle">1. History</text> <text x="350" y="66" font-family="monospace" font-size="12" fill="#64748b" text-anchor="middle">Message[]</text>  <rect x="260" y="115" width="180" height="55" rx="8" fill="#ffffff" stroke="#8b5cf6" stroke-width="2" filter="url(#shadow-loop)" /> <text x="350" y="143" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">2. Generate</text> <text x="350" y="161" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">"What's next?"</text>  <path d="M 350 200 L 400 232 L 350 264 L 300 232 Z" fill="#ffffff" stroke="#3b82f6" stroke-width="2" filter="url(#shadow-loop)" /> <text x="350" y="237" font-family="system-ui, -apple-system, sans-serif" font-size="12" font-weight="bold" fill="#1e293b" text-anchor="middle">3. 
Decision</text>  <rect x="80" y="205" width="140" height="55" rx="28" fill="#f1f5f9" stroke="#cbd5e1" stroke-width="2" filter="url(#shadow-loop)" /> <text x="150" y="238" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#475569" text-anchor="middle">Final Text</text>  <rect x="480" y="205" width="160" height="55" rx="8" fill="#ffffff" stroke="#10b981" stroke-width="2" filter="url(#shadow-loop)" /> <text x="560" y="233" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">4. Execute Tools</text> <text x="560" y="251" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">append results</text>  <g stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrow-loop)">  <path d="M 350 75 L 350 107" />  <path d="M 350 170 L 350 195" />  <path d="M 300 232 L 228 232" />  <path d="M 400 232 L 472 232" />  <path d="M 560 205 L 560 47 L 448 47" /> </g> </svg> Here's that loop in TypeScript, stripped to its essentials: <pre style="background-color:#282a36;">
// Message, model, tools, and executeTool are assumed to be defined elsewhere.
async function runAgent(prompt: string) {
  // The browser owns the conversation state
  const history: Message[] = [{ role: 'user', content: prompt }];

  while (true) {
    // 1. Ask the model what to do next
    const response = await model.generate(history, tools);

    // 2a. The model gives a final answer
    if (response.text) {
      history.push({ role: 'assistant', content: response.text });
      return response.text;
    }

    // 2b. The model wants to call tools (if it returned neither, stop instead of looping forever)
    if (!response.toolCalls?.length) {
      throw new Error('Model returned neither text nor tool calls');
    }
    history.push({ role: 'assistant', toolCalls: response.toolCalls });
    for (const call of response.toolCalls) {
      // 3. The browser executes the tool locally
      const result = await executeTool(call.name, call.args); // error handling omitted for clarity

      // 4. Record what happened
      history.push({ role: 'tool', toolCallId: call.id, content: result });
    }
    // 5. The loop repeats, sending the updated history back to the model
  }
}
</pre> <h3>Handling the cycle</h3> The loop handles the classic agentic cycle: <ol> <li>Send the full history to the model.</li> <li>If the model returns a final answer, stop and return it.</li> <li>Otherwise, execute the tool calls the model requested.</li> <li>Append the tool calls and their results to the history, then repeat.</li> </ol> This is what most frameworks hide behind layers of abstraction. Once you understand this loop, you can build your own agent framework in a few hundred lines of code. If you want to see a concrete implementation of this loop, check out <a href="https://github.com/andreban/mast-ai/blob/main/packages/core/src/runner.ts"><code>AgentRunner</code></a> in the mast-ai repository. <blockquote class="markdown-alert-note"> The history array grows with every turn. For long-running agents, you will eventually hit the model's context window limit. Plan for this early: common strategies include summarising older turns into a single message, or dropping tool results once their content has been acknowledged by the model. </blockquote> <h2>Delegating to specialized agents</h2> One more thing worth knowing: agents can delegate tasks to other, specialized agents.
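Because a tool is just an async function that returns a string, wrapping a sub-agent as a tool takes only a few lines. The sketch below is self-contained and illustrative: the <code>Message</code>, <code>Model</code>, and <code>Tool</code> types, <code>runAgent</code>, and <code>agentAsTool</code> are hypothetical names, not mast-ai's API. It also differs from the loop above in two small ways: it treats any response without tool calls as the final answer, and it gives up after <code>maxTurns</code> iterations instead of looping forever.

```typescript
// Illustrative types; not mast-ai's actual API.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

type Message =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content?: string; toolCalls?: ToolCall[] }
  | { role: 'tool'; toolCallId: string; content: string };

interface ModelResponse {
  text?: string;
  toolCalls?: ToolCall[];
}

// Any LLM client (cloud or local) hidden behind a single method.
interface Model {
  generate(history: Message[], toolNames: string[]): Promise<ModelResponse>;
}

// A tool is just an async function that returns a string.
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  prompt: string,
  maxTurns = 10,
): Promise<string> {
  const history: Message[] = [{ role: 'user', content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await model.generate(history, Object.keys(tools));
    const calls = response.toolCalls ?? [];
    if (calls.length === 0) {
      // No tool calls: treat the text as the final answer.
      const text = response.text ?? '';
      history.push({ role: 'assistant', content: text });
      return text;
    }
    history.push({ role: 'assistant', toolCalls: calls });
    for (const call of calls) {
      const tool: Tool | undefined = tools[call.name];
      // Report unknown tools back to the model instead of crashing the loop.
      const result = tool ? await tool(call.args) : `Unknown tool: ${call.name}`;
      history.push({ role: 'tool', toolCallId: call.id, content: result });
    }
  }
  throw new Error(`Agent did not finish within ${maxTurns} turns`);
}

// Wraps an entire agent loop as a tool. The parent only sees the final string.
function agentAsTool(model: Model, tools: Record<string, Tool>): Tool {
  return (args) => runAgent(model, tools, String(args['query'] ?? ''));
}
```

A parent agent would register something like <code>{ research: agentAsTool(researcherModel, researchTools) }</code>, where both names are hypothetical. From the parent's side, the sub-agent's entire loop collapses into one awaited tool call, which is also why it can be moved into a Web Worker or behind a server endpoint without the parent changing.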
<h3>Agent delegation tree</h3> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 400" width="100%" height="400"> <defs> <filter id="shadow-tree" x="-5%" y="-5%" width="110%" height="110%"> <feDropShadow dx="0" dy="2" stdDeviation="3" flood-opacity="0.1" /> </filter> <linearGradient id="bg-tree" x1="0" y1="0" x2="0" y2="1"> <stop offset="0%" stop-color="#f8fafc" /> <stop offset="100%" stop-color="#f1f5f9" /> </linearGradient> <marker id="arrowhead-4" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M 0 0 L 10 5 L 0 10 z" fill="#94a3b8" /> </marker> </defs>  <rect x="0" y="0" width="800" height="400" rx="12" fill="url(#bg-tree)" stroke="#cbd5e1" stroke-width="2" />  <rect x="300" y="50" width="200" height="60" fill="#e8f0fe" stroke="#1a73e8" stroke-width="3" rx="8" filter="url(#shadow-tree)" /> <text x="400" y="80" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">Manager Agent (Parent)</text> <text x="400" y="98" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">Orchestrates user request</text>  <rect x="300" y="185" width="200" height="60" fill="#ffffff" stroke="#94a3b8" stroke-width="2" rx="8" filter="url(#shadow-tree)" /> <text x="400" y="215" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">Researcher Agent (Child)</text> <text x="400" y="233" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">Exposed as a tool</text>  <rect x="300" y="315" width="200" height="60" fill="#ffffff" stroke="#94a3b8" stroke-width="2" rx="8" filter="url(#shadow-tree)" /> <text x="400" y="345" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">Search & Fetch Tools</text> <text x="400" y="363" font-family="system-ui, 
-apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">Used by researcher</text>  <path d="M 380 110 L 380 175" stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrowhead-4)" /> <text x="325" y="148" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">1. Call Tool</text>  <path d="M 420 185 L 420 118" stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrowhead-4)" /> <text x="475" y="148" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">4. Return Result</text>  <path d="M 380 245 L 380 305" stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrowhead-4)" /> <text x="318" y="280" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">2. Use Tools</text>  <path d="M 420 315 L 420 255" stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrowhead-4)" /> <text x="482" y="280" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">3. Get Results</text> </svg> The agent loop takes conversation history and calls tools. A tool is just a function that returns a string, and that function can be another agent loop. Imagine a general "Assistant Agent" that handles user requests. If the user asks for a deep research report on a topic, the main agent doesn't need to do the research itself. It can call a specialized "Research Sub-Agent" exposed as a tool. The main agent pauses its loop, calls the research tool with a query, and the sub-agent starts its own loop to fetch URLs, summarize pages, and synthesize a report. When the sub-agent finishes, it returns the report as a string to the main agent, which resumes its loop. The sub-agent could be running on the same main thread, in a background Web Worker to keep the UI responsive, or on a remote server entirely. 
It might use the same LLM or a different model optimized for the task. To the parent, it's just another tool call. <blockquote class="markdown-alert-tip"> Sub-agents that do heavy work (fetching URLs, scraping pages, running long loops) are natural candidates for <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API">Web Workers</a>. Wrapping a sub-agent in a Worker keeps it off the main thread so the UI stays responsive while it runs. </blockquote> <h2>Conclusion</h2> The server doesn't go away when you move the loop to the browser. It still runs the LLM, protects your credentials, and handles heavy compute. But the browser decides when to call it, what to send, and what to do with the response. If you're building an AI agent for the web, consider running the loop in the browser. Think about the <a href="https://bandarra.me/apps/agent-text-editor/">text editor from the introduction</a>: an agent that reads your selection, queries your workspace, delegates to a reviewer or writer sub-agent when needed, and renders the result as a diff for your approval. That's not a chatbot bolted onto a sidebar. That's a first-class feature, and it only works naturally when the loop lives where the UI lives. If you want to try the text editor yourself, head over to <a href="https://bandarra.me/apps/agent-text-editor/">bandarra.me/apps/agent-text-editor</a> (you'll need a Gemini API key, which you can get free at <a href="https://aistudio.google.com/">Google AI Studio</a>). The source is at <a href="https://github.com/andreban/agent-text-editor">github.com/andreban/agent-text-editor</a>. If you want to see how the agent loop is implemented, or want a foundation to build your own browser agents, check out <a href="https://github.com/andreban/mast-ai">mast-ai</a> on GitHub.

Demystify AI agents by exploring the case for moving the loop to the browser. Learn how client-side orchestration enables tight UI integration and data control.

Demystifying AI Agents: Learning the Mechanics with Rust 2026-04-22T18:09:48Z https://bandarra.me/posts/demystifying-ai-agents
<img src="/images/demystifying-ai-agents-hero.jpeg" alt="" /> An AI Agent is a system that pairs a Large Language Model with a set of tools and a control loop. The model receives a prompt, decides whether to invoke a tool or provide a text response, and the loop repeats until the task is complete. When using an established framework, the mechanics of that loop are heavily abstracted. The concrete types, the ownership of the conversation history, and the data structures passed to tool functions are buried under layers of routing and configuration. You can use these frameworks for months and still treat the underlying system as a black box. We use black boxes every day. Most developers don't know how a database manages disk I/O or how a compiler optimizes a branch, and usually, we don't have to. But AI frameworks represent a different kind of black box: they make the system look like magic. When the mechanics are abstracted away, the agent feels like a sentient entity rather than a piece of software. But 'magic' is just another word for 'unpredictable.' To build something reliable, you have to trade the magic for mechanics. Building an agent from scratch exposes the plumbing. <a href="https://github.com/andreban/agent-rig"><code>agent-rig</code></a> is a Rust library I built for exactly that purpose: no macros, no hidden state, no framework opinions, just the structural foundation underneath an agent system. Code snippets run throughout the post as concrete anchors. By the end, you should be able to trace how a user prompt becomes a tool execution and then a response, which is the part most frameworks actively hide from you. <blockquote class="markdown-alert-important"> If you cannot trace how a user prompt becomes a tool execution and then a response, you aren't controlling the agent. You are treating it like a black box.
</blockquote> <h2>The Engine (The Loop)</h2> If there is a secret to AI agents, it is this: AI agents aren't magical entities; they are just while loops with a better PR department. An LLM on its own is a single-shot function. You give it text, it gives you text back, and it stops. To make it "agentic," you have to wrap it in a control loop that gives it the ability to pause, ask for external data, and resume. Strip away the marketing and the abstractions, and here is the heartbeat of every agent framework: <pre style="background-color:#282a36;">
// The Agentic Loop (simplified; error handling omitted)
loop {
    // 1. Ask the model what to do next based on the current state
    let response = model.generate(&history, &tools).await;

    if let Some(text) = response.text {
        // 2a. The model provided a final answer. We are done.
        return text;
    }

    // 2b. The model wants to perform actions.
    for call in response.tool_calls {
        // 3. The system runs the requested code
        let result = execute_tool(&call).await;

        // 4. Record what happened (the call and its result become history messages)
        history.push(call.into());
        history.push(result.into());
    }
    // 5. Loop repeats, sending the updated history back to the model
}
</pre> Visually, the logic above forms a cycle that runs until the model stops asking for tools: <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 600 300" width="100%" height="300"> <defs> <filter id="shadow-loop" x="-5%" y="-5%" width="110%" height="110%"> <feDropShadow dx="0" dy="2" stdDeviation="3" flood-opacity="0.1" /> </filter> <marker id="arrow-loop" viewBox="0 0 10 10" refX="8" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M 0 0 L 10 5 L 0 10 z" fill="#94a3b8" /> </marker> </defs>  <rect x="210" y="20" width="180" height="60" rx="8" fill="#ffffff" stroke="#94a3b8" stroke-width="2" filter="url(#shadow-loop)" /> <text x="300" y="50" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#475569" text-anchor="middle">1. 
State (History)</text> <text x="300" y="70" font-family="monospace" font-size="12" fill="#64748b" text-anchor="middle">Vec<Message></text>  <rect x="210" y="120" width="180" height="60" rx="8" fill="#ffffff" stroke="#8b5cf6" stroke-width="2" filter="url(#shadow-loop)" /> <text x="300" y="150" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">2. Generate (Brain)</text> <text x="300" y="170" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">"What's next?"</text>  <path d="M 300 210 L 350 240 L 300 270 L 250 240 Z" fill="#ffffff" stroke="#3b82f6" stroke-width="2" filter="url(#shadow-loop)" /> <text x="300" y="245" font-family="system-ui, -apple-system, sans-serif" font-size="12" font-weight="bold" fill="#1e293b" text-anchor="middle">Decision</text>  <rect x="60" y="210" width="120" height="60" rx="30" fill="#f1f5f9" stroke="#cbd5e1" stroke-width="2" /> <text x="120" y="245" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#475569" text-anchor="middle">Final Text</text>  <rect x="420" y="210" width="150" height="60" rx="8" fill="#ffffff" stroke="#10b981" stroke-width="2" filter="url(#shadow-loop)" /> <text x="495" y="240" font-family="system-ui, -apple-system, sans-serif" font-size="14" font-weight="bold" fill="#1e293b" text-anchor="middle">3. Tool Execution</text> <text x="495" y="260" font-family="system-ui, -apple-system, sans-serif" font-size="12" fill="#64748b" text-anchor="middle">(The "Hands")</text>  <g stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrow-loop)">  <path d="M 300 80 L 300 110" />  <path d="M 300 180 L 300 205" />  <path d="M 250 240 L 190 240" />  <path d="M 350 240 L 410 240" />  <path d="M 495 210 L 495 50 L 400 50" /> </g> </svg> In <code>agent-rig</code>, this loop lives inside the <code>AgentRunner</code>. The runner orchestrates the flow of data. 
It takes the user's prompt, asks the model what to do, routes any requested actions to your actual code, and feeds the results back into the model until a final answer emerges. Notice what isn't in this loop: state. The engine itself doesn't remember anything between runs. It simply takes the current state (the <code>history</code>), passes it to the model, and applies the model's decisions. If the model asks to fetch three URLs, the engine fetches them concurrently, appends the HTML to the history, and loops again. <blockquote class="markdown-alert-note"> The agent isn't a persistent "being" that lives in memory. It is a series of independent stateless executions chained together by a loop. While we treat the agent as stateless for architectural purity, the application is very much stateful. The challenge of agent engineering is effectively syncing the "Long-term" state of your database with the "Short-term" context of the LLM loop without blowing your token budget. </blockquote> <h2>The Brain (The Model)</h2> Inside the loop, we have the "Brain." In a traditional software system, you would write <code>if/else</code> statements or a state machine to decide what happens next. In an agent, you delegate that logic to a Large Language Model. The Brain is stateless. It has no persistent memory of its own. Call it twice with the same input and it doesn't remember the first call. It is a mathematical function that takes a snapshot of the current situation (the history and the available tools) and predicts the single best next step. <h3>From Text to Intent</h3> We usually think of LLMs as chatbots that generate sentences. But in an agentic loop, the model’s role shifts from "talking" to "deciding." When the Brain receives a request, it has two options: <ol> <li>Provide an Answer: "The weather in London is 22°C."</li> <li>Request an Action: "I don't know the weather. 
Please call <code>get_weather(city: 'London')</code>."</li> </ol> In our library, every model sits behind the <code>LlmModel</code> trait, and its decision comes back as a <code>ModelResponse</code>. Whether you are using a cloud-based model like Gemini 2.5 Pro or a small model running locally via Ollama, the interface is identical: <pre style="background-color:#282a36;">
pub struct ModelResponse {
    pub text: Option<String>,      // Option 1: A final answer
    pub tool_calls: Vec<ToolCall>, // Option 2: A request for action
}
</pre> <h3>Provider Agnosticism</h3> Because the Brain is just a trait, the rest of your agentic system doesn't care who made the model. You can develop your logic using a cheap local model and then swap it for a powerful cloud provider with one line of code: <pre style="background-color:#282a36;">
// Switch from local Llama to cloud Gemini
let model = GeminiModel::builder(api_key, "gemini-2.5-flash").build();
</pre> The Brain doesn't know it's part of a loop, and it doesn't know which company's servers it's running on. It just looks at the history you provide and predicts the next move. <blockquote class="markdown-alert-note"> The "intelligence" of an agent isn't in the loop; it’s in the model's ability to choose the correct tool call when it hits a gap in its knowledge. </blockquote> <blockquote class="markdown-alert-warning"> The Confidence Trap: This architecture relies on the model recognizing when it has hit a gap in its knowledge. However, models are trained to be helpful, which often makes them overconfident. If a model thinks it knows the answer, it may bypass your tool and simply hallucinate a plausible-sounding fact. This is why precise tool descriptions are mandatory: you are fighting the model's urge to just 'guess.' </blockquote> <h2>The Hands (The Tools)</h2> LLMs cannot "do" anything. They are token predictors: they take a sequence of text and predict the most likely continuation. To make an LLM interact with the real world, we give it "hands" through Tools.
But how does a text-prediction engine actually "call" a function? <h3>The Mechanics: Thinking in JSON</h3> When you configure an agent with a tool, you aren't sending code to the model. You are sending a Definition, a JSON-based description that explains what the tool does and what arguments it expects. <pre style="background-color:#282a36;">
pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    pub parameters: serde_json::Value, // JSON Schema
}
</pre> This definition is injected into the model's prompt. Most modern models support "Native Tool Calling," which is a fancy way of saying they have been specifically trained to recognize these definitions. When the model determines it needs to use a tool, it stops generating human-readable sentences and instead generates a specific JSON payload. Depending on the provider, this might be wrapped in special tokens (e.g., <code><tool_call> ... </tool_call></code>) or emitted through a dedicated API field. For example, to get the weather, the model doesn't just say "call the weather tool." It predicts tokens that form this: <pre style="background-color:#282a36;">
{
  "name": "get_weather",
  "args": { "city": "London" }
}
</pre> <h3>The Bridge: From Text to Code</h3> The execution engine (the Loop) receives this JSON when the model's turn ends and looks for a registered tool matching the name <code>get_weather</code>. It then executes the actual code (the <code>call</code> function) using the provided arguments. <pre style="background-color:#282a36;">
pub trait Tool {
    fn definition(&self) -> ToolDefinition;
    async fn call(&self, args: serde_json::Value) -> Result<serde_json::Value, Error>;
}
</pre> The model never runs your code; your system does. The model simply predicts the arguments it thinks your code needs. Once the tool returns a result, the engine appends that result to the history and calls the model again. <h3>An API for One</h3> This architecture forces a shift in how you think about documentation.
Usually, you write docs for other humans. Here, you are writing documentation for a model. The <code>name</code> needs to be distinctive. The <code>description</code> must be precise about what the tool does and when to use it. The <code>parameters</code> must use clear names and types. If your description is poor, the model may hallucinate arguments or call the tool at the wrong time. In this world, your "API documentation" is effectively part of your code’s runtime logic. <blockquote class="markdown-alert-important"> Tool calling is just a specialized form of text completion. The model isn't "running" a function; it is predicting the JSON payload that it thinks will convince your engine to run the function for it. </blockquote> <h2>The Memory (The History)</h2> If the Engine is a stateless loop and the Brain is a stateless math function, where does the "agent" actually live? Where is its memory? The answer is simple: an agent's entire state is just an array of text messages. <h3>Short-Term Memory: The Conversation Log</h3> Setting aside optimizations like KV-caching or history summarization, the mechanics of agent memory are remarkably simple. When you use an LLM, it feels like it remembers what you said five minutes ago. It doesn't. Every time you send a prompt, the system resends the entire conversation history up to that point. In code, this history is just a <code>Vec<Message></code>. Every action the agent takes must be appended to this log so that on the next iteration of the loop, the Brain knows what just happened.
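To make the "memory is just a log" idea concrete, here is a minimal, std-only Rust sketch of a stateless loop driven entirely by a caller-owned <code>Vec<Message></code>. The <code>Message</code> and <code>ModelResponse</code> shapes, the scripted <code>fake_model</code>, and the <code>get_weather</code> stub are hypothetical stand-ins, not <code>agent-rig</code>'s actual API; a real engine would call an LLM where <code>fake_model</code> is, and the <code>max_turns</code> cap is an added safeguard.

```rust
use std::collections::HashMap;

/// One entry in the conversation log (hypothetical shape, not agent-rig's API).
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    ToolCall { name: String, arg: String },
    ToolResult { name: String, output: String },
    Assistant(String),
}

/// What the model decides on each turn: answer, or request a tool.
enum ModelResponse {
    Text(String),
    ToolCall { name: String, arg: String },
}

/// Scripted stand-in for the LLM: a pure function of the history it is given.
/// It requests the weather until it sees a tool result, then answers.
fn fake_model(history: &[Message]) -> ModelResponse {
    for msg in history.iter().rev() {
        if let Message::ToolResult { output, .. } = msg {
            return ModelResponse::Text(format!("The weather in London is {output}."));
        }
    }
    ModelResponse::ToolCall { name: "get_weather".into(), arg: "London".into() }
}

type ToolFn = fn(&str) -> String;

fn get_weather(_city: &str) -> String {
    "22°C".to_string() // hypothetical stub instead of a real weather API
}

/// The engine owns no state: everything it knows arrives through `history`,
/// and every step it takes is appended back onto it.
fn run_agent(
    history: &mut Vec<Message>,
    tools: &HashMap<&str, ToolFn>,
    max_turns: usize,
) -> Option<String> {
    for _ in 0..max_turns {
        match fake_model(history.as_slice()) {
            ModelResponse::Text(answer) => {
                history.push(Message::Assistant(answer.clone()));
                return Some(answer);
            }
            ModelResponse::ToolCall { name, arg } => {
                history.push(Message::ToolCall { name: name.clone(), arg: arg.clone() });
                let output = match tools.get(name.as_str()) {
                    Some(tool) => tool(arg.as_str()),
                    None => format!("error: unknown tool {name}"),
                };
                history.push(Message::ToolResult { name, output });
            }
        }
    }
    None // no final answer within max_turns
}

fn main() {
    let mut tools: HashMap<&str, ToolFn> = HashMap::new();
    tools.insert("get_weather", get_weather as ToolFn);

    let mut history = vec![Message::User("What's the weather in London?".into())];
    let answer = run_agent(&mut history, &tools, 5);
    println!("{answer:?}");
    for msg in &history {
        println!("{msg:?}");
    }
}
```

Nothing survives inside <code>run_agent</code> between calls: drop <code>history</code> and the next run starts from scratch.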
After a single tool call, the history holds three entries: <ol> <li>The user's original request.</li> <li>The model's request to use a tool (e.g., <code>get_weather(London)</code>).</li> <li>The tool's result (e.g., <code>{"temp": 22}</code>).</li> </ol> When the loop runs again, the model reads the whole transcript, sees that the tool returned <code>22</code>, and finally predicts the text: "The weather in London is 22°C." Because the engine itself is stateless, the responsibility of holding this <code>Vec<Message></code> falls to the caller. And because the whole transcript is resent on every iteration, the context window (the maximum length of history the model can accept) is a critical bottleneck in agent design. <h3>Long-Term Memory: Tools in Disguise</h3> What if you want the agent to remember a fact between sessions, after the context window is cleared? You might think you need a complex "Memory Manager" subsystem. You don't. You just need to give the model tools that interact with a database. <ul> <li><code>remember_fact(fact: String)</code>: A tool that takes a string and writes it to a file or database.</li> <li><code>recall_fact(query: String)</code>: A tool that searches that database and returns the result.</li> </ul> If a user says, "My dog's name is Barnaby," the model calls <code>remember_fact</code>. In a completely separate session a week later, if the user asks, "What is my dog's name?", the model calls <code>recall_fact</code>, gets the answer, and responds. <blockquote class="markdown-alert-note"> There is no magic "memory module" in an AI agent. Short-term memory is just a growing array of messages. Long-term memory is just the agent using its hands (Tools) to interact with a filing cabinet (Database). </blockquote> <h2>The Blueprint (The Configuration)</h2> We have looked at the mechanics: a loop that feeds an array of messages and tool definitions into a stateless text-predictor. But how do you tell this mechanical system to be a "Helpful Coding Assistant" rather than a "Snarky Weather Bot"?
You need a configuration. In <code>agent-rig</code>, this is the <code>Agent</code> struct. The <code>Agent</code> struct is pure data. It holds no network connections and no active state. It is a static blueprint that defines the intent of the system before the loop even starts: <pre style="background-color:#282a36;">
pub struct Agent {
    pub name: String,
    pub instructions: String,
    pub output_schema: Option<serde_json::Value>,
    pub tool_names: Vec<String>,
}
</pre> <h3>The Specification</h3> When you initialize the Engine, you hand it this blueprint. The engine uses it to set up the starting conditions for the Brain. <code>instructions</code> is the system prompt: the persona, constraints, and operational boundaries. These get prepended to the message history on every single iteration, keeping the model aligned with its task. <code>tool_names</code> is a permission whitelist. A system might have dozens of tools registered (e.g., <code>read_file</code>, <code>query_database</code>, <code>delete_record</code>), but this list restricts the model to only the tools it needs for its specific role. A "research assistant" cannot access <code>delete_record</code> even if it predicts the correct JSON payload to call it. <code>output_schema</code> is a structural constraint. If the agent returns data for another machine to consume, this JSON schema forces the model's final text output to match a specific structure. <pre style="background-color:#282a36;">
let agent = Agent::builder()
    .name("technical-editor")
    .instructions("You review documentation for clarity and technical accuracy.")
    .tool("check_links")
    .tool("validate_code_snippets")
    .build();
</pre> Because the Blueprint is just a plain data structure, it is portable. You can store your agent definitions in a database or configuration file and update an agent's persona or permissions without touching a single line of execution logic in the Engine.
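The whitelist and system-prompt rules above can be sketched in a few lines of std-only Rust. This is a hypothetical illustration, not <code>agent-rig</code>'s implementation: <code>Registry</code>, <code>visible_tools</code>, <code>authorize</code>, and <code>build_prompt</code> are invented names, and <code>output_schema</code> is dropped to avoid a <code>serde_json</code> dependency. The design choice worth noticing is that the whitelist is checked twice: once when deciding which definitions the model sees, and again when a call comes back, so a correctly predicted payload for a hidden tool is still refused.

```rust
use std::collections::HashMap;

/// Pure-data blueprint mirroring the `Agent` fields above
/// (`output_schema` omitted so the sketch needs no serde_json).
#[derive(Debug, Clone)]
struct Agent {
    name: String,
    instructions: String,
    tool_names: Vec<String>,
}

/// Hypothetical registry of every tool the *system* knows about.
struct Registry {
    descriptions: HashMap<String, String>,
}

impl Registry {
    /// Tool names whose definitions the model is allowed to see.
    fn visible_tools(&self, agent: &Agent) -> Vec<String> {
        let mut names: Vec<String> = agent
            .tool_names
            .iter()
            .filter(|n| self.descriptions.contains_key(n.as_str()))
            .cloned()
            .collect();
        names.sort();
        names
    }

    /// Second check, applied to every call the model predicts.
    fn authorize(&self, agent: &Agent, tool: &str) -> Result<(), String> {
        let allowed = agent.tool_names.iter().any(|n| n == tool);
        if allowed && self.descriptions.contains_key(tool) {
            Ok(())
        } else {
            Err(format!("agent '{}' may not call '{}'", agent.name, tool))
        }
    }
}

/// Instructions go first on every iteration, ahead of the running history.
fn build_prompt(agent: &Agent, history: &[String]) -> Vec<String> {
    let mut prompt = vec![format!("system: {}", agent.instructions)];
    prompt.extend(history.iter().cloned());
    prompt
}

fn main() {
    let mut descriptions = HashMap::new();
    for (name, desc) in [
        ("read_file", "Read a file"),
        ("query_database", "Run a read-only query"),
        ("delete_record", "Delete a record"),
    ] {
        descriptions.insert(name.to_string(), desc.to_string());
    }
    let registry = Registry { descriptions };

    let researcher = Agent {
        name: "research-assistant".into(),
        instructions: "You find and summarize sources.".into(),
        tool_names: vec!["read_file".into(), "query_database".into()],
    };

    println!("{:?}", registry.visible_tools(&researcher));
    println!("{:?}", registry.authorize(&researcher, "delete_record"));
    println!("{:?}", build_prompt(&researcher, &["user: hi".to_string()]));
}
```

Because <code>Agent</code> holds only strings, the same value could just as easily be deserialized from a config file without touching the loop.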
<blockquote class="markdown-alert-tip"> If the configuration is the job description, the Engine is the employee who reads the description and starts working. The more precise the job description, the more predictable the employee's performance. </blockquote> <h2>Conclusion: Putting It Together</h2> An AI agent is a specific arrangement of standard software patterns combined with an LLM. Nothing more. <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 600 300" width="100%" height="100%"> <defs> <filter id="shadow" x="-5%" y="-5%" width="110%" height="110%"> <feDropShadow dx="0" dy="2" stdDeviation="3" flood-opacity="0.1" /> </filter> <linearGradient id="bg" x1="0" y1="0" x2="0" y2="1"> <stop offset="0%" stop-color="#f8fafc" /> <stop offset="100%" stop-color="#f1f5f9" /> </linearGradient> <marker id="arrow" viewBox="0 0 10 10" refX="8" refY="5" markerWidth="6" markerHeight="6" orient="auto"> <path d="M 0 0 L 10 5 L 0 10 z" fill="#94a3b8" /> </marker> </defs>  <rect x="10" y="10" width="580" height="280" rx="12" fill="url(#bg)" stroke="#cbd5e1" stroke-width="2" /> <text x="300" y="40" font-family="system-ui, -apple-system, sans-serif" font-size="18" font-weight="bold" fill="#334155" text-anchor="middle">The Engine (Loop)</text>  <rect x="50" y="210" width="500" height="60" rx="8" fill="#ffffff" stroke="#94a3b8" stroke-width="2" filter="url(#shadow)" /> <text x="300" y="246" font-family="system-ui, -apple-system, sans-serif" font-size="16" font-weight="bold" fill="#475569" text-anchor="middle">Memory (Message Array)</text>  <rect x="50" y="80" width="140" height="80" rx="8" fill="#ffffff" stroke="#3b82f6" stroke-width="2" filter="url(#shadow)" /> <text x="120" y="115" font-family="system-ui, -apple-system, sans-serif" font-size="16" font-weight="bold" fill="#1e293b" text-anchor="middle">Blueprint</text> <text x="120" y="135" font-family="system-ui, -apple-system, sans-serif" font-size="14" fill="#64748b" text-anchor="middle">(Config)</text>  <rect x="230" y="80" 
width="140" height="80" rx="8" fill="#ffffff" stroke="#8b5cf6" stroke-width="2" filter="url(#shadow)" /> <text x="300" y="115" font-family="system-ui, -apple-system, sans-serif" font-size="16" font-weight="bold" fill="#1e293b" text-anchor="middle">Brain</text> <text x="300" y="135" font-family="system-ui, -apple-system, sans-serif" font-size="14" fill="#64748b" text-anchor="middle">(Model)</text>  <rect x="410" y="80" width="140" height="80" rx="8" fill="#ffffff" stroke="#10b981" stroke-width="2" filter="url(#shadow)" /> <text x="480" y="115" font-family="system-ui, -apple-system, sans-serif" font-size="16" font-weight="bold" fill="#1e293b" text-anchor="middle">Hands</text> <text x="480" y="135" font-family="system-ui, -apple-system, sans-serif" font-size="14" fill="#64748b" text-anchor="middle">(Tools)</text>  <g stroke="#94a3b8" stroke-width="2" fill="none" marker-end="url(#arrow)">  <path d="M 190 120 L 220 120" />  <path d="M 370 120 L 400 120" />  <path d="M 480 160 L 480 200" />  <path d="M 300 210 L 300 170" /> </g> </svg> Once you can see that, 'autonomy' stops feeling like magic and debugging becomes concrete: inspect the JSON schemas, watch what's going into the message array, and tighten the system prompt. The frameworks that hide this plumbing aren't saving you from complexity. They’re just deferring it until your ‘happy path’ meets the unfiltered chaos of a live user. If you want to see exactly how these pieces are implemented, or want a macro-free foundation to build your own agents, check out <a href="https://github.com/andreban/agent-rig"><code>agent-rig</code></a> on GitHub.

Demystify AI agents by exploring their five core mechanics: the loop, brain, hands, memory, and blueprint. Learn the plumbing that sits beneath the abstractions.

Exploring Client-Side Code Execution with WebMCP 2026-04-28T12:28:58Z https://bandarra.me/posts/webmcp-code-execution <img src="/images/webmcp-code-execution-hero.jpeg" alt="" /> When an AI agent needs to solve a complex problem on a web page, it usually involves a lot of back-and-forth. The agent calls a tool, processes the response, decides on the next step, and calls another tool. <a href="https://github.com/GoogleChromeLabs/webmcp-tools/">WebMCP</a> is a protocol that lets AI agents interact with web pages through structured tools, but even with well-defined tools, this continuous back-and-forth creates significant overhead. Lately, I've been focused on making these multi-step interactions faster and more token-efficient. In this post, I want to share my exploration into bringing code execution directly to the client side to bypass this latency entirely. <blockquote class="markdown-alert-caution"> This article discusses client-side code execution, which involves significant security risks. The techniques described are strictly for exploration and are not recommended for production environments. </blockquote> <h2>Latency and Context Bloat at Runtime</h2> You can see this in action with the <a href="http://webmcp-maze.bandarra.me/">WebMCP Maze demo</a>. Open it, connect your AI agent, and ask it to solve the maze. Without code execution enabled, the agent navigates using five atomic tools: <code>look</code> to inspect its surroundings, <code>move</code> to step in a direction, <code>pickup</code> and <code>drop</code> to manage items, and <code>use</code> to clear locked doors or rocks blocking the path. Each action is a separate tool call, and with a fog-of-war mechanic limiting visibility, the agent may need dozens of round-trips just to find the exit. 
As Anthropic recently highlighted in their article on <a href="https://www.anthropic.com/engineering/code-execution-with-mcp">code execution with MCP</a>, this sequential tool calling leads to high usage of the model's context window. Each turn adds tool definitions and intermediary results to the conversation history. The standard solution is to let agents run code on the server side to handle complex logic, but that introduces a new problem in a browser environment: if the server-side script needs to call WebMCP tools running on the user's page, every tool call becomes a network round-trip. I wanted to see if we could avoid those round-trips entirely by moving the execution to where the tools live: the client side. <h2>The Maze Game Experiment</h2> To test the idea, I added an <a href="https://github.com/GoogleChromeLabs/webmcp-tools/blob/main/demos/webmcp-maze/src/webmcp/tools/EvalTool.ts"><code>eval_code</code> tool</a> to the same demo. You can try it by appending <code>?eval_tool=true</code> to the URL. Instead of navigating step by step, the agent now writes a complete JavaScript algorithm and submits it for execution in one shot. <iframe width="800" height="450" style="width:100%;" src="https://www.youtube.com/embed/etPoy2Bx9mg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> The tool accepts a JavaScript string from the agent and runs it as an async function body inside a sandboxed Web Worker. That code can call any of the game tools via <code>await window.gameTools.executeTool(name, args)</code>, and its final result is returned to the agent once the algorithm completes. The prompt guides the agent to write a loop that keeps moving and using items until <code>atExit</code> is true, rather than asking for individual moves.
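The demo's real tool runs JavaScript in a Web Worker (see <code>EvalTool.ts</code>). To show the shape of what the agent submits, here is a Rust analogue: one algorithm that calls <code>look</code> and <code>move</code> through a single string-dispatched <code>execute_tool</code> entry point until it reaches the exit. The <code>Game</code> type, the grid encoding, and the right-hand wall-following strategy are assumptions for illustration, and the demo's items, doors, and rocks are left out. Wall following suits fog of war because it only needs the cells next to the player, though it is only guaranteed to find the exit in mazes without loops, hence the step cap.

```rust
/// Compass directions the player can look or move in.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dir { N, E, S, W }

impl Dir {
    /// Turn 90 degrees clockwise.
    fn right(self) -> Dir {
        match self { Dir::N => Dir::E, Dir::E => Dir::S, Dir::S => Dir::W, Dir::W => Dir::N }
    }
    fn left(self) -> Dir { self.right().right().right() }
    /// Grid offset for one step (y grows downward).
    fn delta(self) -> (i32, i32) {
        match self { Dir::N => (0, -1), Dir::E => (1, 0), Dir::S => (0, 1), Dir::W => (-1, 0) }
    }
}

/// What a tool call returns.
enum Reply { Open(Vec<Dir>), Moved { at_exit: bool }, Blocked }

/// Hypothetical stand-in for the maze page. The algorithm can only reach it
/// through `execute_tool`, mirroring `window.gameTools.executeTool`.
struct Game {
    grid: Vec<&'static str>, // '#' wall, '.' floor, 'X' exit
    pos: (i32, i32),
    tool_calls: usize,
}

impl Game {
    fn cell(&self, (x, y): (i32, i32)) -> char {
        self.grid
            .get(y as usize)
            .and_then(|row| row.chars().nth(x as usize))
            .unwrap_or('#') // treat anything off the grid as wall
    }

    fn execute_tool(&mut self, name: &str, dir: Option<Dir>) -> Reply {
        self.tool_calls += 1;
        let (x, y) = self.pos;
        match (name, dir) {
            // `look` only reveals the four neighbouring cells (fog of war).
            ("look", _) => Reply::Open(
                [Dir::N, Dir::E, Dir::S, Dir::W]
                    .into_iter()
                    .filter(|d| {
                        let (dx, dy) = d.delta();
                        self.cell((x + dx, y + dy)) != '#'
                    })
                    .collect(),
            ),
            ("move", Some(d)) => {
                let (dx, dy) = d.delta();
                let next = (x + dx, y + dy);
                if self.cell(next) == '#' {
                    return Reply::Blocked;
                }
                self.pos = next;
                Reply::Moved { at_exit: self.cell(next) == 'X' }
            }
            _ => Reply::Blocked,
        }
    }
}

/// The kind of loop the agent submits once: keep the wall on the right
/// and move until the exit is reached, or give up after `max_steps`.
fn solve(game: &mut Game, max_steps: usize) -> bool {
    let mut facing = Dir::E;
    for _ in 0..max_steps {
        let open = match game.execute_tool("look", None) {
            Reply::Open(dirs) => dirs,
            _ => return false,
        };
        // Prefer right, then straight, then left, then turn back.
        let preference = [facing.right(), facing, facing.left(), facing.right().right()];
        let Some(d) = preference.into_iter().find(|d| open.contains(d)) else {
            return false; // boxed in
        };
        facing = d;
        if let Reply::Moved { at_exit: true } = game.execute_tool("move", Some(d)) {
            return true;
        }
    }
    false
}

fn main() {
    let mut game = Game {
        grid: vec!["#######", "#.....#", "#.###.#", "#...#.#", "###.#X#", "#######"],
        pos: (1, 1),
        tool_calls: 0,
    };
    let solved = solve(&mut game, 100);
    println!("solved: {solved}, tool calls: {}", game.tool_calls);
}
```

All of those tool calls execute locally; the model's only job was to write <code>solve</code> once.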
<h2>How I Built It</h2> Allowing an AI agent to run arbitrary code on a user's page is, obviously, a massive security risk. I needed a way to run untrusted code safely. <h3>1. Isolation via Web Workers</h3> I decided to run the agent's code inside a sandboxed Web Worker created from a blob URL. This gives us several immediate security benefits: <ul> <li>The code cannot access the DOM or the main thread's global scope.</li> <li>It cannot read cookies or <code>localStorage</code>. (<code>IndexedDB</code>, however, is still available inside workers, and because a blob worker shares the page's origin, it can open the page's databases.)</li> </ul> <h3>2. Guardrails with CSP</h3> To further constrain the worker, the demo applies strict Content Security Policy headers. <code>worker-src blob:</code> limits worker creation to blob URLs, blocking any attempt to load worker scripts from external origins, while <code>object-src 'none'</code> and <code>default-src 'self'</code> close off remaining resource-loading vectors. Crucially, the policy also includes <code>connect-src 'self'</code>, which prevents the worker from exfiltrating data to arbitrary hosts. (Note: As discussed below, the demo also requires <code>'unsafe-eval'</code> for code execution, which is a significant trade-off.) <h3>3. The Bridge</h3> The most interesting challenge was connecting the worker back to the game. I exposed the game's functions via a custom bridge on <code>window.gameTools</code>. Each tool call made by the agent's code is posted from the worker to the main thread, which invokes the currently registered WebMCP tools (like <code>move</code> or <code>look</code>) and posts the results back to the worker. This ensures the worker can only access the capabilities the agent has already been granted. You can see the full implementation, including the message protocol and timeout logic, in <a href="https://github.com/GoogleChromeLabs/webmcp-tools/blob/main/demos/webmcp-maze/src/webmcp/tools/EvalTool.ts"><code>EvalTool.ts</code></a>. <h2>What I Found</h2> Client-side code execution turned out to be incredibly effective at reducing latency.
In a maze requiring ~40 moves to solve, the step-by-step approach added 1–2 seconds of model latency per move, on top of the character animation, for a total wait of 40–80 seconds just in model round-trips. With client-side execution, the model spends a few seconds generating the JavaScript algorithm once, and after that the solution depends only on the animation speed. The bottleneck shifted entirely: instead of the model deciding each move one at a time, its entire job was reduced to writing an algorithm once. What remained was deliberate UX, not overhead. <h2>The Risks of Client-Side Execution</h2> While Web Workers and CSP provide good guardrails, running untrusted code on the client is still a high-stakes game. This implementation is strictly an exploration and not safe for production use. Here are the specific risks I've identified: <ol> <li>Resource Exhaustion: Malicious or buggy code generated by the agent could run infinite loops or consume excessive memory. While it won't freeze the main UI thread, it can easily drain the user's battery and CPU resources. (To mitigate this, the demo enforces a strict 5-minute timeout on the worker's execution.)</li> <li>Sandbox Escapes: A Web Worker gets its own thread and global scope, but it typically shares the page's renderer process, so the isolation is not bulletproof. Browser vulnerabilities could theoretically allow code running in a worker to escape the sandbox and access the main thread.</li> <li>The Origin & Cookie Problem: This is perhaps the most subtle risk. Web Workers cannot read cookies directly because they have no DOM access. However, because the worker is created from a blob URL, it inherits the parent page's origin. That means any <code>fetch</code> requests it makes to the same origin are treated as same-origin requests and include the page's session cookies.
While <code>connect-src 'self'</code> prevents the worker from exfiltrating data to external servers, it still allows the worker to perform authenticated actions against your own API on behalf of the user.</li> <li>The <code>unsafe-eval</code> Compromise: To allow the agent to self-correct code, the worker uses <code>new Function()</code> for execution. Because Blob workers inherit the parent's CSP, the main application must allow <code>script-src 'unsafe-eval'</code>. This is a significant security trade-off made for the sake of the demo's developer experience (DX), but it's not a trade-off I would recommend for a production application.</li> </ol> To be production-ready, this pattern would likely require running the execution worker on a completely separate, sandboxed domain (cross-origin), for example inside a cross-origin iframe. This would prevent it from accessing the main site's origin storage and cookies entirely. <h2>What Could Come Next</h2> This exploration highlights a clear direction for the future of WebMCP. I believe the responsibility for providing a secure, isolated execution environment lies with the agent platform, not the individual web developer. Asking every site to implement its own complex sandboxing logic is a recipe for security fragmentation. Centralizing this capability in the agent platform makes more sense for security and scalability. Ultimately, the performance gains are too significant to ignore. What started as a way to reduce the back-and-forth overhead of step-by-step tool calls turned into a fundamentally different model for how agents interact with web pages. However, this demo is strictly an exploration of what's possible, not a recommendation for implementation. The path forward requires expert-led, platform-level infrastructure that makes client-side code execution a safe, first-class citizen in the agentic web.

Discover how client-side code execution dramatically boosts AI agent performance when interacting with web pages, bypassing latency and context bloat inherent in traditional back-and-forth tool calls. This innovative approach, demonstrated with WebMCP, allows agents to execute complex JavaScript algorithms directly in a sandboxed browser environment, significantly reducing interaction time. While showcasing immense potential, this exploration highlights critical security challenges that require robust, platform-level solutions before client-side execution can become a safe, production-ready standard for the agentic web.

AI Agents and WebMCP: Tools as Self-Loading Skills 2026-04-01T16:00:00Z https://bandarra.me/posts/webmcp-tools-as-skills <img src="/images/WebMCP-Factory-Hero.png" alt="" /> I've been exploring <a href="https://developer.chrome.com/blog/webmcp-epp">WebMCP</a>, the browser's native tool calling API that lets any web page register tools for AI agents. Instead of an agent scraping or guessing at page structure, the site declares exactly what it can do: structured, typed tool calls the agent can discover and invoke directly from the browser. It's a clean interface, but it comes with a notable limitation: there's no facility for injecting system prompts or appending to the agent's context. Skills, as a concept, don't exist in the spec. Everything the agent knows has to arrive through tool calls. That limitation got me thinking about multi-step, state-dependent workflows. This is what I found. <h2>The Problem: State-Dependent Workflows at Runtime</h2> Picture a browser agent helping a seller fulfill a custom gift bundle order. The store's dashboard exposes tools like <code>check_stock</code>, <code>reserve_item</code>, <code>add_gift_wrap</code>, <code>generate_packing_slip</code>, and <code>schedule_pickup</code>. The agent's job is to work through the order: verify each item is in stock, reserve them, attach the requested extras, generate the slip, and hand off to shipping. The happy path is straightforward. But orders aren't always clean. Gift wrap can't be added until items are reserved. The packing slip needs to reflect the final contents, not the original request. And if one item is out of stock, the agent needs to find a substitute, then pick up where it left off, not restart from scratch. The tools are atomic. The sequencing and recovery logic are not. The naive fix is a single <code>fulfill_bundle</code> tool that encodes the full workflow in code. 
It works until it doesn't: every edge case needs to be anticipated upfront, recovery logic is hardcoded, and when something unexpected happens the behavior is opaque. You've traded an adaptive agent for a brittle script. Skills seemed like a more promising direction: text-based protocols that the agent reads and interprets against the current state. Instead of encoding what to do, you encode how to think about what to do. The agent checks state, follows the protocol, and adapts. <h2>What I Tried: Skills as Self-Loading Tools</h2> WebMCP gives you no mechanism to inject protocols into the agent's context. You could pack protocol knowledge into tool descriptions, but that muddies the tool's own purpose and breaks down quickly: the same tool can be used by multiple skills, so whose protocol goes in the description? I needed a cleaner way to deliver multi-step instructions to the agent on demand. It turned out the answer was already in the tool interface itself. A tool has two surfaces: its description (what the agent sees when scanning available tools) and its return value (what the agent receives after calling it). What if you used one for discovery and the other for delivery? I registered each skill as a zero-argument tool. The description is a one-sentence summary, just enough for the agent to recognize the skill as relevant. Calling the tool returns the full step-by-step protocol. The agent loads knowledge exactly when it needs it, and not before. <pre style="background-color:#282a36;">
skill_fulfill_bundle
  description: "Protocol for fulfilling a custom gift bundle order."
  returns: "Gift Bundle Fulfillment Protocol:
    1. Call get_order to read the requested items and any special instructions.
    2. For each item, call check_stock. If unavailable, call find_substitute.
    3. Call reserve_item for each confirmed item.
    4. If gift wrap was requested, call add_gift_wrap.
    5. Call generate_packing_slip with the final item list.
    6. Call schedule_pickup."
</pre> The agent sees a short hint. It decides the skill is relevant. It calls the skill. Now it has instructions. <h2>The Demo: The Same Problem in a Factory</h2> Reproducing this in a live demo requires a state machine with enough moving parts to make the problem visible, yet simple enough to follow in real time. I took inspiration from <a href="https://www.factorio.com/">Factorio</a>, a game built around chained production lines and resource dependencies, and built a small factory in the browser. It has a multi-step production chain, intermediate state that changes with each tool call, and a recovery path when a resource runs out mid-sequence. The challenge is the same as the seller dashboard; the domain is just more legible. <a href="https://bandarra.me/apps/webmcp-factory/">Try the live demo.</a> The agent's goal is to manufacture an Electric Motor from raw materials (iron ore, copper ore) using three devices: <table><thead><tr><th>Device</th><th>What it does</th></tr></thead><tbody> <tr><td>Smelter</td><td>Converts iron ore → iron plate, or copper ore → copper plate</td></tr> <tr><td>Forge</td><td>Converts iron plates → iron gear (or salvages gear → plate)</td></tr> <tr><td>Assembler</td><td>Winds copper plate → copper coils, or combines gear + coils → motor</td></tr> </tbody></table> The atomic tools the agent has: <ul> <li><code>get_state</code>: reads the full inventory and all device tray contents</li> <li><code>mine_iron_ore</code> / <code>mine_copper_ore</code>: adds 1 unit of ore to inventory</li> <li><code>load(device, item, qty)</code>: moves items from inventory into a device's input tray</li> <li><code>unload(device, item, qty)</code>: returns items from a tray back to inventory</li> <li><code>smelt</code> / <code>forge</code> / <code>assemble</code>: runs a device; matches the tray against known recipes</li> </ul> Recipes are matched exactly: the tray must contain precisely the right items.
If it doesn't, the device returns an error and leaves the tray unchanged so the agent can correct itself. The recipe chain to produce one Electric Motor: <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 500 360" width="500" height="360" style="font-family: Roboto, sans-serif; max-width: 100%; display: block;"> <defs> <marker id="arr" markerWidth="8" markerHeight="6" refX="7" refY="3" orient="auto" markerUnits="userSpaceOnUse"> <polygon points="0 0, 8 3, 0 6" fill="#2bbc8a"/> </marker> </defs>  <rect width="500" height="360" rx="12" fill="#263238"/>  <line x1="150" y1="58" x2="150" y2="110" stroke="#2bbc8a" stroke-width="1.5" marker-end="url(#arr)"/> <line x1="150" y1="148" x2="150" y2="200" stroke="#2bbc8a" stroke-width="1.5" marker-end="url(#arr)"/>  <line x1="350" y1="58" x2="350" y2="110" stroke="#2bbc8a" stroke-width="1.5" marker-end="url(#arr)"/> <line x1="350" y1="148" x2="350" y2="200" stroke="#2bbc8a" stroke-width="1.5" marker-end="url(#arr)"/>  <line x1="155" y1="238" x2="214" y2="300" stroke="#2bbc8a" stroke-width="1.5" marker-end="url(#arr)"/> <line x1="345" y1="238" x2="286" y2="300" stroke="#2bbc8a" stroke-width="1.5" marker-end="url(#arr)"/>  <rect x="107" y="72" width="56" height="16" rx="3" fill="#263238"/> <text x="135" y="84" text-anchor="middle" fill="#c9cacc" font-size="11">smelt ×2</text> <rect x="118" y="162" width="36" height="16" rx="3" fill="#263238"/> <text x="136" y="174" text-anchor="middle" fill="#c9cacc" font-size="11">forge</text>  <rect x="322" y="72" width="36" height="16" rx="3" fill="#263238"/> <text x="340" y="84" text-anchor="middle" fill="#c9cacc" font-size="11">smelt</text> <rect x="306" y="162" width="64" height="16" rx="3" fill="#263238"/> <text x="338" y="174" text-anchor="middle" fill="#c9cacc" font-size="11">assemble</text>  <rect x="152" y="261" width="64" height="16" rx="3" fill="#263238"/> <text x="184" y="273" text-anchor="middle" fill="#c9cacc" font-size="11">assemble</text> <rect x="284" y="261" width="64" 
height="16" rx="3" fill="#263238"/> <text x="316" y="273" text-anchor="middle" fill="#c9cacc" font-size="11">assemble</text>  <rect x="80" y="22" width="140" height="36" rx="6" fill="#303f46" stroke="#2bbc8a" stroke-width="1.5"/> <text x="150" y="45" text-anchor="middle" fill="#c9cacc" font-size="13">2× Iron Ore</text> <rect x="80" y="112" width="140" height="36" rx="6" fill="#303f46" stroke="#2bbc8a" stroke-width="1.5"/> <text x="150" y="135" text-anchor="middle" fill="#c9cacc" font-size="13">2× Iron Plate</text> <rect x="80" y="202" width="140" height="36" rx="6" fill="#303f46" stroke="#2bbc8a" stroke-width="1.5"/> <text x="150" y="225" text-anchor="middle" fill="#c9cacc" font-size="13">1× Iron Gear</text>  <rect x="280" y="22" width="140" height="36" rx="6" fill="#303f46" stroke="#2bbc8a" stroke-width="1.5"/> <text x="350" y="45" text-anchor="middle" fill="#c9cacc" font-size="13">1× Copper Ore</text> <rect x="280" y="112" width="140" height="36" rx="6" fill="#303f46" stroke="#2bbc8a" stroke-width="1.5"/> <text x="350" y="135" text-anchor="middle" fill="#c9cacc" font-size="13">1× Copper Plate</text> <rect x="280" y="202" width="140" height="36" rx="6" fill="#303f46" stroke="#2bbc8a" stroke-width="1.5"/> <text x="350" y="225" text-anchor="middle" fill="#c9cacc" font-size="13">2× Copper Coil</text>  <rect x="180" y="302" width="140" height="36" rx="6" fill="#2bbc8a" stroke="#2bbc8a" stroke-width="1.5"/> <text x="250" y="325" text-anchor="middle" fill="#263238" font-size="13" font-weight="bold">1× Electric Motor</text> </svg> The skill layer on top: Seven skill tools sit alongside the factory tools in the agent's tool list: <ul> <li><code>skill_recipe_iron_plate</code>: how to produce iron plate</li> <li><code>skill_recipe_iron_gear</code>: how to produce iron gear</li> <li><code>skill_recipe_copper_plate</code>: how to smelt copper ore into copper plate</li> <li><code>skill_recipe_copper_coil</code>: how to produce copper coils</li> 
<li><code>skill_recipe_electric_motor</code>: how to assemble the final motor</li> <li><code>skill_assemble_electric_motor</code>: the full top-level protocol (calls the recipe skills in order)</li> <li><code>skill_salvage_iron_plate</code>: recovery protocol: dismantle a gear to recover a plate when ore runs out</li> </ul> When the agent invokes <code>skill_assemble_electric_motor</code>, it receives a protocol that tells it to check state, invoke the recipe skills as needed, and proceed in order. Each recipe skill gives it the exact load/run sequence for that item. The agent never needs to reason about the factory's internals; it simply follows the protocol it has just loaded. <h2>The "Aha" Moments</h2> <iframe width="800" height="450" style="width:100%;" src="https://www.youtube.com/embed/r1HRcERdvw0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> The demo randomizes the starting inventory. This is where the approach pays off: the same orchestration skill, given to the same agent, produces different tool-call sequences depending on what's already in the inventory. No code branches. The agent reads state, reads protocols, and adapts. The recovery case is more striking. If iron ore runs out mid-assembly, the agent invokes <code>skill_salvage_iron_plate</code>, loads a gear into the forge, and recovers a plate, with no hardcoded fallback logic. The skill existed in the tool list all along. The agent just hadn't needed it yet. The UI shows the chain of thought live: each skill invocation appears as a header in the log, with the agent's actual tool calls beneath it. You can watch the protocol-to-action mapping in real time. <h2>What I Found</h2> Skills-as-tools turned out to be a lightweight pattern with no special infrastructure requirements.
If your agent platform supports tools, you already have everything you need. The tool description handles discovery; the return value handles delivery. Knowledge loads on demand, context stays clean, and the agent can adapt to state it hasn't seen before, because it's reading instructions, not executing a script. <a href="https://bandarra.me/apps/webmcp-factory/">See it in action in the live demo.</a> <h2>What Could Come Next</h2> One thing I kept thinking about while building this: skills-as-tools works, but it's a convention layered on top of the existing tool interface, not something the spec knows about. A site can register a skill tool today and an agent can use it, but there's no shared signal that distinguishes a skill from a regular tool. The agent has to infer it from naming or description. It would be interesting to see skills treated as a first-class concept in the WebMCP specification, as a dedicated registration type that agents could recognize and handle differently from atomic tools. A skill could carry metadata like a short summary for the tool list, a longer protocol payload returned on invocation, and maybe even a list of the atomic tools it expects to be available. That would make the pattern more discoverable, more composable, and easier to reason about on both sides of the interface. Whether that's the right direction for WebMCP is an open question. But the fact that it's expressible today as a pure convention suggests the underlying model is flexible enough to support it. WebMCP is currently in Early Preview. If you want to make your own site agent-ready, sign up to get access to documentation, demos, and new APIs as they land: <a href="https://developer.chrome.com/blog/webmcp-epp">developer.chrome.com/blog/webmcp-epp</a>.
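To make the skills-as-tools convention concrete, here is a minimal TypeScript sketch. It expresses a skill as an ordinary tool and gives it the kind of metadata a first-class skill type might carry: a short summary, a protocol payload, and the atomic tools it expects. Everything in it is hypothetical and exists only for illustration: the `Tool` and `Skill` shapes, the `[skill]` description prefix, the `skillAsTool()` helper, and the `load_smelter`/`run_smelter` tool names. It is not the WebMCP registration API, and it is not the demo's code.

```typescript
// A plain tool as an agent sees it: a name and description for discovery,
// and an execute function whose return value goes back into the model's context.
// Hypothetical shape for illustration; not the WebMCP registration API.
interface Tool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// Hypothetical first-class skill metadata: a short summary for the tool list,
// a longer protocol returned on invocation, and the atomic tools it expects.
interface Skill {
  name: string;
  summary: string;
  protocol: string;
  requiredTools: string[];
}

// Express a skill with today's convention: an ordinary tool whose description
// carries the summary and whose return value delivers the protocol.
function skillAsTool(skill: Skill, availableTools: Set<string>): Tool {
  return {
    name: skill.name,
    // A marker prefix is one way to signal "this is a skill" without spec support.
    description: `[skill] ${skill.summary}`,
    execute: async () => {
      // Don't hand the agent a protocol that references tools it can't call.
      const missing = skill.requiredTools.filter((t) => !availableTools.has(t));
      if (missing.length > 0) {
        return `Error: ${skill.name} needs unavailable tools: ${missing.join(", ")}`;
      }
      return skill.protocol;
    },
  };
}

// Example: the iron plate recipe skill. The protocol text and the atomic tool
// names (load_smelter, run_smelter) are made up for this sketch.
const ironPlate = skillAsTool(
  {
    name: "skill_recipe_iron_plate",
    summary: "How to produce iron plate",
    protocol: [
      "1. Check the inventory for iron ore.",
      "2. Load one iron ore into the smelter with load_smelter.",
      "3. Run the smelter with run_smelter to get one iron plate.",
    ].join("\n"),
    requiredTools: ["load_smelter", "run_smelter"],
  },
  new Set(["load_smelter", "run_smelter", "load_forge", "run_forge"]),
);
```

Only the description appears in the agent's tool list. The protocol text enters the context only when the agent calls the tool, which is what keeps the context clean. The missing-tools check shows one way a skill could fail loudly instead of handing the agent instructions it cannot follow.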

Empower your AI agents to intelligently navigate complex, state-dependent workflows with "skills as lazy-loaded protocols." This innovative method enables agents to "read" dynamic, text-based instructions on demand, leveraging tool descriptions for discovery and return values for delivering full protocols. Discover how agents can adapt to unpredictable scenarios, orchestrate intricate tool sequences, and gracefully recover from issues, all while maintaining a clean context by loading knowledge precisely when needed.

The Vibe to Production Pipeline 2026-03-10T12:59:00Z https://bandarra.me/posts/vibe-to-production <img src="/images/vibe-to-production-hero.jpeg" alt="" /> In my early days as a developer, I remember working on a prototype to show our stakeholders the solution we were planning to build. I was polishing the user interface, tweaking fonts, margins, and the alignment of items on the page, ensuring the transitions were smooth, and making the filters look "pixel-perfect." A more senior engineer noticed what I was doing and asked why. I explained that I wanted the prototype to look polished. He told me not to. At first, I was confused, but the engineer clarified: if stakeholders see a polished prototype, they will think the project is almost finished. They won't see the duct tape and the hardcoded strings holding the backend together. They’ll think we’re a week away from shipping when we haven’t even touched the database indexing yet. He wanted the prototype to look "scrappy" because scrappiness is a vital communication signal. It tells the world that this is an idea, not infrastructure. Today, we’ve hit a version of this problem that is much more subtle. We’ve entered the era of Vibe Coding. <h3>The Paradox of the Vibe</h3> In my workflow, vibe coding is the spiritual successor to those scrappy prototypes: I treat vibe-coded code as a disposable medium for thought. When I’m in "vibe mode," I’m using AI as a fast prototyping tool that lets me experiment and iterate on concepts far more quickly. When doing this, I don't care about the folder structure, the variable naming, or whether the state management follows a strict pattern. I also don't care about performance, reliability, or security. I just want to see if the concept has legs. In this phase, the AI is the lead, and I’m just the "vibe checker." If the result looks like it works, it’s a win.
But here is the catch: modern LLMs are too good at the polish. With my old prototypes, I had to add the "shine" manually; AI agents produce shippable-looking UI/UX by default. This creates a false sense of "readiness" that ignores the invisible 90% of software engineering. <h3>Switching to AI Assisted Engineering</h3> When it’s time to move from a concept to "serious" code that I actually want to roll out to production, my relationship with the AI flips. This is where I move into AI Assisted Engineering. In this mode, I’m the Lead Architect, and the AI is a high-output Senior Engineer who still needs strict guardrails. To manage this, I follow a three-component process: <ol> <li>The "Contract of Truth" (Design Doc): Before touching the implementation, I work with the AI to draft a design doc. This isn't just about features; it’s about non-functional requirements. We define the data schema, error handling strategies, and security constraints.</li> <li>Chunked Implementation: I never ask an agent to "build the feature." Instead, I work with the agent to break the implementation into manageable chunks. This makes code review actually possible. I’ll read the code as it’s generated, stopping to provide feedback the moment I see a potential issue, then review the change as a whole before accepting it.</li> <li>The Feedback Loop and Memory: When a decision changes during the implementation, I ask the agent to update the design doc. For certain issues, like a specific way we handle API auth, I also update the agent's memory file. This helps the agent actually "learn" the project's constraints over time.</li> </ol> <h3>A Hybrid Future</h3> Does this mean we should force AI agents to produce "scrappy" prototypes? Not necessarily. We should take advantage of the AI’s ability to create high-fidelity prototypes, but we must be the ones providing the reality check.
As stakeholders' expectations for polish increase, our role shifts from "building the UI" to "managing the truth." <table><thead><tr><th style="text-align: left">Aspect</th><th style="text-align: left">Vibe Coding</th><th style="text-align: left">AI Assisted Engineering</th></tr></thead><tbody> <tr><td style="text-align: left">Goal</td><td style="text-align: left">Exploration & Proof of Concept</td><td style="text-align: left">Production Reliability</td></tr> <tr><td style="text-align: left">Priority</td><td style="text-align: left">Speed & "Look and Feel"</td><td style="text-align: left">Architecture & Non-functional Req.</td></tr> <tr><td style="text-align: left">Human Role</td><td style="text-align: left">Vibe Checker</td><td style="text-align: left">System Architect / Reviewer</td></tr> <tr><td style="text-align: left">Output</td><td style="text-align: left">Disposable Prototype</td><td style="text-align: left">Maintainable System</td></tr> </tbody></table> The most effective engineers in 2026 won't be the ones who just "vibe" their way through a project, nor the ones who refuse to use AI. They will be the ones who know exactly when to let the vibe rip and exactly when to pull out the design doc and start engineering.

Unlock the future of software development by mastering the balance between rapid "Vibe Coding" and structured "AI Assisted Engineering." This insightful guide reveals how AI's impressive polish can create a false sense of readiness and why transitioning from a "vibe checker" to a "lead architect" is essential for production-grade code. Learn practical strategies—like design docs and chunked implementation—to effectively manage AI as a high-output assistant, ensuring seamless transitions from exploratory prototypes to reliable, scalable solutions.

The point-and-click UI paradox 2026-02-10T19:57:00Z https://bandarra.me/posts/point-and-click-paradox <img src="/images/point-and-click-paradox.png" alt="" /> When I was a kid, playing adventure games meant staring at a blinking cursor. If you wanted to interact with the digital world, you had to type exactly what you wanted your character to do. “Open door.” “Pick up key.” “Talk to wizard.” It felt like magic, but it was also incredibly rigid. Those early text parsers were notoriously picky. If you typed “Grab the key” instead of “Pick up key,” the game would stubbornly refuse to understand you. So when point-and-click adventure games arrived, they were an absolute revolution. Suddenly, you didn't have to guess the right verb. You could just look at the screen, see your options, and click. It was faster, more intuitive, and infinitely easier. Software design followed suit, and for the last few decades, the graphical user interface has ruled the world. We traded typing for clicking, and we never looked back. But today, we are facing a strange paradox. The very interface that was supposed to make software easier to use often becomes the thing making it harder. It all comes down to the complexity threshold. <h2>The 747 Dashboard Problem</h2> Point-and-click is flawless for simple actions. Hitting "Play" on a video, toggling your Wi-Fi on, or "liking" a post will always be best served by a simple button. But as software became more powerful, user intent became more complex. Every new feature required a new button, a new slider, or a new dropdown menu. Eventually, we reached a point where many modern user interfaces crossed the complexity threshold. They stopped looking like intuitive tools and started looking like the dashboard of a Boeing 747. To combat these dense walls of icons and nested menus, a massive portion of modern design work is actually just triage. 
Designers ruthlessly filter out the features that matter to most users and hide or drop the ones that don't, all just to keep the interface approachable. Finding the right button in a cluttered interface has become just as frustrating as guessing the right verb in a 1980s text adventure. <h2>The Left-Hand Rail of Doom</h2> A perfect real-world example of this is the travel industry. I used to work on a travel booking site, and we knew firsthand that flight search filters were a major pain point. When you search for a flight, you are usually confronted with a daunting left-hand rail of checkboxes covering layover durations, specific airline alliances, baggage inclusions, and exact departure windows. It is overwhelming. We spent an incredible amount of time optimizing that space, trying to serve the advanced power users without completely alienating the casual vacationers. But no matter how elegantly you design a 747 dashboard, it is still a 747 dashboard. The complexity threshold has been crossed. <h2>The Hybrid Solution and Two Paths Forward</h2> This is where the paradigm is shifting. Today, forward-thinking platforms are realizing that while clicking "Search" is easy, clicking twenty different filter parameters is not. To solve this, they are bringing back the text box, powered by Natural Language Interfaces based on modern LLMs. Instead of hunting for the "Non-stop" checkbox, adjusting a slider to "Departure after 5 PM," and checking a box to exclude budget airlines, you can simply type: <blockquote> "Find me a nonstop flight to London leaving Friday evening, returning Sunday, under $500, and not on a budget airline." </blockquote> This shift is happening in two distinct ways. First, sites are building their own internal LLM agents. By leveraging modern LLM capabilities like structured output and function calling, a travel site can take your natural language prompt and instantly translate it into the exact JSON payload their backend needs to filter the flights. 
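As a rough illustration, here is a TypeScript sketch of what such a payload might look like for the London query above, along with a defensive parse of the model's output. The `FlightFilters` fields and the `parseFilters()` helper are hypothetical; they don't come from any real travel site's API. Structured output constrains the shape of the model's answer, but checking it again before it reaches the backend is cheap insurance.

```typescript
// Hypothetical filter payload for a flight-search backend. Field names are
// illustrative only; a real site would use whatever its own API defines.
interface FlightFilters {
  destination: string;          // IATA airport or city code, e.g. "LON"
  departDay: string;            // day of week of the outbound flight
  departAfter: string | null;   // earliest outbound departure, "HH:MM" (24h)
  returnDay: string | null;     // day of week of the return flight
  maxStops: number;             // 0 means nonstop only
  maxPriceUsd: number | null;
  excludeBudgetAirlines: boolean;
}

const isStringOrNull = (v: unknown) => v === null || typeof v === "string";
const isNumberOrNull = (v: unknown) => v === null || typeof v === "number";

// Parse the model's JSON and check its shape before it reaches the backend.
function parseFilters(modelOutput: string): FlightFilters {
  const data: unknown = JSON.parse(modelOutput);
  if (typeof data !== "object" || data === null) {
    throw new Error("Model output is not a JSON object");
  }
  const d = data as Record<string, unknown>;
  const valid =
    typeof d.destination === "string" &&
    typeof d.departDay === "string" &&
    isStringOrNull(d.departAfter) &&
    isStringOrNull(d.returnDay) &&
    typeof d.maxStops === "number" &&
    isNumberOrNull(d.maxPriceUsd) &&
    typeof d.excludeBudgetAirlines === "boolean";
  if (!valid) {
    throw new Error("Model output does not match FlightFilters");
  }
  return d as unknown as FlightFilters;
}

// What a model might return for the London query, round-tripped through JSON.
const filters = parseFilters(JSON.stringify({
  destination: "LON",
  departDay: "friday",
  departAfter: "17:00",
  returnDay: "sunday",
  maxStops: 0,
  maxPriceUsd: 500,
  excludeBudgetAirlines: true,
}));
```

Each filter the user would otherwise set by hand ("Non-stop", "Departure after 5 PM", the price cap, the airline exclusion) becomes one field in this object.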
You bypass the cluttered UI entirely. But there is a catch. Users do not want to learn how to talk to fifty different site-specific AI chatbots. They increasingly want to use their own preferred agents, like a browser-level assistant, to navigate the web. This is where standardized infrastructure like <a href="https://developer.chrome.com/blog/webmcp-epp">WebMCP (Model Context Protocol for the Web)</a> comes into play. WebMCP is designed specifically for the scenario where a user brings their own agent to a site. Instead of forcing that browser agent to blindly scrape a webpage and guess where the "Submit" button is, WebMCP allows developers to expose structured actions and tools directly via client-side JavaScript. Your ubiquitous browser agent can securely call the exact function needed to execute your complex request, completely bypassing the site's visual UI. <h2>The Dynamic Threshold</h2> But what if we take this a step further? The complexity threshold does not have to be a fixed line drawn by a UX designer. It could be dynamic. Imagine a future where your browser agent learns your habits, preferences, and intent over time. Because it understands the site's capabilities through WebMCP, it can act as a real-time UX designer. Instead of showing you the standard 747 dashboard, the browser dynamically picks and renders only the UI elements that matter to your specific journey. If the agent knows you strictly fly Star Alliance and always check a bag, it might completely hide those filters and instead just present you with three highly relevant sliders for departure times. There is still a lot of experimentation to happen in this space. But the idea that the interface itself could mold to your specific needs, moment by moment, completely redefines how we think about design. <h2>The Future is Symbiotic</h2> We abandoned text inputs decades ago because they created too much friction. 
But as our graphical interfaces grew bloated and overwhelming, clicking became the friction. Does this mean the point-and-click UI is dead? Absolutely not. The future of UI isn't a total return to text. It is strictly hybrid. We can see this hybrid future taking shape with innovations on the other side of the equation, like <a href="https://modelcontextprotocol.io/docs/extensions/apps">MCP Apps</a>. While WebMCP helps agents talk to websites, MCP Apps allow the chat interface to spin up miniature, interactive visual components right inside the conversation. If you ask your browser agent to analyze flight pricing trends, it does not just spit back a text summary. It renders an interactive, point-and-click chart right in the chat so you can hover, zoom, and explore. For actions below your personal complexity threshold, buttons will reign supreme. But for complex, multi-layered tasks, the era of hunting through endless menus is ending. Tomorrow's best software won't force you to choose between clicking and typing. Instead, it will seamlessly learn your habits and offer you the exact right tool for the job.

Witness the ongoing revolution in user interfaces, as the once-dominant point-and-click model struggles with overwhelming complexity. This deep dive reveals how AI-powered natural language interfaces are bringing back intelligent text input, creating a powerful hybrid UI that seamlessly blends clicking for simple tasks with conversational commands for intricate operations. Explore how innovative protocols like WebMCP are enabling software to dynamically adapt to your intent, promising a future where interacting with technology is more intuitive, efficient, and personalized than ever before.

Beyond the Viewport: Capturing Full-Size Screenshots with Rust and Chrome 2026-02-01T19:36:00Z https://bandarra.me/posts/cdp-full-page-screenshots <img src="/images/cdp-full-page-screenshots-hero.jpg" alt="" /> If you’ve ever tried to automate website screenshots using Selenium or WebDriver, you’ve likely hit the "cutoff" wall. By default, most drivers only capture what’s currently visible in the browser window. If your page is 5,000 pixels long but your window is only 1,080 pixels tall, you’re missing the best part of the story. In this post, we’re going to look at how to use the Chrome DevTools Protocol (CDP) via the <a href="https://crates.io/crates/thirtyfour">thirtyfour</a> crate to capture every single pixel of a webpage, from header to footer. <h2>Why standard screenshots fail</h2> Standard WebDriver commands are designed for cross-browser compatibility. Because not every browser handles "full-page" rendering the same way, the lowest common denominator is the viewport. To get the full page in Chrome, we need to go "under the hood" and talk to Chrome directly using CDP. <h2>The Secret Sauce: <code>Page.captureScreenshot</code></h2> Chrome provides a specific command called <code>Page.captureScreenshot</code>. The real hero here is a parameter called <code>captureBeyondViewport</code>. When set to <code>true</code>, Chrome ignores the window constraints and renders the full height of the document. <h3>The Implementation</h3> First, we define our data structures to match the Chrome DevTools schema. Note the <code>#[serde(rename_all = "camelCase")]</code> attribute. This is vital because the CDP schema uses camelCase parameter names such as <code>captureBeyondViewport</code>, and Chrome won't recognize Rust's <code>snake_case</code> field names.
<pre style="background-color:#282a36;">
use std::error::Error;

use base64::{Engine, prelude::BASE64_STANDARD};
use serde::{Deserialize, Serialize};
use thirtyfour::extensions::cdp::ChromeDevTools;

/// Parameters for the CDP `Page.captureScreenshot` command.
/// Fields left as `None` are omitted, so Chrome applies its defaults.
#[derive(Debug, Default, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ScreenshotParams {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub format: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub quality: Option<u8>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub clip: Option<Viewport>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub from_surface: Option<bool>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub capture_beyond_viewport: Option<bool>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub optimize_for_speed: Option<bool>,
}

/// CDP `Page.Viewport`: a clip rectangle plus a scale factor.
/// CDP defines all of these as numbers, so `f64` is used (a scale of 0.5 must be representable).
#[derive(Debug, Default, Serialize, Deserialize)]
pub struct Viewport {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
    pub scale: f64,
}

/// Capture the whole document, not just the visible viewport.
pub const FULL_SIZE_SCREENSHOT: ScreenshotParams = ScreenshotParams {
    capture_beyond_viewport: Some(true),
    from_surface: Some(true),
    clip: None,
    format: None,
    optimize_for_speed: None,
    quality: None,
};
</pre> <h3>Executing the Command</h3> Once our structs are ready, we execute the command. Chrome returns the image as a Base64-encoded string, so we need to decode it into a raw byte vector (<code>Vec<u8></code>) before we can save it to disk or process it. <pre style="background-color:#282a36;">
pub async fn screenshot(devtools: &ChromeDevTools) -> Result<Vec<u8>, Box<dyn Error>> {
    // 1. Serialize our parameters to JSON
    let params = serde_json::to_value(&FULL_SIZE_SCREENSHOT)?;

    // 2. Call the CDP method
    let response = devtools
        .execute_cdp_with_params("Page.captureScreenshot", params)
        .await?;

    // 3. Extract the Base64 data from the response
    let base_64_png = response
        .get("data")
        .and_then(|d| d.as_str())
        .ok_or("Failed to find image data in response")?;

    // 4. Decode it into raw PNG bytes (PNG is the default format)
    let png = BASE64_STANDARD.decode(base_64_png)?;
    Ok(png)
}
</pre> <h3>How to use it in your project</h3> Integrating this into your <code>thirtyfour</code> workflow is straightforward. Simply wrap your driver handle in a <code>ChromeDevTools</code> instance: <pre style="background-color:#282a36;">
// ... set up your thirtyfour WebDriver ...
let devtools = ChromeDevTools::new(driver.handle());

// Navigate to a long page
driver.goto("https://www.rust-lang.org").await?;

// Capture everything!
let image_bytes = screenshot(&devtools).await?;
std::fs::write("rust_homepage.png", image_bytes)?;
</pre> <h2>Summary</h2> By reaching past the standard WebDriver API and using CDP, we gain much finer control over how Chrome behaves. This approach is perfect for: <ul> <li>Visual regression testing.</li> <li>Generating website previews.</li> <li>Archiving landing pages.</li> </ul> Just a heads-up: capturing extremely long pages (like a social media feed) can produce massive PNG files, and the Base64 string Chrome sends back is about a third larger than the PNG itself. Keep an eye on your memory usage! Happy Hacking!

Tired of partial webpage screenshots? Discover how to capture entire, full-size web pages from header to footer using Rust and the powerful Chrome DevTools Protocol (CDP). This in-depth guide, leveraging the `thirtyfour` crate, reveals the secret to bypassing standard WebDriver viewport limitations with `Page.captureScreenshot`'s `captureBeyondViewport` parameter, making it perfect for visual regression testing, generating complete website previews, or archiving web content without missing a single pixel.

Smarter Filters: Empowering Users with AI-Driven Search 2025-08-15T13:00:00Z https://bandarra.me/posts/ai-smart-filters <img src="/images/SmartFilters.png" alt="" /> Over a decade ago, I worked on a travel meta-search website. We discovered through UX research that only the savviest users could effectively use the myriad filter options available for flight results. Acknowledging this, we dedicated significant time to optimizing the user experience in that area, but never found a solution that made filters significantly easier to use. Now, imagine if users didn't have to figure out how the filters work and pick the ones that match what they're looking for. Instead, they could simply say what they want in their own words, maybe even by voice. Thanks to the rapid progress of AI in the last few years, this is possible today, and some sites already implement this pattern, like Kayak's Smart Filter feature and Redbus's Eazzy Filter. <div style="display: flex; flex-direction: row; gap: 4px; font-size: 16px; justify-content: space-between;"> <div style="display: flex; flex-direction: column;align-items: center;max-width:50%"> <a href="/images/KayakSmartFilters.png"> <img style="margin-bottom: 0; height: 250px;" src="/images/KayakSmartFilters.png"/></a> <div>Kayak's Smart Filter</div> </div> <div style="display: flex; flex-direction: column;align-items: center;max-width:50%"> <a href="/images/RedbusEazzyFilter.png"> <img style="margin-bottom: 0; height: 250px;" src="/images/RedbusEazzyFilter.png"/></a> <div>Redbus' Eazzy Filter</div> </div> </div> Even better, with the help of the <a href="https://developer.chrome.com/docs/ai/built-in-apis">Built-in AI APIs</a>, the entire process can run on the user's device at zero cost. The user's voice or text prompt never leaves the device, and it even works offline!
Here's a demo application implementing a smart filtering experience that runs on the client side and supports voice input: <iframe width="800" height="450" style="width:100%;" src="https://www.youtube.com/embed/Vldmo2DFoqc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> You can also try out the <a href="https://bandarra.me/apps/flyby/">live demo</a>. <blockquote class="markdown-alert-important"> To try out the demo above, you will need to enable the Prompt API for Gemini Nano by pointing your browser to <code>chrome://flags/#prompt-api-for-gemini-nano</code> and setting the flag to <code>Enabled</code>. To enable audio input, you will also need to set the <code>chrome://flags/#prompt-api-for-gemini-nano-multimodal-input</code> flag to <code>Enabled</code>. </blockquote> But how does this work? The core functionality, transforming the user's natural-language input into filter settings, uses generative AI in the form of a Large Language Model (LLM). It relies on a feature called structured output, which constrains the model to generate output in a specific format. The audio input feature also uses a multimodal LLM to transcribe the user's voice into text, which is then passed to the core functionality for processing. Let's take a deeper look at how these work. <h2>Transforming natural language input into structured filter configuration</h2> As described in the previous section, the solution uses an LLM to build the core functionality, which transforms the user's natural-language query into a structured filter configuration. More specifically, this implementation uses the <a href="https://developer.chrome.com/docs/ai/prompt-api">Built-in Prompt API</a>, which runs on top of a browser-provided, state-of-the-art LLM (Gemini Nano, in Chrome's case).
Using an LLM via the Prompt API has the advantage that the browser manages the model. Once the model has been downloaded, it's available to any site, so it may already be on the user's machine, avoiding hefty downloads. <blockquote class="markdown-alert-note"> While the Built-in Prompt API is generally available in Chrome Extensions, on the web it's currently only available as a Chrome Origin Trial on macOS, Windows, and Linux, and Chrome is currently the only browser that provides the API. This means that features that rely on this API should either be built as a progressive enhancement or use a hybrid solution, like <a href="https://developer.chrome.com/docs/ai/firebase-ai-logic">Firebase AI Logic</a>. </blockquote> The process of transforming the user's input into a filter configuration can be broken into three components: <ul> <li>A structured output schema, which describes the format the model should use for its output, along with constraints on the output and field descriptions.</li> <li>A system prompt describing the model's goal, a set of rules and examples to help the model understand how to process the information and, finally, any additional information it may need.</li> <li>Handling the user's query: taking the user input and prompting the large language model to extract the information and return it in the required format.</li> </ul> <h3>Defining the structured output</h3> The structured output is the glue between the output of the LLM and the filters in the application.
The application includes a method that applies a set of filters when it's passed an object with the filter definitions: <pre style="background-color:#282a36;">
interface FilterState {
  minPrice: number;
  maxPrice: number;
  departureAirports: string[];
  arrivalAirports: string[];
  stops: number[];
  airlines: string[];
}

const handleSmartFilterChange = (newFilters: FilterState) => {
  // Filter the results using the provided state and update the UI.
};
</pre> <a href="https://json-schema.org/">JSON Schema</a> is used to describe the target output for the LLM. In this example, it should describe the <code>FilterState</code> object above, which is the input to <code>handleSmartFilterChange()</code>. This example from <a href="https://developer.chrome.com/docs/ai/structured-output-for-prompt-api">the Chrome documentation for using Structured Output with the Prompt API</a> shows the steps needed: <pre style="background-color:#282a36;">
const session = await LanguageModel.create();

const schema = {
  "type": "boolean"
};

const post = "Mugs and ramen bowls, both a bit smaller than intended- but that's how it goes with reclaim. Glaze crawled the first time around, but pretty happy with it after refiring.";

const result = await session.prompt(
  `Is this post about pottery?\n\n${post}`,
  {
    responseConstraint: schema,
  }
);
console.log(JSON.parse(result));
</pre> <blockquote class="markdown-alert-tip"> The <code>responseConstraint</code> is defined with a JSON Schema, which is a powerful format. Make sure to check its documentation at <a href="https://json-schema.org/">json-schema.org</a>. </blockquote> The schema for the filter object is much larger than this example, and you can read <a href="https://github.com/andreban/flyby-results-explorer/blob/main/src/lib/ai.ts#L171-L229">the whole definition in the project's repository</a>.
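To give a sense of what the full schema involves, here is a trimmed-down TypeScript sketch. It is not the repository's actual definition. It assumes the model's output uses the fields from the example output later in this post (`nonstop`, `onestop`, `twostop`, and so on), and it applies the same ideas as the tips that follow: every field is required, numbers default to `-1`, each field has a description, and codes are constrained with `pattern`. The `toFilterState()` mapping and its price bounds are hypothetical glue between that output and `FilterState`.

```typescript
// Same FilterState as above, repeated so this sketch is self-contained.
interface FilterState {
  minPrice: number;
  maxPrice: number;
  departureAirports: string[];
  arrivalAirports: string[];
  stops: number[];
  airlines: string[];
}

// Assumed shape of the model's output, following the example output in this post.
interface ModelFilters {
  minPrice: number; // -1 when the query doesn't mention it
  maxPrice: number; // -1 when the query doesn't mention it
  nonstop: boolean;
  onestop: boolean;
  twostop: boolean;
  departureAirports: string[];
  arrivalAirports: string[];
  airlines: string[];
}

// Shared sub-schemas. Airline codes allow digits because codes like B6 and F9 contain them.
const airportCode = { type: "string", pattern: "^[A-Z]{3}$" };
const airlineCode = { type: "string", pattern: "^[A-Z0-9]{2}$" };

// Passed to session.prompt() as responseConstraint.
const schema = {
  type: "object",
  properties: {
    minPrice: { type: "number", description: "Minimum price in USD, or -1 if not mentioned." },
    maxPrice: { type: "number", description: "Maximum price in USD, or -1 if not mentioned." },
    nonstop: { type: "boolean", description: "True if the user wants nonstop flights." },
    onestop: { type: "boolean", description: "True if the user accepts one stop." },
    twostop: { type: "boolean", description: "True if the user accepts two stops." },
    departureAirports: {
      type: "array",
      description: "3-letter IATA codes of departure airports.",
      items: airportCode,
    },
    arrivalAirports: {
      type: "array",
      description: "3-letter IATA codes of arrival airports.",
      items: airportCode,
    },
    airlines: {
      type: "array",
      description: "2-character IATA airline codes, such as UA or B6.",
      items: airlineCode,
    },
  },
  required: [
    "minPrice", "maxPrice", "nonstop", "onestop", "twostop",
    "departureAirports", "arrivalAirports", "airlines",
  ],
};

// Hypothetical mapping from the model's output to the app's FilterState.
// The -1 defaults are replaced with the price bounds of the current results.
function toFilterState(m: ModelFilters, priceBounds: { min: number; max: number }): FilterState {
  // Assumption: an empty stops array means "no stop filter".
  const stops: number[] = [];
  if (m.nonstop) stops.push(0);
  if (m.onestop) stops.push(1);
  if (m.twostop) stops.push(2);
  return {
    minPrice: m.minPrice === -1 ? priceBounds.min : m.minPrice,
    maxPrice: m.maxPrice === -1 ? priceBounds.max : m.maxPrice,
    departureAirports: m.departureAirports,
    arrivalAirports: m.arrivalAirports,
    stops,
    airlines: m.airlines,
  };
}
```

In this sketch, the parsed model output goes through `toFilterState()` before reaching `handleSmartFilterChange()`. That keeps the `-1` convention inside the AI layer rather than leaking it into the filter-handling code.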
The following tips and best practices came out of writing the schema for this demo application: <ul> <li>Required fields: the initial version hid fields when the user's query contained no information about them, so no fields were required. Experimentation showed that the LLM's results were more consistent when those fields were required, with default values provided for when they weren't present in the user's query. For numeric fields, <code>-1</code> was used as the default value, and the filter-handling code was adapted to handle it.</li> <li>Field descriptions: field descriptions help the LLM understand the context of each field and put the correct information in it. It's generally better to provide this information in the schema itself, rather than in the prompt.</li> <li>Regex fields: airline fields are 2-character codes and airport fields are 3-letter codes. Without a pattern describing them, the LLM would eventually return the full airline or airport names rather than the 2- or 3-character codes.</li> </ul> <h3>Engineering the system prompt</h3> If the structured output defines what the model output should look like, the system prompt defines how the model should interpret the user's query. Most of the optimization time went into this step, to maximize the number of queries the model interpreted correctly. In the Prompt API, the system prompt can be passed to the model when creating a new <code>LanguageModel</code> session: <pre style="background-color:#282a36;">
const systemPrompt = '...';
const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: systemPrompt,
  }]
});
</pre> <blockquote class="markdown-alert-tip"> The system prompt can also be passed as a parameter when calling <code>prompt()</code> or <code>promptStreaming()</code>.
However, setting it on the session is simpler and performs better when prompting repeatedly with the same system prompt. </blockquote> Since prompt engineering is where much of the time building the application goes, it's worth using a tool to track progress and regressions across changes and iterations. For this application, <a href="https://bandarra.me/apps/structured-output-eval/">a tool to test user queries against prompts and model configuration</a> was created, specifically for the Prompt API and structured output. You can see the tool's output in the screenshot below: <img src="/images/structured-output-eval.png" alt="A screenshot of the structured output evaluation tool" /> The following notes and recommendations came out of the prompt engineering work on this application: <ol> <li>The initial implementation focused on providing a set of rules for the model to follow in the system prompt. But adding example inputs and outputs (multi-shot prompting) increased the accuracy of the output significantly, from <code>~56%</code> to <code>~90%</code>. At that point, it was possible to remove the rules entirely and focus on examples, without a penalty in the model's accuracy, resulting in a short and simple base prompt:</li> </ol> <pre style="background-color:#282a36;">
You are a helpful assistant that generates structured data for flight search filters.
</pre> While the base prompt is short, the system prompt includes a total of 12 examples that help the LLM understand different user queries and their expected results. Note that the example outputs include all the required fields, which helps the LLM understand when the default values should be used.
<pre style="background-color:#282a36;">
<example>
  <query>
    Flights under $800
  </query>
  <output>
    {
      "minPrice": -1,
      "maxPrice": 800,
      "nonstop": false,
      "onestop": false,
      "twostop": false,
      "departureAirports": [],
      "arrivalAirports": [],
      "airlines": []
    }
  </output>
</example>
</pre> <ol start="2"> <li>The model would occasionally return the wrong airline code. Including the list of available airlines and their codes in the system prompt made the model much more accurate at returning airline codes. The following is how the list of airlines was included in the system prompt:</li> </ol> <pre style="background-color:#282a36;">
This is a list of airlines and codes available to filter:
[
  { code: "UA", name: "United Airlines" },
  { code: "DL", name: "Delta Air Lines" },
  { code: "AA", name: "American Airlines" },
  { code: "WN", name: "Southwest Airlines" },
  { code: "B6", name: "JetBlue Airways" },
  { code: "NK", name: "Spirit Airlines" },
  { code: "AS", name: "Alaska Airlines" },
  { code: "F9", name: "Frontier Airlines" },
  { code: "QF", name: "Qantas Airways" },
]
</pre> While the same approach could be used for airport codes and names, the model has been accurate at extracting airport codes from user queries, so it wasn't necessary. <h3>Transform user queries into filter configuration</h3> With all the pieces in place, the remaining step is the code that glues together the system prompt, the schema, and the user's query to transform the user's input into a filter configuration. Note that the session is configured with a temperature of <code>0.5</code> and a top-K of <code>1</code>, which significantly reduces the randomness of the model and makes it return more consistent results. <blockquote class="markdown-alert-tip"> Read <a href="/posts/understand-temperature-topk">Understand the Effects of Temperature on Large Language Model Output</a> for considerations on how changing the temperature and top-K values affects an LLM's output.
</blockquote> <pre style="background-color:#282a36;">
const systemPrompt = "..."; // content elided for brevity.
const schema = {...}; // content elided for brevity.

// Create the model session.
const session = await LanguageModel.create({
  temperature: 0.5,
  topK: 1,
  initialPrompts: [{
    role: "system",
    content: systemPrompt,
  }]
});

// Run the user's query on the session, passing the structured output schema as a parameter.
const result = await session.prompt(query, {
  responseConstraint: schema,
});
const filterState = JSON.parse(result);
</pre> With the AI-generated result ready, all that's left is to pass the filter configuration to the code that handles filtering: <pre style="background-color:#282a36;">
handleSmartFilterChange(filterState)
</pre> <h2>Handling voice input with multimodal models</h2> Multimodal models can handle images, audio, and sometimes video in their input or output. One potential use case for these models is transcribing the user's voice input into text, which can then be plugged into other parts of the application. The implementation with the Prompt API is similar to before, with two key differences: the API is told to expect audio inputs, so it can ensure that a model supporting this modality is used, and the audio blob is passed to the model as part of the prompt call. <pre style="background-color:#282a36;">
async function getTranscriptionFromAudio(audioBlob) {
  // Tell the API to expect audio input, so a model that supports it is used.
  const session = await LanguageModel.create({
    expectedInputs: [{ type: 'audio' }],
  });
  const result = await session.prompt([{
    role: 'user',
    content: [
      { type: 'text', value: 'Transcribe this audio' },
      { type: 'audio', value: audioBlob },
    ]
  }]);
  // Return the transcription so the caller can use it.
  return result;
}
</pre> <blockquote class="markdown-alert-important"> The multimodal functionality of the Prompt API is behind a separate flag.
To try out voice input, make sure to enable the flags mentioned previously in this article and also set <code>chrome://flags/#prompt-api-for-gemini-nano-multimodal-input</code> to <code>Enabled</code>. </blockquote> Finally, the user's voice input can be recorded with the <code>MediaRecorder</code> API: <pre style="background-color:#282a36;">
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream);
mediaRecorderRef.current = mediaRecorder;
audioChunksRef.current = [];

mediaRecorder.ondataavailable = (event) => {
  audioChunksRef.current.push(event.data);
};

mediaRecorder.onstop = async () => {
  const audioBlob = new Blob(audioChunksRef.current, { type: "audio/webm" });
  try {
    const transcription = await getTranscriptionFromAudio(audioBlob);
    setQuery(transcription);
    await handleFilter(transcription);
  } catch (error) {
    console.error("Error getting transcription:", error);
  }
};

mediaRecorder.start();
</pre> <h2>Conclusion</h2> This article demonstrated how developers can use generative AI to build a better filtering experience with the built-in Prompt API. Result filters have long been a feature for advanced users, not only in the travel vertical, where this approach seems to be gaining traction, but also across e-commerce sites, auction sites, and any other user interface where users need to apply filters to get to the results they want. Generative AI can streamline this user journey by letting users describe the results they want in their own words, instead of having to figure out how to use the controls provided by the site, helping them reach their goals faster when visiting your site.

Discover how AI is revolutionizing search filters! Learn how to transform natural language queries into structured filters using Chrome's Built-in Prompt API and Gemini Nano. Explore techniques like structured output schemas and prompt engineering to create intuitive, voice-enabled filtering experiences that run offline, enhancing user experience on travel and e-commerce sites. Try the live demo!

Building with Lovable: A Low-Code Experiment 2025-07-14T10:00:00Z https://bandarra.me/posts/building-with-lovable <img src="https://bandarra.me/images/LovableLove.png" alt="A vibe coder writing code" /> My buddy <a href="https://www.linkedin.com/in/thiagocarneiro/">Thiago Carneiro</a> showed me some projects he's been building with <a href="https://lovable.dev/">Lovable</a>, like this <a href="https://texturewiz.com/">Texture Wizard</a> or this <a href="https://ytpreview.thiagocarneiro.com/">YouTube Preview tool</a>. But the really, really cool thing is what Thiago told me: his background is in design and game art and, while he understands well what goes on behind the scenes, he says he wouldn't have the know-how to build those applications himself. These tools made it possible for him to build a number of tools that make his life easier, and to share them with others. It's amazing to see how these tools are unlocking the potential for more people to build. As is often the case, these tools have limitations. Integrating with a third-party API, or implementing a sign-in system or payments, might be hard, and might require someone, maybe the vibe coder or another skilled developer, to dive deeper into the code the AI generated. Now, a common concern developers have about low-code tools like Lovable is code quality, and how easy the code is to maintain, since it may be an AI-generated code base with little to no human supervision. Thiago was kind enough to share the source code of his tools with me, and checking out the code for his various applications was interesting.
What I found were applications that are quite consistent with each other: the tech stack uses <a href="https://vite.dev/">Vite</a>, <a href="https://react.dev/">React</a> and <a href="https://tailwindcss.com/">Tailwind</a>, the directory structure and patterns are consistent across applications, and the code is componentized, clean, and easy to read. In short, getting up to speed with those projects wasn't only easy but, because they are similar, moving on to the next one got even easier. This week, I ran an experiment: I bootstrapped a demo application with Lovable (stay tuned for more), then moved the application to VS Code and continued developing it (with the help of Cline). Because Lovable has an integration with <a href="https://github.com/">GitHub</a>, saving the project and running it locally was straightforward and, given how well organized the project was, building up the application from that point was easy. I'm excited by how tools like <a href="https://lovable.dev/">Lovable</a>, <a href="https://bolt.new/">Bolt</a>, and others are enabling non-coders to build the tools and applications they need on the web, creating a new wave of excitement on the platform. After looking at the code generated by Lovable, I'm also confident that there's an "upgrade path" for when the limits of those tools are reached and growing beyond them requires hands-on coding.

Discover how Lovable, a low-code tool, empowers designers like Thiago Carneiro to build impressive applications like Texture Wizard and the YouTube Preview tool, despite limited coding experience. This review explores the code quality of Lovable-generated projects, revealing a consistent tech stack (Vite, React, Tailwind) and clean, componentized code. See how easily you can extend Lovable projects with tools like VS Code and GitHub, as demonstrated by the Flyby demo, which experiments with natural language filtering using Chrome's Built-in Prompt API. Try the Flyby demo and see for yourself.

AI-Generated Code: Ownership and Developer Responsibility 2025-07-07T16:37:00Z https://bandarra.me/posts/ai-generated-code-ownership A few weeks ago I was discussing recommendations for organizations adopting AI developer tooling with a friend, and one of the points we agreed on was that developers should treat code generated by AI as their own, and thoroughly review it themselves before submitting it for review by their wider team. Later that week, I learned about the <a href="https://arxiv.org/abs/2506.08872">Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task</a> study, which triggered loads of discussion online. One of the takeaways from the study was the potential impact on the feeling of ownership of the work produced: <blockquote> This trade-off highlights an important educational concern: AI tools, while valuable for supporting performance, may unintentionally hinder deep cognitive processing, retention, and authentic engagement with written material. If users rely heavily on AI tools, they may achieve superficial fluency but fail to internalize the knowledge or feel a sense of ownership over it. </blockquote> This made me question the idea that developers should have ownership of AI-generated code. While this is, indeed, the best scenario, it may just not be in line with human nature. But if developers are unable to feel ownership of the code generated by AI, what do they feel ownership of? <a href="https://x.com/karpathy">Andrej Karpathy</a> hinted at what that is <a href="https://x.com/karpathy/status/1617979122625712128">in one of his tweets</a>: <blockquote> “The hottest new programming language is English” </blockquote> What developers can feel ownership of are the prompts given to an AI to generate code, but does this make prompts a programming language? Not really, at least not in the way I’d like it to be.
Maybe, for prompts to be considered a programming language, they should work like a higher-level language. That is, similar to how a compiler transforms C++ code into machine code, I’d expect AI to transform the prompts into C++ code (or any other language). But this fails in a couple of ways. First, the same set of prompts would need to generate the same output. If that were the case, developers would be able to commit the sequence of prompts, which they feel ownership of, to their GitHub repository and reproduce the entire application from them. But that’s not how AI works: the same set of prompts can produce widely different results. Second, following the compiler analogy, when the C++ code is correct and the compiler produces incorrect output, this is not a developer issue, but a compiler issue. A compiler that doesn’t always produce the correct output, or even the same output, is considered broken. But, with AI, the solution is to go back and tweak the prompt until it works. Another consideration is that, if code becomes English, submitting the full specification, rather than the sequence of prompts that led to it, is likely easier for the AI to reproduce and for humans to maintain (even if maintenance is assisted by AI). Maybe, over time, coding AI systems will get better at correctness and reproducibility, and developers will be able to build their work on top of what they feel ownership of: the prompts. In the meantime, while developers may not feel the same level of ownership over AI-generated code, learning how to review it effectively is becoming an important part of the developer’s skillset.

Explore the evolving role of AI in software development and the critical question of code ownership. Is AI-generated code truly "owned" by developers, or is the focus shifting to prompt engineering? This article delves into the challenges of reproducibility, the need for rigorous review, and how AI's impact on coding skills is reshaping the developer landscape.

From PyTorch to Browser: a full client-side solution with ONNX and Transformers.js 2025-05-06T13:16:00Z https://bandarra.me/posts/from-pytorch-to-browser-a-full-client-side-solution-with-onnx-and-transformers-js In the <a href="https://bandarra.me/posts/from-pytorch-to-browser-creating-a-web-friendly-ai-model">previous article</a>, I wrote about using a <a href="https://huggingface.co/tasks/feature-extraction">feature extraction model</a> to generate embeddings from text, training a custom classification model for sentiment analysis that uses those embeddings as its input, and finally <a href="https://ai.google.dev/edge/litert/models/pytorch_to_tflite">exporting the model to run in the browser with LiteRT</a> and <a href="https://www.npmjs.com/package/@tensorflow/tfjs-tflite">TensorFlow Lite</a>. While the classification model in that solution runs on the client side, the solution uses <a href="https://ai.google.dev/gemini-api/docs/embeddings">Google AI's embedding API</a> to generate text embeddings. Since that is a cloud API, the solution is not entirely client-side. In this article, I'll explore a fully client-side solution for toxicity detection using the <a href="https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/">Kaggle's Toxic Comment Classification Challenge</a> dataset and <a href="https://huggingface.co/docs/transformers.js/en/api/pipelines#module_pipelines.FeatureExtractionPipeline">Transformers.js's feature extraction pipeline</a>, running it in the browser with the <a href="https://www.npmjs.com/package/onnxruntime-web">ONNX web runtime</a>. <h2>Choose the tools and libraries</h2> PyTorch was the ML framework used in the previous article, and there's no reason to choose a different one. Since the goal is a fully client-side solution, the embedding model needs to run both in the training pipeline and in the browser, for inference.
The <code>all-MiniLM-L6-v2</code> model is <a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2">supported in Python with Sentence Transformers</a> and <a href="https://huggingface.co/Xenova/all-MiniLM-L6-v2">in the browser with Transformers.js</a>, making it a great choice. <a href="https://huggingface.co/docs/transformers.js/en/index">Transformers.js</a> is a great library for running off-the-shelf AI models in the browser. Because Transformers.js uses <a href="https://onnxruntime.ai/">ONNX Runtime</a> as its underlying library to run models, <a href="https://www.npmjs.com/package/onnxruntime-web">ONNX Runtime web</a> is a natural choice to run the custom model in the browser, creating synergy between Transformers.js and the custom model and avoiding extra dependencies in the web application. <h2>Data pre-processing</h2> The data pre-processing step transforms the original dataset, containing text comments and labels, into a new dataset containing the embeddings generated by the feature extraction model and the labels: <pre style="background-color:#282a36;">
import csv
import json

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# output_file is the path of the JSON Lines file that receives the embeddings.
with open("data/train.csv", "r", encoding="utf-8") as dataset_file:
    dataset_csv = csv.DictReader(dataset_file)
    with open(output_file, "a") as out:
        for entry in dataset_csv:
            # Generate a 384-value embedding for the comment text.
            embeddings = model.encode([entry['comment_text']])
            result = {
                'id': entry['id'],
                'embeddings': embeddings[0].tolist(),
                'toxic': entry['toxic'],
                'severe_toxic': entry['severe_toxic'],
                'obscene': entry['obscene'],
                'threat': entry['threat'],
                'insult': entry['insult'],
                'identity_hate': entry['identity_hate'],
            }
            json_result = json.dumps(result)
            out.write(json_result + "\n")
            out.flush()
</pre> <h2>Model architecture and training</h2> The model architecture is similar to the one used in the previous article.
In this case, the <code>all-MiniLM-L6-v2</code> feature extraction model generates embeddings as an array of 384 float values, so the model's input size needs to change to match. A batch normalization layer was also added after each hidden linear layer, as that was shown to slightly improve performance on the validation set and to reduce the number of epochs required for the model to converge. <pre style="background-color:#282a36;">
import torch.nn as nn
import torch.nn.functional as F

class ToxicityModel(nn.Module):
    def __init__(self):
        super().__init__()
        # 384 inputs: the size of an all-MiniLM-L6-v2 embedding.
        self.linear0 = nn.Linear(384, 128)
        self.norm0 = nn.BatchNorm1d(128)
        self.linear1 = nn.Linear(128, 32)
        self.norm1 = nn.BatchNorm1d(32)
        # 6 outputs: one logit per toxicity label.
        self.linear_out = nn.Linear(32, 6)

    def forward(self, x):
        x = self.linear0(x)
        x = self.norm0(x)
        x = F.relu(x)
        x = self.linear1(x)
        x = self.norm1(x)
        x = F.relu(x)
        x = self.linear_out(x)
        return x
</pre> Binary Cross Entropy (BCE) is used as the loss function through <a href="https://docs.pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html">BCEWithLogitsLoss</a>, which applies a <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid function</a> to the model's raw outputs before computing BCE, so they can be compared directly with the 0/1 labels from the training set. Since a comment can have several labels at once, each of the 6 outputs is treated as an independent binary classification. The model achieves an accuracy of 98% on the validation set. <h2>Model conversion from PyTorch to ONNX</h2> <a href="https://pytorch.org/tutorials/beginner/onnx/export_simple_model_to_onnx_tutorial.html">Converting from the PyTorch format to ONNX</a> requires installing the <code>onnx</code> and <code>onnxscript</code> dependencies, and the implementation is straightforward.
<pre style="background-color:#282a36;">
import torch
from all_minilm_l6.toxicity_model import ToxicityModel

# Load the trained weights into a new instance of the model.
torch_model = ToxicityModel()
torch_model.load_state_dict(torch.load(
    "SP-all-MiniLM-L6-v2.safetensors",
    weights_only=True,
    map_location=torch.device('cpu')
))
torch_model.eval()

# A random example input with the shape of a single 384-value embedding.
example_inputs = (torch.randn(1, 384),)
onnx_program = torch.onnx.export(torch_model, example_inputs, dynamo=True)
onnx_program.optimize()
onnx_program.save("SP-all-MiniLM-L6-v2.onnx")
</pre> Similar to the LiteRT conversion, the exporter requires an example input, which can be randomly generated. <h2>Running the ONNX model in the browser</h2> ONNX models run in the browser with the <a href="https://www.npmjs.com/package/onnxruntime-web"><code>onnxruntime-web</code> library</a>. Because the model takes as input embeddings generated with <code>all-MiniLM-L6-v2</code>, <a href="https://huggingface.co/docs/transformers.js/en/index">Transformers.js</a> is required for the pre-processing step, to transform the user's input into embeddings. Finally, the model outputs logits, which can be transformed into probabilities with a <a href="https://en.wikipedia.org/wiki/Sigmoid_function">sigmoid function</a>. ONNX Runtime doesn't provide this function out of the box, but the implementation is a one-line function: <pre style="background-color:#282a36;">
function sigmoid(xs) {
  return xs.map(x => 1 / (1 + Math.exp(-x)));
}
</pre> Putting everything together is then a matter of importing the required libraries, transforming the user's input into embeddings, calling the custom model with those embeddings, and applying the sigmoid function to the model's output, which generates a probability for each toxicity type: <pre style="background-color:#282a36;">
import { pipeline } from '@huggingface/transformers';
import * as ort from 'onnxruntime-web';

// Create the Transformers.js pipeline using the all-MiniLM-L6-v2 feature
// extraction model.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Instantiate the custom ONNX Runtime model.
const model = await ort.InferenceSession.create('SP-all-MiniLM-L6-v2.onnx');

const sentences = ["This is an example input"];

// Generate embeddings from the user input.
const output = await extractor(sentences, { pooling: 'mean', normalize: true });

// Classify the embeddings. The Transformers.js tensor is wrapped in an
// onnxruntime-web tensor; the input name, x, matches the model's input.
const input = new ort.Tensor('float32', output.data, output.dims);
const outputTensor = await model.run({ x: input });

// Transform the logits into probabilities using the sigmoid function.
const probabilities = sigmoid(outputTensor.linear_2.data);
</pre> <h2>Conclusion</h2> Combining a feature extraction model with a classification model looks like a promising pattern. With the model hitting 98% accuracy, the two models together taking up less than 5 MB, and almost instant inference, this looks like a good use case to run on the client side. As a next step, make sure to <a href="https://bandarra.me/apps/toxicity-detection-onnx/">check the model in action</a> or <a href="https://github.com/andreban/jigsaw-toxic-comment-classification-challenge">take a look at the source code for the model and web application</a>.

Build a full client-side toxicity detection solution using Transformers.js, ONNX Runtime, and the `all-MiniLM-L6-v2` model. Train a custom model on the Kaggle Toxic Comment Classification Challenge dataset, convert it to ONNX, and run it in the browser for fast, private text analysis. Get 98% accuracy with this approach, see the code, and try the live demo.

From PyTorch to Browser: Creating a Web-Friendly AI Model 2025-04-16T12:23:00Z https://bandarra.me/posts/from-pytorch-to-browser-creating-a-web-friendly-ai-model <h1>Motivation</h1> A huge number of AI models for a wide range of use cases are readily available to web developers, with friendly APIs, through tools like <a href="https://ai.google.dev/edge/mediapipe/solutions/guide">Mediapipe</a> and <a href="https://huggingface.co/docs/transformers.js/en/index">Transformers.js</a> and, more recently, the <a href="https://huggingface.co/docs/transformers.js/en/index">Built-in AI APIs</a>. These tools provide models for many common tasks: text tasks like text classification or language detection, vision tasks like image segmentation, and even large language models. However, developers will sometimes have specific needs that readily available models don't cover, or those models might not be available in a web-friendly format. In this article, I explore building a model from scratch using <a href="https://pytorch.org/">PyTorch</a>, and exporting it to a browser-friendly format that is compatible with Google's LiteRT library. Disclaimer: I'm not a Python developer or an AI engineer. This is just an exercise to understand the process of building and deploying a model, end to end. <h1>Picking a problem and outlining a solution</h1> Ideally, such an experiment should tackle a problem that is at least adjacent to the real world. The inspiration for this one comes from a colleague trying to understand the sentiment of messages from a mailing list: whether the messages were positive, negative or neutral. This is also not too far from another problem I heard about from a developer, who had their own specific, somewhat more lenient rules for toxicity detection.
The <a href="https://ai.google.dev/gemini-api/docs/embeddings">Google AI Embeddings API</a> has an option that optimizes embeddings for classification, and this looked like a good opportunity to experiment with building a classification model on top of the embeddings generated by that API. The last thing needed to get started is a dataset, as good datasets are crucial for building good models. Fortunately, dataset hubs like <a href="https://www.kaggle.com/datasets">Kaggle</a> or <a href="https://huggingface.co/datasets">HuggingFace</a> offer a variety of datasets we can use and, for this particular problem, I chose this <a href="https://www.kaggle.com/datasets/atifaliak/youtube-comments-dataset">Kaggle dataset for sentiment analysis on YouTube comments</a>. It contains <code>17872</code> comments, each one classified as <code>positive</code>, <code>negative</code>, or <code>neutral</code>. <h1>Preparing the dataset</h1> With the dataset selected and downloaded, the next step is transforming it into a format our model can use. Most importantly, in this case, the comments from the dataset need to be transformed into the embeddings we are going to use as the input for the model.
As this is a time-consuming process, the solution is to pre-process the data and save the embeddings into a separate file, using the <a href="https://ai.google.dev/gemini-api/docs/libraries">Google GenAI Python</a> library: <pre style="background-color:#282a36;">
import csv
import json

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR API KEY")

with open("YoutubeCommentsDataSet.csv", "r", encoding="utf-8") as csvfile:
    csvreader = csv.DictReader(csvfile)
    with open("YouTubeCommentsEmbeddings.jsonl", "a") as embeddingsfile:
        for row in csvreader:
            # Generate an embedding optimized for classification tasks.
            result = client.models.embed_content(
                model='text-embedding-004',
                contents=row['Comment'],
                config=types.EmbedContentConfig(task_type="CLASSIFICATION")
            )
            result = {
                "embeddings": result.embeddings[0].values,
                "sentiment": row['Sentiment'],
            }
            json_result = json.dumps(result)
            embeddingsfile.write(json_result + "\n")
            embeddingsfile.flush()
</pre> The code above loads all comments from <code>YoutubeCommentsDataSet.csv</code>, transforms them into embeddings using the Google GenAI SDK, and saves the embeddings and the sentiment into a new file, <code>YouTubeCommentsEmbeddings.jsonl</code>. <h1>Training the model</h1> <h2>Loading the previously generated data</h2> Before training the model, the file created in the previous step must be loaded. While doing that, the <code>positive</code>, <code>neutral</code> and <code>negative</code> sentiment values are also mapped to <code>0</code>, <code>1</code>, and <code>2</code>. <pre style="background-color:#282a36;">
import json

import torch

sentiments_dict = {'positive': 0, 'neutral': 1, 'negative': 2}
dataset = []
with open('YouTubeCommentsEmbeddings.jsonl', 'r') as f:
    for line in f:
        data = json.loads(line)
        embeddings = torch.tensor(data['embeddings'])
        sentiment = torch.tensor(sentiments_dict[data['sentiment']])
        dataset.append((embeddings, sentiment))
</pre> <h2>Splitting into training and validation datasets</h2> With the initial dataset loaded, a good practice is to split it into a training dataset and a validation dataset.
While the training dataset is used to train the model, the validation dataset is used to check the model's accuracy and to confirm that it isn't memorizing the training dataset, which would lead to poor performance in the real world. <pre style="background-color:#282a36;">
import torch

num_samples = len(dataset)
num_validation = int(0.2 * num_samples)

# Shuffle the indices, then use the last 20% for validation.
shuffled_indices = torch.randperm(num_samples)
train_indices = shuffled_indices[:-num_validation]
validation_indices = shuffled_indices[-num_validation:]

training_dataset = [dataset[i] for i in train_indices]
validation_dataset = [dataset[i] for i in validation_indices]

train_loader = torch.utils.data.DataLoader(training_dataset, batch_size=64, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=64, shuffle=False)
</pre> Data loaders are also created, with a batch size of <code>64</code>. The training loader has the <code>shuffle</code> parameter set to <code>True</code>, so the order of the inputs is different in each epoch. <h2>The training loop</h2> With the training and validation sets ready, it's now time to train the model. The training loop is fairly standard for neural networks: for each batch, the predictions are calculated by invoking the model with <code>model(x_train)</code>, and the loss is calculated from the predicted and expected values with <code>loss_fn(y_predicted, y_train)</code>. Then, leftover gradients are cleared with <code>optimizer.zero_grad()</code>, new gradients are computed with <code>loss.backward()</code>, and the model weights are updated with <code>optimizer.step()</code>.
<pre style="background-color:#282a36;">
import torch

def training_loop(n_epochs, model, optimizer, loss_fn, training_loader, validation_loader):
    for epoch in range(1, n_epochs + 1):
        for x_train, y_train in training_loader:
            y_predicted = model(x_train)
            loss = loss_fn(y_predicted, y_train)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Disable grad for calculating validation metrics, since backpropagation
        # is not needed and this should improve performance.
        with torch.no_grad():
            correct = 0
            total = 0
            for x_val, y_val in validation_loader:
                outputs = model(x_val)
                _, predicted = torch.max(outputs, dim=-1)
                correct += int((predicted == y_val).sum())
                total += x_val.shape[0]

        print('Epoch: %d, Loss: %f, Accuracy: %f' % (epoch, float(loss), correct / total))
</pre> After each epoch, the training loop calculates and prints the accuracy on the validation dataset. <h2>The model, optimizer and hyperparameters</h2> The model has an input size of <code>768</code>, which is the size of the embeddings created by the Google AI embeddings API, and an output size of <code>3</code>, one for each possible class. The hidden layer size is <code>512</code>. <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent">Stochastic Gradient Descent (SGD)</a> is used as the optimizer, and <a href="https://en.wikipedia.org/wiki/Cross-entropy">Cross Entropy as the loss function</a>. <pre style="background-color:#282a36;">
from collections import OrderedDict

import torch.nn as nn
import torch.optim as optim

seq_model = nn.Sequential(OrderedDict([
    ('hidden_linear_0', nn.Linear(768, 512)),
    ('hidden_activation_0', nn.ReLU()),
    ('output_linear', nn.Linear(512, 3))
]))

optimizer = optim.SGD(seq_model.parameters(), lr=1e-2)

training_loop(
    n_epochs = 200,
    model = seq_model,
    optimizer = optimizer,
    loss_fn = nn.CrossEntropyLoss(),
    training_loader = train_loader,
    validation_loader = validation_loader,
)
</pre> <h2>Training results</h2> The model needs fewer than <code>200</code> epochs (or training loops) for the accuracy on the validation set to stabilize around <code>83%</code>!
While there's probably a lot of room for improvement, this seems to be at the top end of the notebooks for this dataset shared on Kaggle, which range between <code>65%</code> and <code>85%</code>. Once training is finished, the weights can be easily saved with <code>torch.save()</code>: <pre style="background-color:#282a36;">
torch.save(seq_model.state_dict(), 'ytsentiment.safetensors')
</pre> <h1>Running the model on the web</h1> The model is now trained and the weights are saved in <code>ytsentiment.safetensors</code>. Unfortunately, this format can't be run directly on the web. But not all is lost: there are web-friendly formats, and one of them is the TFLite format, used by <a href="https://ai.google.dev/edge/litert">LiteRT</a>. Note: <a href="https://developers.googleblog.com/en/tensorflow-lite-is-now-litert/">LiteRT is the new name for TensorFlow Lite</a>. At the time of writing, the LiteRT documentation doesn't mention web libraries. However, the <a href="https://www.npmjs.com/package/@tensorflow/tfjs-tflite">TensorFlow Lite</a> library is still available on NPM and can handle the TFLite format. <h2>Converting the PyTorch model to TFLite</h2> The LiteRT team <a href="https://ai.google.dev/edge/litert/models/pytorch_to_tflite">provides a library that makes converting PyTorch models to TFLite straightforward</a>. The process consists of creating an instance of the same model used for training and loading the previously trained weights into it, then calling <code>ai_edge_torch.convert()</code>, passing the model and a random sample input so the library can trace the model. Finally, the model is saved to disk with <code>edge_model.export()</code>.
<pre style="background-color:#282a36;">
from collections import OrderedDict

import ai_edge_torch
import torch
import torch.nn as nn

# Recreate the exact architecture used for training, then load the trained weights.
model = nn.Sequential(OrderedDict([
    ('hidden_linear_0', nn.Linear(768, 512)),
    ('hidden_activation_0', nn.ReLU()),
    ('output_linear', nn.Linear(512, 3))
]))
model.load_state_dict(torch.load("ytsentiment.safetensors", weights_only=True))

# A sample input with the expected shape: a batch of one 768-dimensional embedding.
sample_inputs = (torch.randn(1, 768),)
edge_model = ai_edge_torch.convert(model.eval(), sample_inputs)
edge_model.export('ytsentiment.tflite')
</pre> <h2>Running the converted model in the browser</h2> The model is now compatible with the <a href="https://www.npmjs.com/package/@tensorflow/tfjs-tflite">TensorFlow Lite library</a>: it can be loaded with <code>tflite.loadTFLiteModel()</code>, and inference is executed with <code>model.predict()</code>. We'll use "Hello, your video is amazing!" as an example input. Because the model was trained on embeddings rather than on text, the input first needs to be converted into embeddings using the same embedding model as before, this time via the <a href="https://www.npmjs.com/package/@google/genai">Google Gen AI JavaScript library</a>: <pre style="background-color:#282a36;">
import { GoogleGenAI } from '@google/genai';

const genAi = new GoogleGenAI({ apiKey: 'YOUR API KEY HERE' });
const sampleInput = 'Hello, your video is amazing!';
const embedResult = await genAi.models.embedContent({
  model: 'text-embedding-004',
  contents: [sampleInput],
});
// One embedding per input; keep only each embedding's vector of values.
const embeddings = embedResult.embeddings.map(embedding => embedding.values);
</pre> The embeddings are generated with <code>models.embedContent()</code>, which can batch the generation of embeddings when given an array of inputs in <code>contents</code>. The result object contains a list of embeddings, one for each input, and each embedding's vector can be accessed with <code>embedding.values</code>. The result is then mapped into a 2D array: the first level has one entry per input, and the second level holds the values of that input's embedding.
Since the model needs a tensor to run inference instead of a JavaScript array, <code>tf.tensor2d()</code> is used to convert the 2D JavaScript array into a 2D tensor, which is then passed to the model when calling <code>model.predict()</code>: <pre style="background-color:#282a36;">
const embeddingTensor = tf.tensor2d(embeddings);
const outputTensor = await model.predict(embeddingTensor);
</pre> The output of the prediction is another 2D tensor with one element for each input; each element is an array of logits, which are the raw scores the model gives to each possible class. The tensor can be converted to a regular JavaScript array by calling <code>.array()</code>: <pre style="background-color:#282a36;">
console.log(JSON.stringify(await outputTensor.array()));
// Outputs "[[4.526814937591553,0.1881929636001587,-4.69814395904541]]"
</pre> As the output shows, the array contains only one item at the first level, matching the number of inputs passed to the model, and 3 items at the second level, which are the scores for each class: the 1st item is the score for Positive, the 2nd for Neutral, and the 3rd for Negative. According to the model, the class with the highest score is the most likely one. A neat trick to convert the scores into class labels is to use the <code>tf.argMax()</code> function, which returns the index of the highest score for each input: <pre style="background-color:#282a36;">
const labels = ['Positive', 'Neutral', 'Negative'];
// Index of the highest score along axis 1 (the class axis), one per input.
const argmax = await tf.argMax(outputTensor, 1).array();
const results = argmax.map(i => labels[i]);
console.log(results); // Outputs "['Positive']"
</pre> <h2>Viewing results as probabilities</h2> While the logits are enough to find the most likely class, developers may want to view those scores as probabilities, which the numbers above clearly are not: they can be negative and don't add up to 1.
This can be solved by feeding the model output into the <code>tf.softmax()</code> function: <pre style="background-color:#282a36;">
console.log(await tf.softmax(outputTensor, 1).array());
// Outputs "[[0.9870176911354065, 0.012885026633739471, 0.00009726943972054869]]"
</pre> The model gave a probability of <code>98.7%</code> that "Hello, your video is amazing!" is a Positive comment. Seems to check out. <h2>Putting it all together</h2> This is the code for the JavaScript inference all together: <pre style="background-color:#282a36;">
import * as tf from '@tensorflow/tfjs';
import * as tflite from '@tensorflow/tfjs-tflite';
import { GoogleGenAI } from '@google/genai';

const genAi = new GoogleGenAI({ apiKey: 'YOUR API KEY HERE' });
// URL of the converted model exported in the previous section.
const model = await tflite.loadTFLiteModel('ytsentiment.tflite');

const input = 'Hello, your video is amazing!';
const embedResult = await genAi.models.embedContent({
  model: 'text-embedding-004',
  contents: [input],
});
const embeddings = embedResult.embeddings.map(embedding => embedding.values);
const outputTensor = await model.predict(tf.tensor2d(embeddings));

const argmax = await tf.argMax(outputTensor, 1).array();
const labels = ['Positive', 'Neutral', 'Negative'];
const results = argmax.map(i => labels[i]);
console.log(results);

const probabilities = await tf.softmax(outputTensor, 1).array();
console.log(probabilities);
</pre> <h1>Conclusion</h1> Training a custom model, and doing it well, requires specialized knowledge, which may not be worth acquiring for developers who want to focus on web development. At the same time, understanding how models work and, more importantly, how to adapt them to the web can be a powerful tool for AI developers who want to get more use out of their models, or for web developers who want to take advantage of off-the-shelf models that are not immediately available on the web.
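As a closing aside, the two post-processing steps used in this post, <code>tf.argMax()</code> and <code>tf.softmax()</code>, are simple enough to reproduce by hand. The sketch below is plain Python (no TensorFlow.js), and the helper names <code>softmax</code> and <code>argmax</code> are illustrative. It recomputes the probabilities from the logits printed earlier and shows that softmax never changes which class wins, because it preserves the ordering of the scores.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Return the index of the largest value (the first one, on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


labels = ['Positive', 'Neutral', 'Negative']
# Logits printed by outputTensor.array() earlier in this post.
logits = [4.526814937591553, 0.1881929636001587, -4.69814395904541]

probabilities = softmax(logits)
print([round(p, 4) for p in probabilities])  # [0.987, 0.0129, 0.0001]
# Softmax preserves the order of the scores, so both lookups give the same class.
print(labels[argmax(logits)], labels[argmax(probabilities)])  # Positive Positive
```

Small differences in the last digits compared with the TensorFlow.js output are expected, since the model runs in 32-bit floats while Python uses 64-bit floats.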

Learn how to build and deploy a custom sentiment analysis model for the web using PyTorch and Google's LiteRT! This guide walks you through the process of creating a model from scratch, training it on a YouTube comments dataset, converting it to a browser-friendly format, and running it in the browser with TensorFlow Lite and the Google Gen AI JavaScript library. Perfect for web developers looking to leverage custom AI models.

Understand the Effects of Temperature on Large Language Model Output 2025-03-24T12:36:00Z https://bandarra.me/posts/understand-temperature-topk When trying to understand the effect of changing the temperature parameter on the output of a Large Language Model, the explanation is often that “it makes the model responses more creative” and, if you know that the model works by looping over predictions for the next token, that explanation can feel a bit underwhelming, so here’s a slightly longer one. What happens is that the model generates a score for each possible next token, which is then transformed into probabilities, and the next token is sampled from those probabilities. The top-k parameter means that the next token will be sampled only from the k tokens with the highest scores, while temperature changes the probability distribution created from those scores: the scores are divided by the temperature before being turned into probabilities. A temperature of 1.0 means the probabilities are a direct reflection of the scores. Higher temperatures flatten those probabilities, increasing the chances of tokens that would be less likely to be selected and decreasing those of the most likely ones, which leads the model to output less common tokens more often, making it more “creative”. Conversely, temperatures below 1.0 make the most likely tokens even more likely to be selected, making the model more predictable. To help understand how changing temperature and top-k affect probabilities, I’ve put together <a href="https://andreban.github.io/temperature-topk-visualizer/">this visualization</a>. <img src="http://bandarra.me/images/temperature-topk-visualizer.png" alt="Screenshot of the visualization application" /> The visualization shows the 10 highest-scoring next tokens for a few different prompts, and the bars show the probability of each token being selected. When changing the temperature up or down, you can observe how the probabilities are flattened out or become sharper, and how changing top-k drops some of the potential next tokens altogether.
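To make the mechanics concrete, here is a minimal Python sketch of the process described above: keep the top-k scores, divide them by the temperature, apply softmax, and sample. It assumes the common formulation where scores are divided by the temperature before softmax; the function names and the token scores are made up for illustration and are not taken from any particular model or from the visualizer's code.

```python
import math
import random


def next_token_probabilities(scores, temperature=1.0, top_k=None):
    """Turn raw next-token scores into sampling probabilities.

    Keeps only the top_k highest-scoring tokens (when top_k is given), divides
    their scores by the temperature, and applies softmax.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Rank tokens from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # tokens outside the top k can never be sampled
    scaled = [score / temperature for _, score in ranked]
    m = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return {token: e / total for (token, _), e in zip(ranked, exps)}


def sample(probabilities, rng=random):
    """Draw one token according to its probability."""
    tokens = list(probabilities)
    weights = [probabilities[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the token that follows "The sky is".
scores = {"blue": 5.0, "clear": 3.5, "dark": 3.0, "falling": 1.0}

for temperature in (0.5, 1.0, 2.0):
    probs = next_token_probabilities(scores, temperature=temperature)
    print(temperature, {token: round(p, 3) for token, p in probs.items()})

print(next_token_probabilities(scores, top_k=2))
print(sample(next_token_probabilities(scores, temperature=0.7, top_k=3)))
```

Running it, "blue" takes a larger share of the probability at temperature 0.5 and a smaller one at 2.0, while <code>top_k=2</code> leaves only "blue" and "clear" as candidates.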
Check out the visualizer at <a href="https://andreban.github.io/temperature-topk-visualizer/">https://andreban.github.io/temperature-topk-visualizer/</a>!

Understand how temperature affects Large Language Model (LLM) output. Learn how the temperature parameter changes the probability distribution of next-token predictions, impacting creativity and predictability. Explore a visualization tool demonstrating the effects of temperature and top-k parameters on LLM responses.

Balancing AI Assistance and Learning 2025-03-04T15:26:00Z https://bandarra.me/posts/balancing-ai-assistance-for-learning <img src="/images/split-brain.png" alt="A brain" /> While I'm not completely oblivious to Python, as I've read code in the language here and there and written some MicroPython, I can't say I'm a Python developer. With a lot of what's happening in the AI space using Python, I decided to brush up my Python skills, and I've been going about it the same way I've done multiple times in the past: get a good book on the language, and try it out on a simple project. I believe AI is a big productivity booster for developers, from autocomplete functionality that allows writing code faster to agentic experiences where you don't write code at all. However, I found that, for learning a new language, the exact same thing that boosts productivity can slow you down. The reason is that practice is a key part of learning a language. When writing my practice project with AI autocomplete enabled, the autocomplete would kick in and finish the code I intended to write. Despite being correct, it removed the opportunity to write the code myself: to forget the correct syntax and patterns for the language and have to look them up, or to make errors and have to fix them. In general, AI prevented me from learning from my own mistakes. Maybe, in the long term, with better and better agentic experiences, that won't matter and forgetting syntax won't be as relevant. For now, I'm disabling AI autocomplete when practicing a new language.

Learn Python for AI development: This hands-on guide details a practical approach to mastering Python, focusing on effective learning techniques and addressing the impact of AI-powered code completion tools on the learning process. Discover how to balance AI assistance with focused practice for optimal skill acquisition in Python programming for AI projects.

Building Composite Indexes for Firestore on Windows 2025-01-10T18:46:00Z https://bandarra.me/posts/create-composite-index-on-firestore-on-windows The command in the <a href="https://firebase.google.com/docs/firestore/vector-search?_gl=1*1bnvvnt*_up*MQ..*_ga*NjY3OTU3OTMuMTczNDk1NzM5OQ..*_ga_CW55HF8NVT*MTczNDk1NzM5OS4xLjAuMTczNDk1NzM5OS4wLjAuMA..#create_a_vector_index">Firestore docs</a> to create a composite index that includes a vector embedding doesn't work out of the box on Windows. This seems to be related to how JSON escaping works in PowerShell. The solution I found was to create a JSON file with the index definition, saved here as <code>field-config.json</code>, like so: <pre style="background-color:#282a36;">
[
  {
    "field-path": "user_id",
    "order": "ASCENDING"
  },
  {
    "field-path": "embedding",
    "vector-config": {
      "dimension": 768,
      "flat": "{}"
    }
  }
]
</pre> Then, create the index with a command that reads the index details from the file. The backtick (<code>`</code>) at the end of each line is PowerShell's line-continuation character: <pre style="background-color:#282a36;">
gcloud beta firestore indexes composite create `
  --collection-group=MyCollection `
  --query-scope=COLLECTION `
  --field-config=field-config.json `
  --database=mydatabase
</pre>

Troubleshoot creating composite indexes with vector embeddings in Firestore on Windows. This solution uses a JSON file to define the index, bypassing PowerShell JSON escaping issues, and provides the corrected `gcloud` command for successful index creation. Learn how to create a functional composite index with vector embeddings.

Count tokens with the Gemma 2 Tokenizer in Rust 2024-11-22T20:46:00Z https://bandarra.me/posts/count-tokens-with-the-gemma-2-tokenizer-in-rust For those working with Large Language Models, counting the number of tokens in an input can be a frequent task. As <a href="https://medium.com/google-cloud/a-gemini-and-gemma-tokenizer-in-java-e18831ac9677">Gemini and Gemma share the same tokenizer</a> (at least for now), it is quite useful to be able to count tokens in an input locally, without making network calls <a href="https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/count-tokens">to an endpoint</a>, which can be much slower. In Rust, this can be achieved with the <a href="https://crates.io/crates/tokenizers"><code>tokenizers</code></a> crate. The sample code below is a minimalist version of <a href="https://github.com/Kamalabot/cratesploring/blob/main/candle_explorer/gemma-tokenizer/src/main.rs">this sample code</a> that removes the need for the <a href="https://crates.io/crates/candle-examples"><code>candle-examples</code></a> crate. It still uses the <a href="https://crates.io/crates/hf-hub"><code>hf_hub</code></a> crate to manage the model download, but the files could also be downloaded manually.
<pre style="background-color:#282a36;">
use hf_hub::{api::sync::ApiBuilder, Repo, RepoType};
use tokenizers::Tokenizer;

const HF_TOKEN: &str = "YOUR_TOKEN_HERE";
const MODEL_ID: &str = "google/gemma-2-2b";
const MODEL_REVISION: &str = "main";

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Authenticate with Hugging Face to access the model repository.
    let api = ApiBuilder::new().with_token(Some(HF_TOKEN.to_string())).build()?;
    let repo = api.repo(Repo::with_revision(
        MODEL_ID.to_string(),
        RepoType::Model,
        MODEL_REVISION.to_string(),
    ));

    // Downloads tokenizer.json on first use; later runs read it from the local cache.
    let tokenizer_filename = repo.get("tokenizer.json")?;
    let tokenizer = Tokenizer::from_file(tokenizer_filename).unwrap();

    let prompt = "Why is the sky blue?";
    let tokens = tokenizer
        .encode(prompt, true)
        .unwrap()
        .get_ids()
        .to_vec();

    println!("Generated {} tokens", tokens.len());
    Ok(())
}
</pre> The <code>hf_hub</code> crate is smart and caches the files once downloaded. While initializing the tokenizer from the cache still takes about 600ms, this should be done only once in the application, and counting tokens is quite fast, generally under 1ms. <h2><code>aarch64-pc-windows-msvc</code> issues with <code>candle-examples</code></h2> The <a href="https://github.com/Kamalabot/cratesploring/blob/main/candle_explorer/gemma-tokenizer/src/main.rs">original example</a> this code is based on depends on the <a href="https://crates.io/crates/candle-examples"><code>candle-examples</code></a> crate, which fails to build on <code>aarch64</code> architectures. The issue is caused by one of its dependencies, the <a href="https://crates.io/crates/gemm-f16"><code>gemm-f16</code></a> crate. There are workarounds described in <a href="https://github.com/sarah-quinones/gemm/issues/31">this issue</a>. For <code>aarch64-pc-windows-msvc</code>, adding the configuration below to the <code>.cargo/config.toml</code> file should do the trick: <pre style="background-color:#282a36;">
[build]
rustflags = [ "-Ctarget-feature=+fp16,+fhm" ]
</pre>

Quickly count tokens in Large Language Models (LLMs) like Gemini and Gemma using Rust. This efficient method avoids slow network calls, leveraging the `tokenizers` crate for local processing. The code example demonstrates token counting with minimal dependencies, even handling `aarch64` architecture challenges. Get started with fast, local token counting now!

Rust Markdown Syntax Highlighting: A Practical Guide 2024-09-20T21:15:00Z https://bandarra.me/posts/Rust-Markdown-Syntax-Highlighting-A-Practical-Guide You're a Rust developer, and you love Markdown's simplicity and readability. You might use it to write blog posts, documentation, or even as part of an interactive code editor. However, displaying plain code within Markdown can be tough on the eyes. Enter syntax highlighting, a feature that adds color and structure to your code, making it more visually appealing and easier to understand. This blog post will guide you on combining two powerful Rust libraries –<a href="https://crates.io/crates/pulldown-cmark"> pulldown-cmark</a> and <a href="https://crates.io/crates/syntect">syntect</a> – to seamlessly add syntax highlighting to your Markdown files and output the result as a styled HTML file. We'll cover: <ul> <li>How the pulldown-cmark library works to parse Markdown.</li> <li>How to leverage pulldown-cmark events to specifically target code blocks.</li> <li>How to integrate syntect for syntax highlighting your code.</li> <li>Practical examples and best practices to ensure efficient syntax highlighting.</li> </ul> Let's get started! <h2>Understanding Markdown Events with <code>pulldown-cmark</code></h2> You're already familiar with Markdown's simple syntax, but the key to working with it programmatically is understanding how <code>pulldown-cmark</code> represents the parsed content. This library uses events to model the structure of your Markdown document. Think of each event as a signal about what's being encountered while parsing. Let's break down the key events you'll be working with: <ul> <li><code>Event::Start(Tag)</code>: Indicates the start of a Markdown element. 
The <code>Tag</code> enum reveals what type of element it is: <ul> <li><code>Tag::Heading</code></li> <li><code>Tag::CodeBlock</code></li> <li><code>Tag::ListItem</code></li> <li>And more.</li> </ul> </li> <li><code>Event::End(TagEnd)</code>: Signals the end of a Markdown element.</li> <li><code>Event::Text(CowStr)</code>: Represents the text content within a Markdown element, including the contents of code blocks.</li> <li><code>Event::Code(CowStr)</code>: Represents inline code (text between backticks), not code blocks. Code blocks are delimited by <code>Start(Tag::CodeBlock)</code> and <code>End(TagEnd::CodeBlock)</code> events, with their contents delivered as <code>Text</code> events.</li> </ul> To illustrate how these events work in identifying code blocks, here's a basic example: <pre style="background-color:#282a36;">
use pulldown_cmark::{Event, Parser, Tag, TagEnd};

fn main() {
    let markdown = r#"
# Hello, World

Here's a code block:

```rust
fn main() {
    println!("Hello, World");
}
```
"#;

    let parser = Parser::new(markdown);
    for event in parser {
        match event {
            Event::Start(Tag::CodeBlock(_)) => {
                println!("Code block start");
            }
            Event::End(TagEnd::CodeBlock) => {
                println!("Code block end");
            }
            Event::Text(t) => {
                println!("Text: {}", t);
            }
            _ => {}
        }
    }
}
</pre> In this example, the loop iterates through the events emitted by <code>pulldown-cmark</code>. We are particularly interested in events representing the start and end of code blocks, and also the <code>Text</code> events that appear inside code blocks. Now that you understand these core concepts, you're ready to move on to incorporating <code>syntect</code> for syntax highlighting! <h2>Highlighting Code with <code>syntect</code></h2> Now that you've learned how to identify code blocks using <code>pulldown-cmark</code> events, let's bring in the powerful syntax highlighting capabilities of <code>syntect</code>. This library makes applying beautiful syntax coloring to your code incredibly straightforward. <h3>What <code>syntect</code> brings to the table</h3> The <code>syntect</code> library shines by providing you with the tools to define and apply custom syntax definitions and color themes.
It even leverages Sublime Text's widely popular syntax definitions, enabling you to instantly support a plethora of programming languages. Here's a breakdown of what <code>syntect</code> offers: <ul> <li>Sublime Text Compatibility: The library utilizes Sublime Text's <code>tmTheme</code> files for creating color themes. There's a wealth of existing themes you can use or customize.</li> <li>Extensive Language Support: With the default syntax sets included in <code>syntect</code>, you gain immediate support for a vast array of languages.</li> <li>Easy Integration: Integrating <code>syntect</code> is a breeze. The library provides a clean interface for applying syntax highlighting to code.</li> <li>HTML Output: <code>syntect</code> can seamlessly generate HTML output, allowing you to embed syntax-highlighted code directly within your web pages or documents.</li> </ul> <h3>Getting Started with <code>syntect</code></h3> Here's a quick demonstration of how to apply syntax highlighting using <code>syntect</code>: <pre style="background-color:#282a36;">
use syntect::{highlighting::ThemeSet, html::highlighted_html_for_string, parsing::SyntaxSet};

fn main() {
    let code = r#"
fn main() {
    println!("Hello, World");
}
"#;

    let syntax_set = SyntaxSet::load_defaults_newlines();
    let syntax_reference = syntax_set.find_syntax_by_token("rust").unwrap();
    let theme = ThemeSet::load_defaults().themes["base16-ocean.dark"].clone();

    let html = highlighted_html_for_string(code, &syntax_set, syntax_reference, &theme).unwrap();
    println!("{}", html);
}
</pre> In this snippet: <ol> <li><code>SyntaxSet::load_defaults_newlines()</code> loads the default set of syntax definitions, including definitions for Rust, JavaScript, Python, and many other languages.</li> <li><code>syntax_set.find_syntax_by_token("rust")</code> retrieves the specific syntax definition for Rust, which is later used to highlight the code.</li> <li><code>ThemeSet::load_defaults().themes["base16-ocean.dark"].clone()</code>
accesses the <code>base16-ocean.dark</code> theme from the default set of themes, offering a clean and modern dark theme.</li> <li><code>highlighted_html_for_string()</code> is the main function responsible for applying highlighting. It takes the code, the syntax set, the syntax definition for the chosen language, and the theme, and generates a syntax-highlighted HTML snippet.</li> <li>The generated <code>html</code> string is then printed to the console.</li> </ol> <h2>Integrating <code>pulldown-cmark</code> and <code>syntect</code> for Syntax Highlighting</h2> Now you're ready to combine the power of <code>pulldown-cmark</code> and <code>syntect</code> to bring syntax highlighting to your Markdown content. This section walks you through the process, step by step, with code examples to guide you. Let's start by outlining the key steps: <ol> <li>Parse Markdown with <code>pulldown-cmark</code>: Use <code>pulldown-cmark</code>'s event iterator to extract the relevant data from your Markdown content.</li> <li>Identify Code Blocks: Specifically look for <code>Event::Start(Tag::CodeBlock)</code> events to pinpoint code sections.</li> <li>Apply Syntax Highlighting with <code>syntect</code>: For each code block: <ul> <li>Determine the language used (e.g., "rust").</li> <li>Use <code>syntect</code> to apply the appropriate syntax highlighting.</li> <li>Replace the code block content with syntax-highlighted HTML.</li> </ul> </li> <li>Render the Final HTML Output: Stitch the highlighted code blocks back into the <code>pulldown-cmark</code> event stream.
Finally, use <code>pulldown_cmark::html::push_html</code> to generate the HTML representation of your Markdown.</li> </ol> Here's how you can implement these steps within a function named <code>markdown_to_html</code>: <pre style="background-color:#282a36;">
use std::sync::LazyLock;

use pulldown_cmark::{CodeBlockKind, Event, Parser, Tag, TagEnd};
use syntect::{
    highlighting::{Theme, ThemeSet},
    html::highlighted_html_for_string,
    parsing::SyntaxSet,
};

pub fn markdown_to_html(markdown: &str) -> String {
    static SYNTAX_SET: LazyLock<SyntaxSet> = LazyLock::new(SyntaxSet::load_defaults_newlines);
    static THEME: LazyLock<Theme> = LazyLock::new(|| {
        let theme_set = ThemeSet::load_defaults();
        theme_set.themes["base16-ocean.dark"].clone()
    });

    let mut sr = SYNTAX_SET.find_syntax_plain_text();
    let mut code = String::new();
    let mut code_block = false;

    let parser = Parser::new(markdown).filter_map(|event| match event {
        Event::Start(Tag::CodeBlock(kind)) => {
            // Fenced blocks may name a language; indented blocks never do.
            sr = match kind {
                CodeBlockKind::Fenced(lang) => SYNTAX_SET
                    .find_syntax_by_token(lang.trim())
                    .unwrap_or_else(|| SYNTAX_SET.find_syntax_plain_text()),
                CodeBlockKind::Indented => SYNTAX_SET.find_syntax_plain_text(),
            };
            code_block = true;
            None
        }
        Event::End(TagEnd::CodeBlock) => {
            // Replace the whole block with syntect's HTML, falling back to the raw code.
            let html = highlighted_html_for_string(&code, &SYNTAX_SET, sr, &THEME)
                .unwrap_or_else(|_| code.clone());
            code.clear();
            code_block = false;
            Some(Event::Html(html.into()))
        }
        Event::Text(t) => {
            if code_block {
                // Buffer code text until the block ends.
                code.push_str(&t);
                return None;
            }
            Some(Event::Text(t))
        }
        _ => Some(event),
    });

    let mut html_output = String::new();
    pulldown_cmark::html::push_html(&mut html_output, parser);
    html_output
}
</pre> Let's examine this code: <ul> <li>Lazy Initialization: You'll see <code>LazyLock</code> from the standard library (<code>std::sync::LazyLock</code>, stable since Rust 1.80) used for both <code>SYNTAX_SET</code> and <code>THEME</code>.
This ensures the syntax set and theme are only loaded once during the application's lifetime.</li> <li>Code Block Detection: We detect the start of a code block with <code>Event::Start(Tag::CodeBlock)</code> and its end with <code>Event::End(TagEnd::CodeBlock)</code>, buffering the <code>Text</code> events in between.</li> <li>Language Determination: <code>CodeBlockKind::Fenced</code> carries the fenced code's language (<code>lang</code>). The code looks up the matching syntax in <code>SYNTAX_SET</code>, falling back to the plain text syntax if no language matches.</li> <li>Syntax Highlighting: When a code block ends, the buffered code (<code>code</code>) is highlighted using <code>highlighted_html_for_string</code>, and an HTML representation of the code is returned in the event stream.</li> </ul> This is a basic example of how to use <code>pulldown-cmark</code> and <code>syntect</code> together. The core concept is filtering the event stream for specific events and replacing them with new HTML. These ideas can be applied in many ways, and it's up to you to create different tools or applications based on your specific use cases! <h2>Optimization and Performance Best Practices</h2> You've now got a good understanding of how to use <code>pulldown-cmark</code> and <code>syntect</code> for syntax highlighting. However, for real-world use cases, you'll likely want to optimize the process for speed and efficiency, particularly when dealing with large Markdown files. Here are some essential best practices to keep in mind: <h3>Optimizing Syntax Set and Theme Loading</h3> Loading syntax sets and themes is a relatively expensive operation, so it's important to do it only once rather than on every call.
You can use <code>LazyLock</code> so these resources are loaded on first use rather than upfront, and only once: <pre style="background-color:#282a36;">
use std::sync::LazyLock;

use syntect::{
    highlighting::{Theme, ThemeSet},
    parsing::SyntaxSet,
};

static SYNTAX_SET: LazyLock<SyntaxSet> = LazyLock::new(SyntaxSet::load_defaults_newlines);
static THEME: LazyLock<Theme> = LazyLock::new(|| {
    let theme_set = ThemeSet::load_defaults();
    theme_set.themes["base16-ocean.dark"].clone()
});
</pre> This way, <code>SYNTAX_SET</code> and <code>THEME</code> are initialized at most once, the first time they are accessed, and are then shared by every subsequent caller, avoiding unnecessary overhead. <h3>Efficient Event Processing Techniques</h3> A naïve approach to handling the events is to <code>collect()</code> the <code>pulldown-cmark</code> event iterator into a <code>Vec</code> of <code>Event</code>s and process that. However, this holds every event in memory at once and usually means extra passes over the vector, which adds up for larger Markdown files. Here's how you can write the core loop of the Markdown rendering function as an iterator pipeline instead: <pre style="background-color:#282a36;">
// ...
let parser = Parser::new(markdown).filter_map(|event| {
    match event {
        Event::Start(Tag::CodeBlock(CodeBlockKind::Fenced(lang))) => {
            // ... Handle the start of a code block.
        }
        Event::End(TagEnd::CodeBlock) => {
            // ... Handle the end of a code block.
        }
        Event::Text(t) => {
            // ... Handle text within a code block.
        }
        _ => Some(event), // Pass other events through unchanged.
    }
});

// `filter_map` is lazy: the `match` runs for each event as the iterator is consumed.
let mut html_output = String::new();
pulldown_cmark::html::push_html(&mut html_output, parser);
// ...
</pre> In this revised snippet, we employ a filter-and-map pattern, creating streamlined, performant code.
The idea is that <code>pulldown_cmark::html::push_html</code> consumes the iterator event by event, so the <code>filter_map</code> closure runs on the fly and modifies only the events it needs to, without building an intermediate collection. <h4>Summary of Optimizations</h4> By embracing these optimizations, you can improve the performance of your syntax highlighting code while reducing overall memory consumption: <ul> <li>Use <code>LazyLock</code> for lazy, one-time loading.</li> <li>Process events iteratively instead of creating intermediate vectors.</li> <li>Look up the appropriate syntax definition for each code block, falling back to plain text for unknown languages.</li> </ul> <h2>Conclusion: Elevating Markdown Rendering with Syntax Highlighting</h2> Combining the power of <code>pulldown-cmark</code> and <code>syntect</code> allows you to unlock a whole new level of polish and functionality when working with Markdown files in your Rust projects. This approach transforms Markdown rendering into something truly delightful, enhancing your ability to produce visually engaging and easy-to-read content for blogs, documentation, and code editors. Imagine generating your documentation with beautifully highlighted code, creating blog posts with captivating syntax highlighting, or empowering your interactive code editor with the elegance of colored code – this dynamic duo empowers you to achieve all this and more. By mastering these libraries, you not only streamline the process of creating Markdown-based content, but you also infuse it with an enhanced visual experience, ultimately enhancing communication and readability. You can focus on creating clear, structured content, knowing that your code will be presented with the style it deserves. Take the time to experiment with these powerful tools, explore different themes, languages, and use cases.
As you become comfortable with the capabilities of <code>pulldown-cmark</code> and <code>syntect</code>, you'll discover new ways to create compelling and engaging content with Markdown.

Add syntax highlighting to your Markdown files using Rust's pulldown-cmark and syntect libraries. This tutorial shows you how to parse Markdown, target code blocks, integrate syntect for highlighting, and optimize for performance with practical examples and best practices, resulting in styled HTML output.

Optimizing Your Rust Workflow: Mitigating Unnecessary Dependency Recompilation 2024-09-10T08:49:00Z https://bandarra.me/posts/optimizing-your-rust-workflow-mitigating-unnecessary-dependency-recompilation <img src="/images/rust-developer-frustrated.jpg" alt="A frustrated Rust developer" /> As a Rust developer using Visual Studio Code and <code>rust-analyzer</code>, you might have encountered a common problem: unnecessary recompilation of dependencies upon saving a file. Even changes to files seemingly unrelated to dependencies can trigger a <code>cargo check</code>, causing delays in your development workflow. This article examines the reasons behind this behavior and offers practical solutions to mitigate the issue. <h2>Why Does <code>rust-analyzer</code> Trigger Dependency Recompilation?</h2> <code>rust-analyzer</code> aims to provide a comprehensive and accurate understanding of your project, constantly updating its internal model as you work. Whenever you save a file, <code>rust-analyzer</code> assumes the change might impact dependencies, prompting it to run a <code>cargo check</code> to validate the project's consistency. This validation involves recompiling dependencies. <h2>Understanding the Challenge</h2> Predicting the precise impact of a change, even on seemingly unrelated dependencies, is difficult. This difficulty arises from factors such as macros, where a change can cascade through your code. <code>rust-analyzer</code>, in its effort to provide the most reliable feedback and code completion, takes a more cautious approach, triggering recompilation for potentially impacted dependencies. <h2>Scenarios Where Recompilation is More Noticeable</h2> The recompilation issue becomes particularly noticeable in scenarios with a high number of dependencies. <ul> <li> Projects with a Large Dependency Tree: Large, interconnected dependency trees result in longer compilation times. <code>rust-analyzer</code> analyzes each dependency, adding significant overhead.
</li> <li> Dependencies with Build Scripts: Dependencies with build scripts can drastically increase compilation time. Build scripts may generate code, download external resources, or configure build settings, adding to the complexity and execution time. </li> <li> Native Dependencies: Dependencies with native code (like C or C++) add another layer of complexity. Native libraries must be compiled and linked, further delaying the build process. </li> </ul> <h2>Mitigating the Issue</h2> <ul> <li> Switching commands does not help: <code>cargo check</code> is already faster than <code>cargo test</code> or <code>cargo build</code>, so neither is a better choice for checking your code on save. </li> <li> Disabling <code>rust-analyzer.checkOnSave</code>. You can disable checks on save by setting <code>rust-analyzer.checkOnSave</code> to <code>false</code> in your workspace settings, but this removes on-save diagnostics entirely, leaving you with a much less helpful development environment. </li> </ul> <pre style="background-color:#282a36;"> { "rust-analyzer.checkOnSave": false } </pre> <ul> <li><code>rust-analyzer.check.extraArgs</code>. The most effective solution is to use <code>rust-analyzer.check.extraArgs</code> to configure a separate target directory for <code>cargo check</code> operations. This way, the checks run by <code>rust-analyzer</code> keep their own build artifacts instead of sharing the target directory used by your own <code>cargo build</code> and <code>cargo run</code> commands:</li> </ul> <pre style="background-color:#282a36;"> { "rust-analyzer.checkOnSave": true, "rust-analyzer.check.extraArgs": [ "--target-dir", "${workspaceFolder}/target/check" ] } </pre> <h2>Conclusion</h2> While dependency recompilation is a common challenge for Rust developers, the right configuration can significantly improve the development cycle by reducing unnecessary delays. Understanding the reasoning behind <code>rust-analyzer</code>'s approach is essential for managing these issues effectively. By optimizing your workspace configuration, you can achieve a faster, more responsive coding experience.

Speed up your Rust development workflow in VS Code! This guide tackles the frustrating issue of unnecessary dependency recompilation with `rust-analyzer`, explaining why it happens and offering effective solutions, including configuring a separate target directory for `cargo check` to drastically reduce build times. Learn how to optimize your settings for a smoother coding experience.

Heterogeneous collections in Rust 2023-10-03T00:00:00Z https://bandarra.me/posts/Heterogeneous-collections-in-Rust On some occasions, developers run into the need for heterogeneous collections, that is, collections that can store objects of different types. In Rust, there are different ways to achieve that, each with different tradeoffs. This article looks into a few of them. <h2>Using Enums</h2> Rust <a href="https://doc.rust-lang.org/reference/items/enumerations.html">enums</a> are a great way to achieve this. Provided that all the types to be stored are known at development time, developers can create an enum that wraps each possible type, then create a collection of those enums. Then, to access the methods and attributes of the inner object, a <a href="https://doc.rust-lang.org/reference/expressions/match-expr.html">match expression</a> (or an <code>if let</code>) can be used to retrieve it from the enum. <pre style="background-color:#282a36;"> enum ComponentType { FirstComponent(MyFirstComponent), SecondComponent(MySecondComponent), } struct MyFirstComponent { } impl MyFirstComponent { fn do_first_component_thing(&self) { println!("First Component"); } } struct MySecondComponent { } impl MySecondComponent { fn do_second_component_thing(&self) { println!("Second Component"); } } fn main() { // Create a collection of enums; let mut components: Vec<ComponentType> = Vec::new(); // Add the enums to the collection, wrapping the target type. components.push(ComponentType::FirstComponent(MyFirstComponent {})); components.push(ComponentType::SecondComponent(MySecondComponent {})); // Use match expressions to retrieve the object from the enum and access methods and attributes.
if let ComponentType::FirstComponent(component) = &components[0] { component.do_first_component_thing(); } } </pre> One advantage of this method is that the implementation is simple and idiomatic. Also, when the enums are stored in an array, for instance, all objects live on the stack. The <code>Vec</code> used in this example allocates on the heap, though. On the other hand, this approach requires all component types to be known when writing the code. Think about a library that needs to store objects of different types that are only known by the user of that library. <h2>Using Traits</h2> An alternative is using <a href="https://doc.rust-lang.org/reference/items/traits.html">traits</a>, where: <ul> <li>the common methods for components are defined in the <code>Component</code> trait.</li> <li>each relevant component struct implements the <code>Component</code> trait.</li> <li>since the sizes of the objects being added to the collection are not known at compile time, the objects need to be wrapped in a <a href="https://doc.rust-lang.org/std/boxed/struct.Box.html">Box</a>.</li> </ul> <pre style="background-color:#282a36;"> // Declare a trait with common behaviour. trait Component { fn do_component_thing(&self); } struct MyFirstComponent {} // Implement the trait for each type. impl Component for MyFirstComponent { fn do_component_thing(&self) { println!("First Component"); } } struct MySecondComponent {} impl Component for MySecondComponent { fn do_component_thing(&self) { println!("Second Component"); } } fn main() { let mut components: Vec<Box<dyn Component>> = Vec::new(); components.push(Box::new(MyFirstComponent { })); components.push(Box::new(MySecondComponent { })); components[0].do_component_thing(); components[1].do_component_thing(); } </pre> This approach works well when only the methods shared through the trait need to be accessed.
This approach has two disadvantages: elements are always allocated on the heap, and only the methods defined by the trait can be accessed. <h2>Using Any</h2> The Rust documentation describes the <a href="https://doc.rust-lang.org/std/any/trait.Any.html">Any</a> trait as "A trait to emulate dynamic typing." It provides downcasting methods, such as <code>downcast_ref</code>, which allow casting back to the original type. <pre style="background-color:#282a36;"> use std::any::Any; struct MyFirstComponent { } impl MyFirstComponent { fn do_first_component_thing(&self) { println!("First Component"); } } struct MySecondComponent { } impl MySecondComponent { fn do_second_component_thing(&self) { println!("Second Component"); } } fn main() { let mut components: Vec<Box<dyn Any>> = Vec::new(); components.push(Box::new(MyFirstComponent {})); components.push(Box::new(MySecondComponent {})); if let Some(component) = components[0].downcast_ref::<MyFirstComponent>() { component.do_first_component_thing(); } if let Some(component) = components[1].downcast_ref::<MySecondComponent>() { component.do_second_component_thing(); } } </pre> Objects are still always allocated on the heap. However, the data structure can now hold different component types, and each one can be cast back to its original type to access component-specific attributes and methods. There's one small issue, though: nothing restricts which types can be added to the collection, and the line below works just fine: <pre style="background-color:#282a36;"> components.push(Box::new("I shouldn't be here".to_string())); </pre> <h2>Mixing Any and Traits</h2> <code>Any</code> can be used along with traits to restrict which types can be added to the collection. The trick is to add a method to the trait that converts the object to a <code>dyn Any</code> reference, which can then be downcast to the concrete type.
Each struct then has to implement the trait and its conversion method: <pre style="background-color:#282a36;"> use std::any::Any; trait Component { fn as_any(&self) -> &dyn Any; } struct MyFirstComponent { } impl MyFirstComponent { fn do_first_component_thing(&self) { println!("First Component"); } } impl Component for MyFirstComponent { fn as_any(&self) -> &dyn Any { self } } struct MySecondComponent { } impl MySecondComponent { fn do_second_component_thing(&self) { println!("Second Component"); } } impl Component for MySecondComponent { fn as_any(&self) -> &dyn Any { self } } fn main() { let mut components: Vec<Box<dyn Component>> = Vec::new(); components.push(Box::new(MyFirstComponent {})); components.push(Box::new(MySecondComponent {})); if let Some(component) = components[0].as_any().downcast_ref::<MyFirstComponent>() { component.do_first_component_thing(); } if let Some(component) = components[1].as_any().downcast_ref::<MySecondComponent>() { component.do_second_component_thing(); } } </pre> Objects are still allocated on the heap. However, type-specific methods and attributes can be used after a downcast, and the collection only accepts objects that implement the trait. One big downside is having to implement the trait for each type, which is just boilerplate. <h3>Using proc-macro-derive to avoid boilerplate</h3> One solution is to use a <a href="https://doc.rust-lang.org/reference/procedural-macros.html#derive-macros">procedural macro</a> to generate the boilerplate: <pre style="background-color:#282a36;"> // A derive macro needs to live in its own crate (a proc-macro crate that depends on syn and quote). use proc_macro::TokenStream; use quote::quote; use syn::DeriveInput; #[proc_macro_derive(Component)] pub fn component_macro_derive(input: TokenStream) -> TokenStream { let ast: DeriveInput = syn::parse(input).unwrap(); let name = &ast.ident; // The fully qualified path means the user's file does not need to import Any. let expanded = quote! { impl Component for #name { fn as_any(&self) -> &dyn ::std::any::Any { self } } }; expanded.into() } // The component still lives in the project file.
#[derive(Component)] struct MyFirstComponent { } impl MyFirstComponent { fn do_first_component_thing(&self) { println!("First Component"); } } </pre> <h1>Conclusion</h1> There are different ways to implement heterogeneous collections in Rust. The enum approach seems to be considered the most idiomatic, but it can't always be used. In those cases, other approaches are available, each with its own tradeoffs.
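The sketch below helps illustrate why the enum approach is considered idiomatic. It visits every element of the collection with an exhaustive `match`, instead of the single `if let` shown earlier. The `describe()` method is a hypothetical addition. The component methods return a string instead of printing, which makes the result easy to check. Because the `match` must cover every variant, adding a new component type to the enum without handling it becomes a compile-time error.

```rust
struct MyFirstComponent {}
struct MySecondComponent {}

impl MyFirstComponent {
    fn do_first_component_thing(&self) -> &'static str {
        "First Component"
    }
}

impl MySecondComponent {
    fn do_second_component_thing(&self) -> &'static str {
        "Second Component"
    }
}

enum ComponentType {
    FirstComponent(MyFirstComponent),
    SecondComponent(MySecondComponent),
}

impl ComponentType {
    /// Hypothetical helper: dispatches to the type-specific method.
    /// The match is exhaustive, so an unhandled new variant fails to compile.
    fn describe(&self) -> &'static str {
        match self {
            ComponentType::FirstComponent(c) => c.do_first_component_thing(),
            ComponentType::SecondComponent(c) => c.do_second_component_thing(),
        }
    }
}

fn main() {
    let components = vec![
        ComponentType::FirstComponent(MyFirstComponent {}),
        ComponentType::SecondComponent(MySecondComponent {}),
    ];
    // Visit every element, whatever its variant.
    for component in &components {
        println!("{}", component.describe());
    }
}
```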

Efficiently manage heterogeneous collections in Rust using enums, traits, or the `Any` type. Learn the tradeoffs of each approach, from stack vs. heap allocation to accessing specific methods, and discover how to reduce boilerplate with procedural macros for optimal code structure.

A Glimpse into how ChatGPT could be used for coding 2023-01-07T00:00:00Z https://bandarra.me/posts/A-Glimpse-into-how-ChatGPT-could-be-used-for-coding There’s a lot of discussion out there about how AIs such as <a href="https://openai.com/blog/chatgpt/">ChatGPT</a> will impact software engineering. Predictions range from making programmers redundant, to becoming an assistant, to not changing anything at all. Personally, I lean towards the assistive side and, this week, I had an experience that hints at how it could be used. The task was unrelated to software engineering. I was helping someone get an article translated from English to Brazilian Portuguese, my native language. Fully automated translation frequently generates awkward results, but I wanted to use <a href="https://translate.google.com/">Google Translate</a> as a tool to help me with the translation. So, my approach was interactive. Some bits I’d translate myself and be confident about the result, without asking the AI for help. For other bits, I’d do the translation, then ask the AI to translate and check whether its version was better than mine. Sometimes it was, sometimes it wasn’t. In other cases, when I felt I couldn’t come up with a good translation, I’d ask the AI to do it first, then tweak the results. The back and forth between Google Docs and Google Translate was a bit awkward. A specialized UI for this type of interactive workflow would go a long way toward increasing productivity and making better use of Google Translate for this use case. Could the interaction with AI for programming look something like that? Maybe yes… maybe no… but I’m looking forward to what will come out of it!

AI's role in software engineering is evolving from redundancy fears to assistive capabilities. This account details using AI translation interactively, improving results through a blend of human expertise and AI assistance. The experience suggests a future where programmers collaborate with AI, enhancing productivity and leveraging AI's strengths for optimal outcomes.

Play music with a Raspberry Pi Pico and Rust 2022-08-02T00:00:00Z https://bandarra.me/posts/Play-Music-with-the-Raspberry-Pi-Pico-and-Rust If you are coming from MicroPython or the Arduino IDE, playing a musical note is as straightforward as calling <code>freq()</code> with the desired note frequency as a parameter, as in the MicroPython example below: <pre style="background-color:#282a36;"> import machine p12 = machine.Pin(12) pwm12 = machine.PWM(p12) pwm12.freq(440) # 440Hz is an A4 note. pwm12.duty(512) </pre> However, the <a href="https://www.raspberrypi.com/documentation/microcontrollers/c_sdk.html#raspberry-pi-pico-cc-sdk">Raspberry Pi Pico C/C++ SDK</a> and <a href="https://github.com/rp-rs/rp-hal/">Rust's rp-hal</a> don't provide a method to set the frequency directly. Instead, it is necessary to use a lower-level API: <a href="https://raspberrypi.github.io/pico-sdk-doxygen/group__hardware__pwm.html#gad6cf6d9237144234732a50eb6d5e4fe9"><code>pwm_config_set_wrap()</code></a> in C/C++ or <a href="https://docs.rs/rp2040-hal/0.5.0/rp2040_hal/pwm/struct.Slice.html#method.set_top"><code>set_top</code></a> in Rust. To use it, it helps to understand how PWM is implemented on the Pico. Here's what the <a href="https://raspberrypi.github.io/pico-sdk-doxygen/group__hardware__pwm.html#gad6cf6d9237144234732a50eb6d5e4fe9:~:text=Detailed%20Description">C/C++ documentation</a> says: <blockquote> The PWM hardware functions by continuously comparing the input value to a free-running counter. This produces a toggling output where the amount of time spent at the high output level is proportional to the input value. The fraction of time spent at the high signal level is known as the duty cycle of the signal. The default behaviour of a PWM slice is to count upward until the wrap value (<code>pwm_config_set_wrap</code>) is reached, and then immediately wrap to 0.
PWM slices also offer a phase-correct mode, where the counter starts to count downward after reaching TOP, until it reaches 0 again. </blockquote> The hardware PWM is implemented with a counter and an input value. By default, the counter is incremented at the same rate as the Pico crystal frequency, 12MHz. The input value is compared to the counter to set the output high or low. The input used for comparison is passed to the system using <code>set_duty()</code>, and the maximum value for the counter is set via <code>set_top()</code>. In the example below, top is set to 1000, creating a frequency of 12kHz (12MHz divided by 1000). In this case, we're setting the duty cycle to half the value of top, or a 50% duty cycle. <pre style="background-color:#282a36;"> pwm.set_top(1000); pwm.channel_b.set_duty(500); </pre> By setting top to 1, a maximum frequency of 12MHz can be created. The maximum value that can be passed to <code>set_top()</code> is <code>65535</code> (a <code>u16</code>), creating a frequency of 183Hz. We are concerned with musical notes, and humans can hear sounds between 20Hz and 20kHz. The available frequencies, from 183Hz to 12MHz, don't fully cover that range. This problem can be solved by setting a clock divider, via <code>set_div_int()</code> and <code>set_div_frac()</code>, which causes the counter to be updated at a lower frequency. When setting the divider to 2, for instance, the counter is only incremented every other cycle, decreasing the frequency of updates to 6MHz (12MHz / 2). Setting the divider to 40 would set the maximum update frequency to 300kHz. Combined with <code>set_top()</code>, this allows a minimum frequency of about 4.6Hz (300kHz / 65535), which is below the minimum for human hearing. The <code>top</code> value for a particular note is 12MHz divided by the divider, then by the note frequency.
Using 40 as the divider and an A4 note as an example, divide 12MHz by 40, and the result by 440.0. The resulting top value is 681. Below is an example of playing musical notes with rp-hal on the Raspberry Pi Pico: <pre style="background-color:#282a36;"> fn calc_note(freq: f32) -> u16 { (12_000_000 as f32 / 40 as f32 / freq) as u16 } let pwm_slices = hal::pwm::Slices::new(pac.PWM, &mut pac.RESETS); let mut buzzer = pwm_slices.pwm5; // Slow the counter down by 40, matching calc_note(), then start the slice. buzzer.set_div_int(40); buzzer.enable(); // Notes let c4 = calc_note(261.63); let d4 = calc_note(293.66); let e4 = calc_note(329.63); let f4 = calc_note(349.23); let g4 = calc_note(392.00); let a4 = calc_note(440.00); let b4 = calc_note(493.88); // A rest: 0 is never a valid top for a note, so it marks silence. let space: u16 = 0; let doremi = [c4, d4, e4, f4, g4, a4, b4]; let twinkle_twinkle = [ c4, c4, g4, g4, a4, a4, g4, space, f4, f4, e4, e4, d4, d4, c4, space, g4, g4, f4, f4, e4, e4, d4, space, g4, g4, f4, f4, e4, e4, d4, space, c4, c4, g4, g4, a4, a4, g4, space, f4, f4, e4, e4, d4, d4, c4, space, ]; for top in twinkle_twinkle { if top == space { buzzer.channel_b.set_duty(0); // Silence for a rest. } else { buzzer.set_top(top); buzzer.channel_b.set_duty(top / 2); // 50% Duty Cycle } delay.start(500.milliseconds()); let _ = nb::block!(delay.wait()); buzzer.channel_b.set_duty(0); delay.start(100.milliseconds()); let _ = nb::block!(delay.wait()); } </pre>
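The same calculation can pick a divider automatically. The Rust sketch below is not part of rp-hal. It follows the article's approximation `top = 12MHz / divider / frequency` and returns the smallest integer divider whose `top` still fits in a `u16`. A smaller divider is preferred because it leaves a larger `top`, which gives finer frequency resolution.

A few assumptions and caveats apply:

- The 12MHz counter clock should be adjusted to match your clock configuration.
- The RP2040 datasheet defines the period as `TOP + 1` counter cycles, so this approximation is off by one count.
- `calc_top` and `choose_divider` are hypothetical names.

```rust
/// Counter clock assumed by the article; adjust it to your clock configuration.
const PWM_CLOCK_HZ: f32 = 12_000_000.0;

/// Approximate `top` for `freq_hz` at an integer clock divider, using
/// `top = clock / div / freq`. Returns `None` if the result does not fit the
/// 16-bit counter, or if the inputs are not usable.
fn calc_top(freq_hz: f32, div: u8) -> Option<u16> {
    if freq_hz <= 0.0 || div == 0 {
        return None;
    }
    let top = PWM_CLOCK_HZ / div as f32 / freq_hz;
    if top >= 1.0 && top <= u16::MAX as f32 {
        Some(top as u16)
    } else {
        None
    }
}

/// Picks the smallest integer divider (1..=255) whose `top` fits in a u16.
/// A smaller divider leaves a larger `top`, i.e. finer frequency resolution.
fn choose_divider(freq_hz: f32) -> Option<(u8, u16)> {
    (1..=u8::MAX).find_map(|div| calc_top(freq_hz, div).map(|top| (div, top)))
}

fn main() {
    for freq in [20.0_f32, 261.63, 440.0, 20_000.0] {
        match choose_divider(freq) {
            Some((div, top)) => println!("{freq} Hz -> divider {div}, top {top}"),
            None => println!("{freq} Hz is out of range"),
        }
    }
}
```

For an A4 note, a divider of 40 gives the article's top value of 681. This sketch prefers a divider of 1 with a top of 27272, since that fits in a `u16` too. With this approach, each note may need its own divider, set with `set_div_int()` together with `set_top()`.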

Generate musical notes on a Raspberry Pi Pico using C/C++ or Rust. Learn to control PWM frequency and duty cycle for precise sound output. Example code demonstrates playing "Twinkle Twinkle Little Star," covering frequency calculation and clock divider techniques for optimal sound reproduction within the human hearing range.

Drive a LED grid with a Raspberry Pi Pico and Web Serial - Part 1 2022-02-23T00:00:00Z https://bandarra.me/posts/Driving-a-ledgrid-with-a-Raspberry-Pi-Pico-and-WebSerial-Part-1 Last year, I got one of those <a href="https://www.aliexpress.com/item/1005001659493361.html">LED grids from AliExpress</a>. I wanted to connect it to my computer while allowing others to control what is displayed from a web page. To achieve that, I connected a <a href="https://www.raspberrypi.com/products/raspberry-pi-pico/">Raspberry Pi Pico</a> to my computer to control the LED grid. The Pico itself is controlled via <a href="https://web.dev/serial/">Web Serial</a>, and <a href="https://firebase.google.com/products/realtime-database">Firebase Realtime Database</a> allows others to remotely change what is rendered. This is a 3-part blog post covering: <ul> <li><a href="/2022/02/21/Driving-a-ledgrid-with-a-Raspberry-Pi-Pico-and-WebSerial-Part-1/">Part 1</a>: Control an LED grid from the Pico using the serial port.</li> <li>Part 2: Control the Pico using Web Serial from the computer.</li> <li>Part 3: Remotely control the LED grid from a web page.</li> </ul> <h1>Part 1 - Control an LED Grid with the Raspberry Pi Pico</h1> <h2>What you will need</h2> <ul> <li><a href="https://www.raspberrypi.com/products/raspberry-pi-pico/">Raspberry Pi Pico</a>.</li> <li>LED Grid like <a href="https://www.aliexpress.com/item/1005001659493361.html">this one</a>.</li> <li>External 5V / 5A power source.</li> <li>Breadboard.</li> <li>Breadboard jumper wires.</li> </ul> The LED grid has 256 LEDs, arranged in 16 columns and 16 rows. Each LED can draw up to <code>20mA</code> (milliamps) of current when set to white. In total, the entire LED grid can draw up to around <code>5A</code>. That is far more than the USB port powering the Pico can supply, so an external power supply capable of handling that is needed.
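Here is a small Rust sketch that sanity-checks that power budget. It uses the 20mA-per-LED figure quoted above, so check your LED datasheet before sizing a real supply. It compares the total against the 500mA that a standard USB 2.0 port is specified to provide. The function and constant names are illustrative only.

```rust
/// Worst-case current draw, in milliamps, for `led_count` LEDs drawing
/// `ma_per_led` each (the article uses 20 mA per LED at full white).
fn worst_case_current_ma(led_count: u32, ma_per_led: u32) -> u32 {
    led_count * ma_per_led
}

/// Current a standard USB 2.0 port is specified to supply, in milliamps.
const USB2_BUDGET_MA: u32 = 500;

fn main() {
    let leds = 16 * 16;
    let needed = worst_case_current_ma(leds, 20);
    println!("{leds} LEDs need up to {needed} mA");
    println!("Within the USB 2.0 budget: {}", needed <= USB2_BUDGET_MA);
}
```

The 5120mA result is roughly ten times what a USB 2.0 port provides, hence the 5V / 5A external supply.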
To power the LED grid, the external power supply is connected to the power rails on the breadboard, and the rails are connected to <code>VCC</code> and <code>GND</code> on the LED grid. <code>GPIO7</code> (Pin 10) on the Pico is used to control the LED grid, so Pin 10 on the Pico is connected to <code>DIN</code> on the LED grid. The circuit is closed by connecting one of the <code>GND</code> pins on the Pico to the ground power rail. This diagram shows how things should look with everything connected: <img src="/img/2022/02/LedGrid_bb.jpg" alt="LED Grid" title="LED grid" /> Note: It is possible to power the Pico with the external power source by connecting the positive power rail to <code>VSYS</code> (Pin 39). In this project, the Pico is connected to USB for serial communication, so it can draw power from the USB port and doesn't need to be connected to <code>VSYS</code>. <h2>Getting started with LED strips</h2> There are different models of LED strips out there. Some, like the ones based on the <a href="https://cdn-shop.adafruit.com/datasheets/WS2801.pdf">WS2801</a>, can be controlled via the SPI bus, which makes them ideal to be controlled from computers like the Raspberry Pi. However, the LED controller used on this LED grid is the <a href="https://cdn-shop.adafruit.com/datasheets/WS2812B.pdf">WS2812B</a>. These controllers don't use a higher-level protocol like SPI or I2C. Instead, data is sent by setting pins to <code>HIGH</code> and <code>LOW</code> with specific timings, using a technique called <a href="https://en.wikipedia.org/wiki/Bit_banging">bit banging</a>. <h3>Bit banging and the Pico PIO</h3> Implementing bit banging usually requires very careful programming. The protocol's specific timings interact with the CPU clock cycle and with other parts of the code that also use the CPU. The Pico adds a feature called Programmable Input/Output (PIO).
Its PIO blocks run small state machines that have direct access to the board's GPIO and exchange data with the main program through FIFO queues. As a result, the timing-critical protocol code runs independently of the rest of the program, in terms of clock cycles. An explanation of the Raspberry Pi Pico PIO is outside the scope of this article, and has already been covered by a number of online resources, like <a href="https://medium.com/geekculture/raspberry-pico-programming-with-pio-state-machines-e4610e6b0f29">this blog post</a>. The <a href="https://github.com/raspberrypi/pico-examples/blob/master/pio/ws2812/ws2812.pio">ws2812 example</a> in the <a href="https://github.com/raspberrypi/pico-examples/">Raspberry Pi Pico Examples</a> repository implements the protocol. Here it is with timings adjusted to work with the WS2812B: <pre style="background-color:#282a36;"> .program ws2812b .side_set 1 .define public T1 3 .define public T2 4 .define public T3 3 ... </pre> The Raspberry Pi Pico SDK CMake build takes the <code>.pio</code> program and generates the related code. This introduces the <code>ws2812b_program_init()</code> function to the application, which initializes the state machine with the PIO instance, the pin it controls, and the clock. Data is sent to the LED strip by calling <a href="https://raspberrypi.github.io/pico-sdk-doxygen/group__hardware__pio.html#gaee8bfc3409cb8d93cccdeda3961bc377"><code>pio_sm_put_blocking()</code></a>.
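Before looking at the class, it's worth unpacking the colour values pushed into the FIFO. The WS2812B expects 24 bits per LED, in green, red, blue order. In the pico-examples ws2812 program, the state machine shifts bits out most-significant first and pulls a new word after 24 bits (for RGB LEDs). So each colour is packed as `0x00GGRRBB` and then shifted left by 8 bits before being sent.

The Rust sketch below mirrors the `urgb_u32()` helper from the pico-examples ws2812 code, which is also used later in this post. `fifo_word()` is a hypothetical name for the shift. This layout also explains the hard-coded logo colours: `0x001400` is a dim red.

```rust
/// Packs 8-bit RGB values into 0x00GGRRBB, the layout produced by the
/// `urgb_u32()` helper in the pico-examples ws2812 code (green in the top byte).
fn urgb_u32(r: u8, g: u8, b: u8) -> u32 {
    ((g as u32) << 16) | ((r as u32) << 8) | (b as u32)
}

/// Moves the 24 colour bits to the top of the 32-bit FIFO word, like the
/// 8-bit left shift in the C++ code. The state machine shifts out the most
/// significant bits first, so G, R and B are sent in that order.
fn fifo_word(color: u32) -> u32 {
    color << 8
}

fn main() {
    let dim_red = urgb_u32(0x14, 0x00, 0x00);
    println!("packed: {dim_red:#08x}, FIFO word: {:#010x}", fifo_word(dim_red));
}
```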
<pre style="background-color:#282a36;"> class LedStrip { public: PIO pio; uint32_t *buffer; int pin_tx; int length; int sm = 0; LedStrip(PIO pio, int pin_tx, uint32_t buffer[], int length): pio(pio), buffer(buffer), pin_tx(pin_tx), length(length) { uint offset = pio_add_program(pio, &ws2812_program); ws2812_program_init(pio, sm, offset, pin_tx, 800000, false); } void clear() { for (int i = 0; i < length; i++) { buffer[i] = 0; } } void update() { for (int i = 0; i < length; i++) { pio_sm_put_blocking(pio, sm, buffer[i] << 8u); } } }; </pre> Those details are encapsulated in the <code>LedStrip</code> class. Besides storing the information needed to control the PIO, the class also stores a buffer array where each index represents a pixel on the LED strip. <h2>From LED strip to LED grid</h2> You may have noticed references to an LED strip instead of a grid a few times in this article so far. This is because the LED grid is, in reality, an LED strip, and the way its rows and columns are mapped can be unintuitive. <img src="/img/2022/02/ledgrid.svg" alt="LED grid diagram" title="LED grid diagram" /> Instead of following a left-to-right pattern across the whole grid, the LED strip snakes around the board. Columns on even rows are referenced from left to right, and columns on odd rows are referenced from right to left. You might expect <code>(1, 0)</code> to reference the first LED of the second line, but it actually points to the last one.
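The snaking layout boils down to a small index calculation. Here is a minimal Rust sketch of the same arithmetic that the `set_pixel()` method in the `LedGrid` class uses. The parity of `x` decides whether the index counts forward or backward within a group of `height` LEDs. The function name `strip_index` is for illustration only.

```rust
/// Position along the snaking strip of the pixel at (`x`, `y`), using the same
/// arithmetic as `LedGrid::set_pixel()`: even `x` values count forward through
/// a group of `height` LEDs, odd `x` values count backward.
fn strip_index(x: usize, y: usize, height: usize) -> usize {
    if x % 2 == 0 {
        x * height + y
    } else {
        (x + 1) * height - y - 1
    }
}

fn main() {
    let height = 16;
    for (x, y) in [(0, 0), (0, 15), (1, 0), (1, 15), (2, 0)] {
        println!("({x}, {y}) -> LED {}", strip_index(x, y, height));
    }
}
```

On a 16 by 16 panel, `(1, 0)` maps to LED 31, the last LED of the second group. This matches the surprise described above.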
<pre style="background-color:#282a36;"> class LedGrid: public LedStrip { public: int width; int height; LedGrid(PIO pio, int pin_tx, uint32_t buffer[], int width, int height): LedStrip(pio, pin_tx, buffer, width * height), width(width), height(height) { } void set_pixel(int x, int y, uint32_t color) { if (x % 2 == 0) { buffer[x * height + y] = color; } else { buffer[(x + 1) * height - y - 1] = color; } } }; </pre> The <code>LedStrip</code> implementation can be extended into a <code>LedGrid</code> class with a <code>set_pixel()</code> method that caters for that difference. <h3>Testing the LED Grid</h3> Before moving on to enabling the serial port, it is possible to test the LED grid by hard-coding an image into the code: <pre style="background-color:#282a36;"> const int PANEL_WIDTH = 16; const int PANEL_HEIGHT = 16; const int PIN_TX = 7; // The GPIO port controlling the LED grid. const int NUM_LEDS = PANEL_WIDTH * PANEL_HEIGHT; // The pixels for the Chrome logo. const uint32_t CHROME_LOGO[256] { 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x011500, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x011500, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x001100, 0x001300, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x000000, 0x000000, 0x000000, 0x000F00, 0x001000, 0x001200, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x001400, 0x011500, 0x000000, 0x000000, 0x080001, 0x000F00, 0x001100, 0x001300, 0x001400, 0x001400, 0x1B1B1B, 0x1B1B1B, 0x081500, 0x081700, 0x091800, 0x0A1900, 0x0B1B00, 0x0C1D00, 0x000000, 0x080001, 0x080001, 0x000B00, 0x001000, 0x001200, 0x1B1B1B, 0x0F091B, 0x05001C, 0x05001C, 0x0F091B, 0x1B1B1B, 0x0D1E00, 0x0E1E00, 0x0E1E00, 0x0E1F00, 0x0F1F00, 0x080001,
0x080001, 0x080001, 0x000C00, 0x001100, 0x0F091B, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x0F091B, 0x0F1F00, 0x0F1F00, 0x101F00, 0x101F00, 0x111F00, 0x080001, 0x080001, 0x080001, 0x030200, 0x1B1B1B, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x1B1B1B, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x080001, 0x080001, 0x080001, 0x080001, 0x1B1B1B, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x1B1B1B, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x0F091B, 0x05001C, 0x05001C, 0x05001C, 0x05001C, 0x0F091B, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x1B1B1B, 0x0F091B, 0x05001C, 0x05001C, 0x0F091B, 0x1B1B1B, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x000000, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x1B1B1B, 0x1B1B1B, 0x050001, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x000000, 0x000000, 0x070001, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x070001, 0x040001, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x101E00, 0x000000, 0x000000, 0x000000, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x080001, 0x050001, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x070001, 0x080001, 0x080001, 0x080001, 0x070001, 0x111F00, 0x111F00, 0x111F00, 0x111F00, 0x0F1B00, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, 0x070001, 0x070001, 0x060001, 0x111F00, 0x111F00, 0x0E1B00, 0x000000, 0x000000, 0x000000, 0x000000, 0x000000, }; int main() { stdio_init_all(); // Buffer for holding the values for the LED strip. 
uint32_t buffer[NUM_LEDS]; auto ledGrid = LedGrid(pio0, PIN_TX, buffer, PANEL_WIDTH, PANEL_HEIGHT); ledGrid.clear(); for (int x = 0; x < 16; x++) { for (int y = 0; y < 16; y++) { uint32_t color = CHROME_LOGO[y * PANEL_WIDTH + x]; ledGrid.set_pixel(x, y, color); } } ledGrid.update(); } </pre> Now, with everything connected, build the project and copy the <code>uf2</code> file to the Pico. Once it reboots, you should see the Chrome logo rendered. <img src="/img/2022/02/chrome-logo.jpg" alt="Chrome logo rendered on the LED grid" title="Chrome logo rendered on the LED grid" /> <h2>Enable UART and read data from the USB</h2> The Pi Pico has 2 UART ports and, by default, standard input and output go through UART on the GPIO pins. To use the USB port instead, add the following lines to <code>CMakeLists.txt</code>: <pre style="background-color:#282a36;"> # enable usb output, disable uart output pico_enable_stdio_usb(pico_ledstrip 1) pico_enable_stdio_uart(pico_ledstrip 0) </pre> The final step to set up serial communication is to ensure the program calls <code>stdio_init_all()</code>. This makes standard I/O functions, like <code>printf()</code>, use the serial port. <pre style="background-color:#282a36;"> stdio_init_all(); // Buffer for reading values from stdinput. uint8_t read_buffer[NUM_LEDS * 3]; // Buffer for holding the values for the LED strip. uint32_t buffer[NUM_LEDS]; auto ledGrid = LedGrid(pio0, PIN_TX, buffer, PANEL_WIDTH, PANEL_HEIGHT); while (true) { printf("Waiting for data\n"); fread(read_buffer, 1, NUM_LEDS * 3, stdin); for (int x = 0; x < 16; x++) { for (int y = 0; y < 16; y++) { int start_index = (y * PANEL_WIDTH + x) * 3; uint8_t red = read_buffer[start_index]; uint8_t green = read_buffer[start_index + 1]; uint8_t blue = read_buffer[start_index + 2]; ledGrid.set_pixel(x, y, urgb_u32(red, green, blue)); } } ledGrid.update(); printf("Received data\n"); } </pre> A new buffer, <code>read_buffer</code>, is introduced to read data from the serial port.
Then, <code>LedGrid</code> is initialized with the PIO instance used, the pin connected to the LED grid data line, the buffer, and the width and height of the LED grid. The program then enters an infinite loop. It blocks on the <code>fread()</code> call until <code>read_buffer</code> is full, then sets the colours on the LED strip via <code>set_pixel()</code>. Finally, it updates the display by calling <code>update()</code>. <h2>Next Step</h2> You now have an application that runs on the Raspberry Pi Pico and controls the LED grid. You can <a href="https://github.com/andreban/pico-ledgrid/releases/download/0.1.0/ledtrip_controller.uf2">download a pre-built <code>uf2</code></a> or build your own, then copy the <code>uf2</code> file to the Pico. Then, with the Pico connected to your computer, navigate to <a href="https://ledmoji.bandarra.me/">https://ledmoji.bandarra.me/</a> and click <code>Connect</code>. You should be able to select your Pico and control the connected LED grid. In the next part, you will learn how to connect to the Pico using Web Serial! Stay tuned! In the meantime, check out the full code for the <a href="https://github.com/andreban/pico-ledgrid">project on GitHub</a>.
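The protocol between the computer and the Pico is just a raw frame of 768 bytes (16 × 16 pixels × 3 bytes). Pixels are sent in row-major order (`y * PANEL_WIDTH + x`), three bytes each, red then green then blue, matching the indexes the `fread()` loop reads. The ledmoji page takes care of this for you. As a sketch of the byte layout only, here is how a frame could be assembled in Rust. The `encode_frame` function is hypothetical, and writing the bytes to the serial port is left out.

```rust
const PANEL_WIDTH: usize = 16;
const PANEL_HEIGHT: usize = 16;

/// Builds the raw frame read by the Pico's `fread()` loop: pixels in row-major
/// order (`y * PANEL_WIDTH + x`), three bytes each, in red, green, blue order.
/// `pixel` returns the colour for a given (x, y) coordinate.
fn encode_frame(pixel: impl Fn(usize, usize) -> (u8, u8, u8)) -> Vec<u8> {
    let mut frame = Vec::with_capacity(PANEL_WIDTH * PANEL_HEIGHT * 3);
    for y in 0..PANEL_HEIGHT {
        for x in 0..PANEL_WIDTH {
            let (r, g, b) = pixel(x, y);
            frame.extend_from_slice(&[r, g, b]);
        }
    }
    frame
}

fn main() {
    // Example pattern: a red diagonal on a black background.
    let frame = encode_frame(|x, y| if x == y { (255, 0, 0) } else { (0, 0, 0) });
    println!("frame is {} bytes", frame.len());
    // Sending `frame` over the serial port is out of scope for this sketch.
}
```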

Control an LED grid with a Raspberry Pi Pico, Web Serial, and Firebase. This 3-part tutorial shows how to control a 256-LED grid using the Pico's PIO, send data via Web Serial, and enable remote control with Firebase. Learn bit-banging techniques and build a web interface for your LED project. Get started now!

Writing Doom Fire for the Raspberry Pi Pico and the Pimoroni Pico Display 2021-02-23T00:00:00Z https://bandarra.me/posts/Doom-Fire-on-the-Raspberry-Pi-Pico <img src="/img/2021/02/pico-fire.jpg" alt="Doom Fire running on a Pi Pico" title="Doom Fire running on a Pi Pico" /> <h2>Introduction</h2> The <a href="https://fabiensanglard.net/doom_fire_psx/">Doom Fire animation</a> is the fire animation used in the PSX port of the original Doom game. This animation is a nice Hello World to implement when learning new graphics APIs, and I recently wrote about a <a href="/2021/01/13/Building-Doom-Fire-using-modern-JavaScript/">modern JavaScript implementation</a>. The <a href="https://www.raspberrypi.org/products/raspberry-pi-pico/">Raspberry Pi Pico</a> is a new board based on the new RP2040 microcontroller. Along with the <a href="https://shop.pimoroni.com/products/pico-display-pack">Pimoroni Pico Display</a>, it makes an interesting platform to port the Doom Fire animation to. <h2>Using MicroPython</h2> <a href="https://micropython.org/">MicroPython</a> is an implementation of <a href="https://www.python.org/">Python 3</a> that is optimised to run on microcontrollers. The nice thing about MicroPython is how beginner friendly it is, as it only requires flashing a custom image and installing the <a href="https://thonny.org/">Thonny IDE</a>. The details on how to get started have been extensively covered by the <a href="https://datasheets.raspberrypi.org/pico/raspberry-pi-pico-python-sdk.pdf">official documentation</a>, blog posts, and YouTube videos, so I won't repeat those here. I do wonder, however, why the official documentation is only available as a PDF file and not as an HTML page.
Pimoroni has also done a great job, providing a custom firmware that makes it <a href="https://learn.pimoroni.com/tutorial/hel/getting-started-with-pico">a breeze to use the Pico Display from MicroPython</a> and a <a href="https://github.com/pimoroni/pimoroni-pico/tree/main/micropython/examples/pico_display">set of examples for the display</a>. If you are interested in the final MicroPython implementation, <a href="https://github.com/andreban/pico-fire/blob/main/pico_fire.py">check out the source code on GitHub</a>. In my first attempt, I created separate methods for updating the fire, <code>update()</code>, and for rendering the result, <code>render()</code>: <pre style="background-color:#282a36;">
def update(self):
    for y in range(1, height):
        row = y * width
        next_row = (y - 1) * width
        for x in range(0, width):
            color = self.fire[row + x]
            pen = colorScale[color]
            new_x = x
            if color > 0:
                rand = random.randint(0, 3)
                color = color - (rand & 1)
                new_x = new_x + rand - 1
            self.fire[next_row + new_x] = color

def render(self):
    for y in range(0, height):
        row = y * width
        next_row = (y - 1) * width
        for x in range(0, width):
            color = self.fire[row + x]
            pen = colorScale[color]
            display.set_pen(pen)
            display.pixel(x, y)
    display.update()
</pre> This implementation works and was quick to write, even with almost no Python experience. The problem is that it takes almost 4 seconds to render each frame. Yes, that's 0.25 frames per second (FPS).
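To make the spreading rule easier to follow outside of MicroPython, here is a minimal JavaScript sketch of the same update step. It assumes the fire is stored as a flat array of heat values indexed by <code>y * width + x</code>, as in the Python code. The <code>rand</code> parameter is a hypothetical injected generator returning 0, 1, or 2 (matching the C++ version later in this post), and clamping the destination column at the screen edges is my own assumption, since the Python code above does not handle the edges.

```javascript
// Minimal sketch of one Doom Fire update step (not the post's actual code).
// fire: flat array of heat values, row-major (index = y * width + x).
// rand: hypothetical injected generator returning 0, 1 or 2.
function spreadFire(fire, width, height, rand) {
  // Row 0 is the top of the screen; each row feeds the row above it.
  for (let y = 1; y < height; y++) {
    const row = y * width;
    const nextRow = (y - 1) * width;
    for (let x = 0; x < width; x++) {
      let color = fire[row + x];
      let newX = x;
      if (color > 0) {
        const r = rand();
        color -= r & 1;   // Cool down by 0 or 1.
        newX = x + r - 1; // Drift one column left, straight up, or right.
      }
      // Assumption: clamp to the row so writes never leave the buffer.
      newX = Math.min(width - 1, Math.max(0, newX));
      fire[nextRow + newX] = color;
    }
  }
}

// Usage: a 3x2 fire where the bottom row holds heat values 1, 2 and 3.
const fire = Uint8Array.from([0, 0, 0, 1, 2, 3]);
spreadFire(fire, 3, 2, () => 1); // r = 1: cool by one, go straight up.
console.log(Array.from(fire.slice(0, 3))); // [ 0, 1, 2 ]
```

Injecting the generator keeps the step deterministic for testing and makes it easy to swap in a faster generator, which turns out to matter later in this post.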
The most obvious optimisation is to avoid looping over the fire's pixels twice, updating and rendering in the same pass by merging <code>render()</code> into <code>update()</code>: <pre style="background-color:#282a36;">
def update(self):
    for y in range(0, height):
        row = y * width
        next_row = (y - 1) * width
        for x in range(0, width):
            color = self.fire[row + x]
            pen = colorScale[color]
            if y > 0:
                new_x = x
                if color > 0:
                    rand = random.randint(0, 3)
                    color = color - (rand & 1)
                    new_x = new_x + rand - 1
                self.fire[next_row + new_x] = color
            display.set_pen(pen)
            display.pixel(x, y)
    display.update()
</pre> This cut the time per frame to 2 seconds. That's a great improvement, but not nearly enough to run at the 27 FPS required by the Doom Fire animation. At this point, I found it unlikely that further work on the Python version would pay off, but I also found it unlikely that the Pico itself was too slow to run the animation. My guess was that MicroPython had a larger overhead than I expected. <h2>Using C++</h2> While the C++ process is also <a href="https://datasheets.raspberrypi.org/pico/getting-started-with-pico.pdf">well documented</a> (also as a PDF), it isn't as easy as getting started with MicroPython, and it requires installing a toolchain with a small set of tools. The documentation also covers setting up different IDEs; in my case, I used CLion. Rewriting the latest Python code in C++ looks like the following: <pre style="background-color:#282a36;">
void update(uint32_t time) {
  for (int y = 0; y < pimoroni::PicoDisplay::HEIGHT; y++) {
    int row = y * pimoroni::PicoDisplay::WIDTH;
    int next_row = y == 0 ? 0 : (y - 1) * pimoroni::PicoDisplay::WIDTH;
    for (int x = 0; x < pimoroni::PicoDisplay::WIDTH; x++) {
      uint8_t color = fire[row + x];
      uint16_t pen = pallete[color];
      pico_display.setPen(pen);
      pico_display.setPixel(x, y);
      if (y > 0) {
        int new_x = x;
        int rand = std::rand() % 3;
        new_x = (new_x + rand - 1);
        color = color > 0 ? color - (rand & 1) : 0;
        fire[next_row + new_x] = color;
      }
    }
  }
  pico_display.update();
}
</pre> From the start, this code ran at ~20 FPS, or around 50 ms per frame, which is a huge improvement over MicroPython but still short of our 27 FPS target. Since we don't need a high-quality random number generator, a faster generator seemed likely to help. A quick Google search took me to <a href="https://stackoverflow.com/a/26237777/1249994">this StackOverflow answer</a>, which promises a generator about twice as fast as <code>std::rand()</code>: <pre style="background-color:#282a36;">
void update(uint32_t time) {
  for (int y = 0; y < pimoroni::PicoDisplay::HEIGHT; y++) {
    int row = y * pimoroni::PicoDisplay::WIDTH;
    int next_row = y == 0 ? 0 : (y - 1) * pimoroni::PicoDisplay::WIDTH;
    for (int x = 0; x < pimoroni::PicoDisplay::WIDTH; x++) {
      uint8_t color = fire[row + x];
      uint16_t pen = pallete[color];
      pico_display.setPen(pen);
      pico_display.setPixel(x, y);
      if (y > 0) {
        int new_x = x;
        int rand = fast_rand() % 3;
        new_x = (new_x + rand - 1);
        color = color > 0 ? color - (rand & 1) : 0;
        fire[next_row + new_x] = color;
      }
    }
  }
  pico_display.update();
}
</pre> And, indeed, it improved rendering to about 37 ms per frame, exactly the 27 FPS we needed. <h3>Adding Wind</h3> The generated random number is an integer between <code>0</code> and <code>2</code> (inclusive) that controls how the fire in a given cell spreads: <ul> <li><code>0</code> - fire spreads to the cell above and to the left of the current cell.</li> <li><code>1</code> - fire spreads to the cell directly above the current cell.</li> <li><code>2</code> - fire spreads to the cell above and to the right of the current cell.</li> </ul> Adding wind means adding a bias to this number. If a negative bias is added, the fire will spread more to the left, and if a positive bias is added, the fire will spread more to the right.
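The effect of the bias can be sketched as a small pure function. This is a hypothetical helper, not code from the repo. It also wraps the column around the screen edges, which is what the C++ code later in this post ends up doing, but it uses a modulo instead of a single add or subtract so that the result stays in range even if the wind grows larger than the screen width.

```javascript
// Hypothetical helper: which column does the heat from column x land in?
// r:     random value in 0..2 (0 = left, 1 = straight up, 2 = right).
// wind:  signed bias; negative pushes left, positive pushes right.
// width: number of columns; results wrap around the screen edges.
function destinationColumn(x, r, wind, width) {
  const newX = x + r - 1 + wind;
  // ((n % w) + w) % w maps any integer, including negatives, into 0..w-1.
  return ((newX % width) + width) % width;
}

console.log(destinationColumn(120, 1, 0, 240));  // 120: no wind, straight up
console.log(destinationColumn(120, 1, -3, 240)); // 117: wind pushes left
console.log(destinationColumn(0, 0, 0, 240));    // 239: wraps to last column
```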
To control the wind, we are going to use the <code>B</code> button to add wind to the left and the <code>Y</code> button to add wind to the right. Checking if a button is pressed on the Pico Display can be done with a call to <code>pico_display.is_pressed()</code>: <pre style="background-color:#282a36;">
if (pico_display.is_pressed(pimoroni::PicoDisplay::X)) {
  // Add button handler code here.
}
</pre> The problem with this method is that, since we run this check every frame, the wind will change very quickly, even when the button is pressed for a short time. Instead, we want to increase or decrease the wind only when the button gets pressed; more precisely, when it changes state from "not pressed" to "pressed": <pre style="background-color:#282a36;">
bool y_pressed = false;
bool b_pressed = false;
while (true) {
  if (!y_pressed && pico_display.is_pressed(pimoroni::PicoDisplay::Y)) {
    // Button Y changed state from "not pressed" to "pressed".
    wind++;
  }
  y_pressed = pico_display.is_pressed(pimoroni::PicoDisplay::Y);

  if (!b_pressed && pico_display.is_pressed(pimoroni::PicoDisplay::B)) {
    // Button B changed state from "not pressed" to "pressed".
    wind--;
  }
  b_pressed = pico_display.is_pressed(pimoroni::PicoDisplay::B);
}
</pre> We can then apply the wind to our logic: <pre style="background-color:#282a36;">
void update(uint32_t time) {
  for (int y = 0; y < pimoroni::PicoDisplay::HEIGHT; y++) {
    int row = y * pimoroni::PicoDisplay::WIDTH;
    int next_row = y == 0 ? 0 : (y - 1) * pimoroni::PicoDisplay::WIDTH;
    for (int x = 0; x < pimoroni::PicoDisplay::WIDTH; x++) {
      uint8_t color = fire[row + x];
      uint16_t pen = pallete[color];
      pico_display.setPen(pen);
      pico_display.setPixel(x, y);
      if (y > 0) {
        int new_x = x;
        int rand = fast_rand() % 3;
        new_x = (new_x + rand - 1 + wind);
        // Wrap around the screen edges.
        if (new_x >= pimoroni::PicoDisplay::WIDTH) {
          new_x = new_x - pimoroni::PicoDisplay::WIDTH;
        } else if (new_x < 0) {
          new_x = new_x + pimoroni::PicoDisplay::WIDTH;
        }
        color = color > 0 ? color - (rand & 1) : 0;
        fire[next_row + new_x] = color;
      }
    }
  }
  pico_display.update();
}
</pre> Another modification is that we now "wrap around" the fire spread: if a pixel in the first column spreads to the left, we teleport it to the last column, and if a pixel in the last column spreads to the right, we teleport it to the first column. <h3>More perf improvements</h3> These extra checks mean that our FPS took a hit again, and we're now back down to 21 FPS. The next improvement is a trick around the <code>pico_graphics</code> API. When <code>setPixel(x, y)</code> is called, the API checks the coordinates to ensure that values are not written outside the <code>frame_buffer</code> boundaries. In our case, <code>x</code> and <code>y</code> always come from the loop bounds, so we know we will never write outside the frame buffer. So, instead of calling <code>setPixel(x, y)</code>, we invoke the <code>ptr(x, y)</code> function, which allows manipulating the frame buffer directly, skipping the boundary validation: <pre style="background-color:#282a36;">
void update(uint32_t time) {
  for (int y = 0; y < pimoroni::PicoDisplay::HEIGHT; y++) {
    int row = y * pimoroni::PicoDisplay::WIDTH;
    int next_row = y == 0 ? 0 : (y - 1) * pimoroni::PicoDisplay::WIDTH;
    for (int x = 0; x < pimoroni::PicoDisplay::WIDTH; x++) {
      uint8_t color = fire[row + x];
      uint16_t pen = pallete[color];
      // Write the pen straight into the frame buffer, skipping bounds checks.
      *pico_display.ptr(x, y) = pen;
      if (y > 0) {
        int new_x = x;
        int rand = fast_rand() % 3;
        new_x = (new_x + rand - 1 + wind);
        if (new_x >= pimoroni::PicoDisplay::WIDTH) {
          new_x = new_x - pimoroni::PicoDisplay::WIDTH;
        } else if (new_x < 0) {
          new_x = new_x + pimoroni::PicoDisplay::WIDTH;
        }
        color = color > 0 ? color - (rand & 1) : 0;
        fire[next_row + new_x] = color;
      }
    }
  }
  pico_display.update();
}
</pre> This got us over 40 FPS, which is more than the 27 FPS required by Doom Fire. Yay! <h2>Conclusion</h2> The Raspberry Pi Pico and the Pico Display are incredibly fun to play with.
While MicroPython makes it easy to get started and prototype, it comes at a significant performance cost. C/C++ is more complex to set up and probably has a steeper learning curve, but it can pay off when the extra performance is needed. I'm not an expert in Python or C++, so if you want to check out the code, head over to the <a href="https://github.com/andreban/pico-fire/">GitHub repo</a> and feel free to open issues or even pull requests.
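For reference, the fast generator from the StackOverflow answer linked above is a linear congruential generator (LCG). Here is a minimal JavaScript sketch of the same idea. The multiplier and increment (214013 and 2531011, the constants used by Microsoft's C runtime <code>rand()</code>) and the 15-bit output are my reading of that answer, so double-check them against the link; <code>makeFastRand</code> is a hypothetical name. <code>Math.imul</code> keeps the multiplication in 32-bit integer arithmetic, like the C version.

```javascript
// Sketch of a fast linear congruential generator (LCG) with 32-bit state.
// Returns a function producing integers in 0..32767.
function makeFastRand(seed) {
  let state = seed | 0;
  return function fastRand() {
    // Math.imul gives the low 32 bits of the product, like C integer math.
    state = (Math.imul(214013, state) + 2531011) | 0;
    return (state >>> 16) & 0x7fff;
  };
}

const fastRand = makeFastRand(1);
console.log(fastRand(), fastRand()); // 41 18467
// As in the C++ code, reduce to 0..2 with a modulo.
const spread = fastRand() % 3;
```

The trade-off is statistical quality: an LCG like this has short, predictable low-order patterns, which is fine for a fire effect but not for anything that needs good randomness.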

Run Doom Fire animation on a Raspberry Pi Pico using MicroPython or C++. MicroPython is beginner-friendly but slower; C++ offers significant performance improvements, achieving over 40 FPS. This tutorial details both implementations, optimization techniques, and adding wind effects, providing code examples and addressing performance bottlenecks.

webadb - ADB over WebUSB resources. 2021-02-04T00:00:00Z https://bandarra.me/posts/webadb-Adb-Over-Usb-Resources Most Android developers will be familiar with the <a href="https://developer.android.com/studio/command-line/adb">adb command-line tool</a>. It allows developers to connect their development computer to an Android device and run a variety of actions, like installing, starting, or stopping an application, pushing and pulling files, taking screenshots, or recording the screen. The introduction of <a href="https://web.dev/usb/">WebUSB</a> to the Web Platform opens the possibility of using ADB from a web page. This post contains some resources for developers looking for more information on using ADB over WebUSB (or webadb). <h2>Implementations</h2> <ul> <li><a href="https://github.com/GoogleChromeLabs/wadb/">wadb</a>: my own implementation of webadb, in TypeScript. Created as a demo / exploration of implementing ADB over WebUSB. Currently used by <a href="http://screenrecord.bandarra.me/">screenrecord.bandarra.me</a>.</li> <li><a href="https://github.com/webadb/webadb.js">webadb.js</a>: the oldest implementation of webadb that I'm aware of. The wadb implementation is largely based on it.</li> <li><a href="https://github.com/yume-chan/ya-webadb/">ya-webadb</a>: a newer implementation, also in TypeScript, powering <a href="https://app.webadb.com/">app.webadb.com</a>. It also seems to have a <a href="https://github.com/yume-chan/ya-webadb/tree/master/packages/adb-backend-ws">WebSocket implementation of the transport layer</a>, but it is not enabled on the demo site.</li> </ul> <h2>ADB Protocol Documentation</h2> Besides reading the code and contributing to existing implementations, you may want to check other resources: <ul> <li><a href="https://cs.android.com/android/platform/superproject/+/master:packages/modules/adb/">ADB Internals</a>: the official Android source repository.
It contains both the C++ implementation of ADB and a set of text files describing various parts of the protocol: <ul> <li><a href="https://cs.android.com/android/platform/superproject/+/master:packages/modules/adb/README.md">README.md</a></li> <li><a href="https://cs.android.com/android/platform/superproject/+/master:packages/modules/adb/OVERVIEW.TXT">OVERVIEW.TXT</a></li> <li><a href="https://cs.android.com/android/platform/superproject/+/master:packages/modules/adb/protocol.txt">protocol.txt</a></li> </ul> </li> <li><a href="https://github.com/cstyan/adbDocumentation">github.com/cstyan/adbDocumentation</a>: unofficial documentation of the protocol, which may be easier to read than the official documents.</li> </ul>
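To give a feel for what those documents describe, here is a minimal JavaScript sketch of the 24-byte header that precedes every ADB message: six little-endian 32-bit fields (command, arg0, arg1, data length, data checksum, and a magic value equal to the command with all bits inverted). The helper names are hypothetical, the CNXN arguments in the usage lines are illustrative, and filling the checksum with a plain sum of the payload bytes is an assumption based on the C++ implementation (newer protocol versions may skip the check), so verify it against protocol.txt.

```javascript
// Hypothetical helpers sketching the ADB message header layout.
// Commands are four ASCII characters read as a little-endian uint32.
function commandFromString(name) {
  return (name.charCodeAt(0) |
          (name.charCodeAt(1) << 8) |
          (name.charCodeAt(2) << 16) |
          (name.charCodeAt(3) << 24)) >>> 0;
}

// Builds the 24-byte header for a payload (Uint8Array).
function encodeHeader(command, arg0, arg1, payload) {
  const view = new DataView(new ArrayBuffer(24));
  let checksum = 0;
  for (const byte of payload) {
    checksum = (checksum + byte) >>> 0; // Assumption: simple byte sum.
  }
  view.setUint32(0, command, true);                       // command
  view.setUint32(4, arg0, true);                          // arg0
  view.setUint32(8, arg1, true);                          // arg1
  view.setUint32(12, payload.length, true);               // data_length
  view.setUint32(16, checksum, true);                     // data_check
  view.setUint32(20, (command ^ 0xffffffff) >>> 0, true); // magic
  return new Uint8Array(view.buffer);
}

// Usage: header for a CNXN (connect) message with a "host::" banner.
const CNXN = commandFromString("CNXN");
const banner = new TextEncoder().encode("host::\0");
const header = encodeHeader(CNXN, 0x01000000, 4096, banner);
console.log(CNXN.toString(16)); // 4e584e43
```

Over WebUSB, a header like this and its payload would then be sent with <code>transferOut()</code> on the device's bulk OUT endpoint.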

Use ADB over WebUSB! This guide explores using the Android Debug Bridge (ADB) via WebUSB, detailing implementations like wadb and ya-webadb, and providing links to crucial ADB protocol documentation for seamless web-based Android device interaction. Learn how to control your Android device from your browser.

Best practices for using the Wake Lock API. 2021-01-25T00:00:00Z https://bandarra.me/posts/Best-practices-for-using-the-Wake-Lock-API The <a href="https://web.dev/wake-lock/">Wake Lock API</a> provides a way to prevent devices from dimming or locking the screen when an application needs to keep running. I used an earlier version of the API in the <a href="https://github.com/GoogleChromeLabs/rowing-monitor">Rowing Monitor</a> project and, more recently, the final version for <a href="https://doom-fire.com">doom-fire</a>. Those two applications have slightly different implementations from a user perspective, which made me think of a couple of best practices: <h2>Be mindful of the user's battery life</h2> This is <a href="https://web.dev/wake-lock/#best-practices">part of the Wake Lock documentation</a>, but it bears repeating: there's a good reason why devices dim or turn off the screen after a few seconds. A device's display is one of the components that draws the most power from the battery. This means that, to avoid wasting battery, applications should only request a wake lock when there's a clear benefit to the user. Before implementing the API, a good question to ask is: does the user really need the device to stay awake? Here are some things to consider that may help answer this question: <ul> <li>Will the user consume the on-screen content for long periods?</li> <li>Would tapping the screen multiple times just to keep it awake severely degrade the user experience?</li> <li>Is the user unable to tap the screen while using the application?</li> <li>What are the parts of the application where the Wake Lock is required?
Restrict the implementation to only those parts.</li> </ul> <h2>Avoid making users think about Wake Lock (when possible)</h2> In the <a href="https://pm5-monitor-c63a2.firebaseapp.com/index.html">rowing monitor</a> application, it's clear that users will want to keep track of their exercise on the screen once they start and while they are busy exercising. This gives us a hint that we can request the Wake Lock when the user starts the exercise and release it when they stop. The device stays awake while we know the user needs it, and only while they need it, yet the user never has to think about keeping the device awake! A good question to ask is: <ul> <li>Is there a natural moment where users will need to keep the screen awake and the Wake Lock API can be seamlessly integrated into the user experience?</li> </ul> However, this is not always possible. In <a href="https://doom-fire.com">doom-fire</a>, for example, there's no natural way to tell when the user wants to keep the screen awake, so a padlock is used as a control for the user to request or release the Wake Lock.
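The rowing monitor approach can be sketched as a small controller that requests the lock when the exercise starts, releases it when the exercise stops, and requests it again when the page becomes visible, since the browser releases screen wake locks automatically when the page is hidden. This is a hypothetical class, not the rowing monitor's actual code. The Wake Lock object is passed in, so the sketch degrades gracefully when <code>navigator.wakeLock</code> is missing and can be exercised without a browser.

```javascript
// Hypothetical controller tying the Wake Lock to the exercise lifecycle.
// wakeLock: navigator.wakeLock in a browser, or undefined if unsupported.
class ExerciseWakeLock {
  constructor(wakeLock) {
    this.wakeLock = wakeLock;
    this.sentinel = null;
    this.exercising = false;
  }

  // Call when the user starts the exercise.
  async start() {
    this.exercising = true;
    if (!this.wakeLock) return; // Unsupported: the app keeps working.
    try {
      this.sentinel = await this.wakeLock.request("screen");
    } catch (err) {
      // Requests can be rejected, e.g. when the page is not visible.
      this.sentinel = null;
    }
  }

  // Call when the user stops the exercise.
  async stop() {
    this.exercising = false;
    if (this.sentinel) {
      await this.sentinel.release();
      this.sentinel = null;
    }
  }

  // Call from a visibilitychange listener when the page is visible again.
  async onVisible() {
    const lost = !this.sentinel || this.sentinel.released;
    if (this.exercising && lost) await this.start();
  }
}

// In the browser (not runnable outside one):
// const lock = new ExerciseWakeLock(navigator.wakeLock);
// document.addEventListener("visibilitychange", () => {
//   if (document.visibilityState === "visible") lock.onVisible();
// });
```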

Optimize Wake Lock API usage for improved user experience and battery life. Learn best practices for implementing the Wake Lock API, including minimizing battery drain and seamlessly integrating wake lock functionality into your app's user experience. Discover when to automatically manage wake lock and when to provide user controls.

Integrating in-app-reviews with Trusted Web Activity 2021-01-18T00:00:00Z https://bandarra.me/posts/Integrating-in-app-reviews-with-Trusted-Web-Activity I was reading the questions on the <a href="https://stackoverflow.com/questions/tagged/trusted-web-activity/"><code>trusted-web-activity</code></a> tag on StackOverflow, as I often do, when <a href="https://stackoverflow.com/questions/65752429/how-can-i-extend-twa-application-with-in-app-review">a question</a> asking if it is possible to integrate a <a href="https://developers.google.com/web/android/trusted-web-activity/">Trusted Web Activity</a> with <a href="https://developer.android.com/guide/playcore/in-app-review"><code>in-app-reviews</code></a> caught my attention. In short, the answer is yes, it’s possible, but there are caveats. <h1>Ok, tell me how to do it!</h1> Let’s assume an application bootstrapped with <a href="https://www.npmjs.com/package/@bubblewrap/cli">Bubblewrap</a> and go over the changes needed to implement in-app-reviews. If you are new to Trusted Web Activity, I recommend reading the <a href="https://developers.google.com/web/android/trusted-web-activity/">documentation</a>. The idea is to use a custom scheme, like <code>my-app://</code>, that is handled by an <a href="https://developer.android.com/reference/android/app/Activity">Android Activity</a>. This Activity will, in turn, launch the review flow and then finish itself. We need a custom scheme because internal URLs trigger navigation inside the Trusted Web Activity, and we want to get the user back to the Android part of the app for the review flow. <h2>Step 1: Add the in-app-reviews dependency</h2> You will need to add the <code>com.google.android.play:core</code> dependency to <code>app/build.gradle</code>.
After adding it, the <code>dependencies</code> section should look like the following: <pre style="background-color:#282a36;">
dependencies {
    implementation fileTree(include: ['*.jar'], dir: 'libs')
    implementation 'com.google.androidbrowserhelper:androidbrowserhelper:2.1.0'
    implementation 'com.google.android.play:core:1.9.0'
}
</pre> <h2>Step 2: Create a ReviewActivity</h2> Add a <code>ReviewActivity.java</code> file to the same folder where you will find <code>Application.java</code>, <code>LauncherActivity.java</code> and others, and implement in-app-reviews in this Activity. This Activity won't have any UI and you'll only use its <code>onCreate()</code> method to launch the review flow: <pre style="background-color:#282a36;">
package com.doom_fire.twa;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;

import androidx.annotation.NonNull;
import androidx.annotation.Nullable;

import com.google.android.play.core.review.ReviewInfo;
import com.google.android.play.core.review.ReviewManager;
import com.google.android.play.core.review.ReviewManagerFactory;
import com.google.android.play.core.tasks.Task;

public class ReviewActivity extends Activity {
    private static final String TAG = "ReviewActivity";
    private ReviewManager mReviewManager;

    @Override
    protected void onCreate(@Nullable Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        Log.d(TAG, "Review Activity started.");
        startReview();
    }

    public void startReview() {
        mReviewManager = ReviewManagerFactory.create(this);
        Log.d(TAG, "Requesting Review flow.");
        Task<ReviewInfo> request = mReviewManager.requestReviewFlow();
        request.addOnCompleteListener(task -> {
            if (task.isSuccessful()) {
                Log.d(TAG, "Review Flow request succeeded.");
                launchReviewFlow(task.getResult());
            } else {
                Log.d(TAG, "Review Flow request failed. Finishing.");
                finish();
            }
        });
    }

    private void launchReviewFlow(@NonNull ReviewInfo reviewInfo) {
        Log.d(TAG, "Launching review flow.");
        Task<Void> reviewFlow = mReviewManager.launchReviewFlow(this, reviewInfo);
        reviewFlow.addOnCompleteListener(task -> {
            Log.d(TAG, "Review flow finished. Finishing Activity.");
            finish();
        });
    }
}
</pre> <h2>Step 3: Register the Activity</h2> Add the Activity you just created to <code>AndroidManifest.xml</code>. In the example below, we are using <code>doom-fire</code> as the scheme and <code>review</code> as the host. This means that, in our web application, we'll link to <code>doom-fire://review</code> to trigger the ReviewActivity. Make sure to change this to something that suits your app. Apps targeting Android 12 or higher must also set <code>android:exported</code> explicitly on activities that declare intent filters. <pre style="background-color:#282a36;">
<activity android:name=".ReviewActivity"
    android:exported="true"
    android:theme="@android:style/Theme.Translucent.NoTitleBar">
    <intent-filter>
        <action android:name="android.intent.action.VIEW"/>
        <category android:name="android.intent.category.DEFAULT" />
        <category android:name="android.intent.category.BROWSABLE" />
        <data android:scheme="doom-fire" android:host="review" />
    </intent-filter>
</activity>
</pre> <h2>Step 4: Add a link to the web application</h2> <pre style="background-color:#282a36;">
<a href="doom-fire://review">Rate app now!</a>
</pre> <h2>The result</h2> <video src="/img/2021/01/in-app-review.mp4" controls poster="/img/2021/01/in-app-review-cover.png"></video> <h1>Caveats</h1> <ol> <li> The link above will only work when the web app is running inside a Trusted Web Activity. <a href="https://stackoverflow.com/questions/54580414/how-can-i-detect-if-my-website-is-opened-inside-a-trusted-web-actvity">This question</a> explains how to detect this. </li> <li> Due to the way Trusted Web Activities work on ChromeOS, this solution may not work on that platform. </li> <li> The app review flow will only work if the application has been deployed to the Play Store, which can make testing a bit tricky.
</li> </ol> <h1>The future</h1> Reviews and ratings have been available to platform-specific developers for a long time. They not only allow users to express their happiness (or the lack of it), but also shorten the feedback cycle for developers and provide another metric businesses can use to benchmark against their competitors. It’s interesting to think about what such a tool would mean for the web. If you are interested in the subject, the <a href="https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/RatingsAndReviewsPrompt/explainer.md">Prompt for Rating/Review Explainer</a> proposes how a general API for this use case could work.
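For the first caveat, a commonly suggested heuristic is that <code>document.referrer</code> starts with <code>android-app://</code>, followed by the package name, when a Trusted Web Activity opens the page. Since the referrer changes once the user navigates, this sketch remembers the result in <code>sessionStorage</code>. It rests on that assumption and uses hypothetical names (<code>isTrustedWebActivity</code>, the <code>isTWA</code> key), so treat it as a starting point rather than a guaranteed detection method.

```javascript
// Hypothetical helper: decide whether the page runs inside a TWA.
// referrer: document.referrer; storage: sessionStorage (or a stand-in).
function isTrustedWebActivity(referrer, storage) {
  if (referrer.startsWith("android-app://")) {
    // First page of the session: remember it for later navigations.
    storage.setItem("isTWA", "true");
    return true;
  }
  return storage.getItem("isTWA") === "true";
}

// In the page, only show the review link inside the TWA:
// const link = document.querySelector('a[href="doom-fire://review"]');
// link.hidden = !isTrustedWebActivity(document.referrer, sessionStorage);
```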

Integrate in-app reviews into your Trusted Web Activity (TWA) using a custom scheme. This tutorial shows you how to add the necessary Android dependency, create a ReviewActivity to handle the review flow, and add a link in your web app to trigger the review. Learn about caveats like ChromeOS compatibility and the Play Store deployment required for testing.

Building Doom Fire using Modern JavaScript 2021-01-13T00:00:00Z https://bandarra.me/posts/Building-Doom-Fire-using-modern-JavaScript <doom-fire style="width: 100%;height: 20vh;display: block;background-color: black"></doom-fire> <a href="https://doom-fire.com/">Doom Fire</a> is an animated fire used by some ports of Doom, and documented in <a href="https://fabiensanglard.net/doom_fire_psx/">Fabien Sanglard's blogpost</a>. Despite the animation looking cool, the code is simple and ideal for learning graphics APIs. The <a href="https://developers.google.com/web/updates/2018/08/offscreen-canvas">Offscreen Canvas API</a> allows moving the animation code to a Worker, leaving the main thread free to worry about more important things, like handling user input! I'd wanted to try out the API for a while, and the Doom Fire animation sounded simple enough to allow focusing on how Offscreen Canvas works. This post focuses on the Offscreen Canvas and modern JavaScript aspects of the code. I recommend Fabien's <a href="https://fabiensanglard.net/doom_fire_psx/">blogpost</a> if you just want to learn more about the animation, or you can go straight to the <a href="/static/doom-fire-animation.mjs">source code</a>. <h2>Browser support</h2> As with most modern APIs, it's a good idea to start by checking browser support. The Offscreen Canvas API has good support and includes Chrome, Samsung Internet, Edge, and a couple of other browsers. <del>But it's still missing in Safari and Firefox.</del> Update: Since 2023, Offscreen Canvas is supported by all major browsers and is a <a href="https://web.dev/baseline">Baseline</a> feature. Thankfully, the API is easy to feature test: <pre style="background-color:#282a36;">
const canvas = document.querySelector('#canvas');
if ("OffscreenCanvas" in window) {
  // Offscreen Canvas code goes here!
}
</pre> <h2>Architecture considerations</h2> Since <code>Offscreen Canvas</code> is not supported by all browsers, we'll want to use <a href="https://en.wikipedia.org/wiki/Progressive_enhancement">progressive enhancement</a> and only use the API when it is available: the animation will run on a worker thread when <code>Offscreen Canvas</code> is available and on the main thread when it isn't. This means we have to decouple the code that runs the animation from the code that sets up the Canvas, and move it to a module that can be used from either the main thread or the worker: <pre style="background-color:#282a36;">
export default class DoomFireAnimation {
  constructor(parent, canvas) {
    this.parent = parent;
    this.canvas = canvas;
    this.ctx = canvas.getContext('2d');
    // ... Finish setting up the animation.
  }

  start() {
    this.parent.requestAnimationFrame(this._update.bind(this));
  }

  _update() {
    // ... Run the Doom Fire animation, then render the next frame.
    this.parent.requestAnimationFrame(this._update.bind(this));
  }
}
</pre> In the snippet above, the constructor takes two arguments: <ol> <li>A reference to the context where the code is running, so we can call <code>requestAnimationFrame()</code> to render each frame.
This will either be the <code>Window</code> object when the animation is running on the main thread, or the worker's global scope (<code>self</code>) when running off the main thread.</li> <li>A reference to the <code>Canvas</code> object that we will use to draw the animation.</li> </ol> The Worker implementation is straightforward: <pre style="background-color:#282a36;">
import DoomFireAnimation from './doom-fire-animation.mjs';

let doomFireAnimation;

self.onmessage = function(ev) {
  if (ev.data.msg === 'init') {
    doomFireAnimation = new DoomFireAnimation(self, ev.data.canvas);
  }
  if (ev.data.msg === 'start') {
    if (doomFireAnimation) {
      doomFireAnimation.start();
    }
  }
};
</pre> The <code>Worker</code> can handle two types of messages: one that prepares the animation and another that starts it. Those messages could be merged into a single one, but having separate events comes in handy when wrapping the animation into a web component. Since the worker uses an <code>import</code> statement, it has to be created as a module worker, with <code>{type: 'module'}</code>. Finally, we can put everything together in the application: <pre style="background-color:#282a36;">
const canvas = document.querySelector('#canvas');
if ("OffscreenCanvas" in window) {
  const offscreenCanvas = canvas.transferControlToOffscreen();
  const worker = new Worker('doom-fire-worker.js', {type: 'module'});
  worker.postMessage(
    {msg: 'init', canvas: offscreenCanvas},
    [offscreenCanvas]
  );
  worker.postMessage({msg: 'start'});
} else {
  const animation = new DoomFireAnimation(window, canvas);
  animation.start();
}
</pre> When Offscreen Canvas is available, <code>transferControlToOffscreen()</code> transfers control of the canvas to an <code>OffscreenCanvas</code> object, which is then passed to a Worker: the <code>init</code> message transfers the Offscreen Canvas to the worker, and the <code>start</code> message starts the animation. When it is not available, the <code>DoomFireAnimation</code> is created on the main thread, with a reference to the Window object and the canvas.
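Passing <code>parent</code> in is what lets the same class run in both places, since <code>requestAnimationFrame()</code> is available on <code>window</code> and, in browsers that support it, in the dedicated worker's global scope. Here is a minimal sketch of that pattern using a hypothetical <code>FrameLoop</code> class (not the post's code) that binds its callback once instead of on every frame, and can be stopped with <code>cancelAnimationFrame()</code>.

```javascript
// Hypothetical frame loop that works with any object exposing
// requestAnimationFrame()/cancelAnimationFrame(): window on the main
// thread, self in a worker.
class FrameLoop {
  constructor(parent, onFrame) {
    this.parent = parent;
    this.onFrame = onFrame;
    this.running = false;
    this.frameId = 0;
    // Bind once, rather than creating a new bound function every frame.
    this.tick = this.tick.bind(this);
  }

  start() {
    if (this.running) return; // Avoid scheduling two loops.
    this.running = true;
    this.frameId = this.parent.requestAnimationFrame(this.tick);
  }

  stop() {
    this.running = false;
    this.parent.cancelAnimationFrame(this.frameId);
  }

  tick(time) {
    this.onFrame(time);
    // onFrame may have called stop(); only reschedule while running.
    if (this.running) {
      this.frameId = this.parent.requestAnimationFrame(this.tick);
    }
  }
}

// drawFrame is a hypothetical per-frame callback:
// Main thread:  new FrameLoop(window, drawFrame).start();
// Worker:       new FrameLoop(self, drawFrame).start();
```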
<h2>Wrapping everything in a Web Component</h2> <pre style="background-color:#282a36;">
import DoomFireAnimation from './doom-fire-animation.mjs';

export default class DoomFire extends HTMLElement {
  constructor() {
    super();
    // Create our own Canvas!
    this.canvas = document.createElement('canvas');
    this.offscreen = "OffscreenCanvas" in window;

    // Make the canvas use the whole element.
    this.canvas.style.width = '100%';
    this.canvas.style.height = '100%';

    if (this.offscreen) {
      console.log('Rendering with Offscreen Canvas.');
      const offscreenCanvas = this.canvas.transferControlToOffscreen();
      this.worker = new Worker('doom-fire-worker.js', {type: 'module'});
      this.worker.postMessage(
        {msg: 'init', canvas: offscreenCanvas},
        [offscreenCanvas]
      );
    } else {
      console.log('Rendering with regular Canvas.');
      this.animation = new DoomFireAnimation(window, this.canvas);
    }

    const shadowRoot = this.attachShadow({mode: 'open'});
    shadowRoot.appendChild(this.canvas);
  }

  connectedCallback() {
    if (this.offscreen) {
      this.worker.postMessage({msg: 'start'});
    } else {
      this.animation.start();
    }
  }
}

if (!customElements.get('doom-fire')) {
  customElements.define('doom-fire', DoomFire);
}
</pre> And this is how the module can be added to the HTML: <pre style="background-color:#282a36;">
<!doctype html>
<head>
  ...
  <script type="module" src="doom-fire.mjs"></script>
  ...
</head>
<body>
  <doom-fire></doom-fire>
</body>
</pre> <h2>Performance</h2> The Doom Fire animation is lightweight enough to run effortlessly on most devices, as it was originally created to run on devices like the PSX and the Nintendo 64.
Still, checking out the fire charts in DevTools (pun intended) shows how much the main thread is freed up with Offscreen Canvas: <table><thead><tr><th style="text-align: center">Main Thread</th><th style="text-align: center">Offscreen Canvas</th></tr></thead><tbody> <tr><td style="text-align: center"><img src="/img/2021/01/doom-fire-main-thread.jpg" alt="Main Thread fire chart" title="Main Thread fire chart" /></td><td style="text-align: center"><img src="/img/2021/01/doom-fire-worker.jpg" alt="Worker fire chart" title="Worker fire chart" /></td></tr> </tbody></table> <h2>Where to go next</h2> If you are a fan of the <a href="https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D">Canvas 2D API</a>, you may be interested to know that it is getting updates and improvements! Check out Aaron Krajeski's recent Chrome Dev Summit talk to learn more! <iframe width="800" height="450" style="width:100%;" src="https://www.youtube.com/embed/dfOKFSDG7IM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> <script type=module src="/static/doom-fire.mjs"></script>

Create a mesmerizing Doom Fire animation in your browser using modern JavaScript and the Offscreen Canvas API. This tutorial shows you how to implement the animation efficiently, offloading it to a worker thread for optimal performance and main thread responsiveness. Learn progressive enhancement techniques to ensure broad browser compatibility. Improve your web development skills with this engaging project.

Building a Physical AirHorn Button with Web USB 2020-02-22T00:00:00Z https://bandarra.me/posts/Building-a-Physical-AirHorn-Button-with-Web-USB During the holiday season, I decided to experiment with building a physical button for <a href="https://twitter.com/Paul_Kinlan">Paul Kinlan’s</a> <a href="https://airhorner.com/">AirHorn</a>. This blogpost provides the instructions and links so you can also build your own <a href="https://wicg.github.io/webusb/">WebUSB</a> powered AirHorn Button. <h1>What You Will Need</h1> <ul> <li>An Arduino device. I used an <a href="https://store.arduino.cc/arduino-nano-33-iot">Arduino Nano 33 IoT</a>.</li> <li>A momentary button or switch. I used this <a href="https://shop.pimoroni.com/products/massive-arcade-button-with-led-100mm-red">massive red button</a> from Pimoroni.</li> <li>A 10k Ohm resistor.</li> </ul> <h1>Getting Started</h1> WebUSB is an API that securely provides access to USB devices from web pages. The API has been around for a while, and has been available in Chrome <a href="https://developers.google.com/web/updates/2017/09/nic61">since version 61</a>. I didn’t know anything about the API, so the first step was to figure out how hard this would be. Fortunately, Francois Beaufort wrote a handy <a href="https://developers.google.com/web/updates/2016/03/access-usb-devices-on-the-web">getting started guide</a>. <h2>Hooking up a Button on the Arduino</h2> The button was hooked up to the Arduino board exactly as shown in the <a href="https://www.arduino.cc/en/tutorial/button">Arduino Button Tutorial</a>. The implementation uses the <a href="https://github.com/webusb/arduino">Arduino WebUSB library</a>. Make sure to follow the steps to set up the library on your development computer. The Arduino implementation is a slightly modified version of the Arduino Button Tutorial that implements something analogous to <code>keydown</code> and <code>keyup</code> events. When the pin voltage is <code>HIGH</code>, it means the button is pressed.
When it is <code>LOW</code>, it means the button is not pressed. To create the desired behaviour, we check when the button state changes from <code>LOW</code> to <code>HIGH</code>, meaning the button was pressed, and from <code>HIGH</code> to <code>LOW</code>, meaning the button was released. We send an <code>ON</code> message over the WebUSB serial connection when the button is pressed and an <code>OFF</code> message when it is released. Here's what the code looks like: <pre style="background-color:#282a36;">
#include &lt;WebUSB.h&gt;

/**
 * Follow the instructions on https://github.com/webusb/arduino/ to install
 * the library and get the Arduino IDE to build and install it correctly.
 */
WebUSB WebUSBSerial(1 /* https:// */, "webusb-horn.firebaseapp.com");

#define Serial WebUSBSerial

const int ledPin = 13;
const int buttonPin = 2;
int previousButtonState = 0;

void setup() {
  // Wait until the web page opens the serial connection.
  while (!Serial) {
    ;
  }
  Serial.begin(9600);
  Serial.write("Sketch begins.\r\n> ");
  Serial.flush();
  pinMode(ledPin, OUTPUT);
  pinMode(buttonPin, INPUT);
}

void loop() {
  if (Serial) {
    int buttonState = digitalRead(buttonPin);
    // Only report transitions, like keydown/keyup events.
    if (buttonState != previousButtonState) {
      if (buttonState == HIGH) {
        digitalWrite(ledPin, HIGH);
        Serial.write("ON\r\n");
      } else {
        digitalWrite(ledPin, LOW);
        Serial.write("OFF\r\n");
      }
      Serial.flush();
    }
    previousButtonState = buttonState;
    // Short pause between reads.
    delay(10);
  }
}
</pre> On the JavaScript side, we need to connect to the Arduino and then listen for messages on the serial interface. When an <code>ON</code> message is received, we start the AirHorn with <code>airhorn.start()</code>. When an <code>OFF</code> message is received, we stop it with <code>airhorn.stop()</code>. This is implemented in the <code>_loopRead</code> method in the code listing below. Check this <a href="https://github.com/andreban/airhorn/commit/299c8c1b4c1fd8a49b8db48a9add4864cc6259a3">commit</a> to see all changes made to AirHorn to make this work.
<pre style="background-color:#282a36;">
const HardwareButton = function(airhorn) {
  this.airhorn = airhorn;
  this.decoder = new TextDecoder();
  this.connected = false;
  const self = this;

  // Reads one chunk from the device, reacts to it, then schedules the next read.
  this._loopRead = async function() {
    if (!this.device) {
      console.log('no device');
      return;
    }
    try {
      const result = await this.device.transferIn(2, 64);
      const command = this.decoder.decode(result.data);
      if (command.trim() === 'ON') {
        airhorn.start({loop: true});
      } else {
        airhorn.stop();
      }
      self._loopRead();
    } catch (e) {
      console.log('Error reading data', e);
    }
  };

  this.connect = async function() {
    try {
      // Filter the device chooser to the Arduino board's vendor and product IDs.
      const device = await navigator.usb.requestDevice({
        filters: [{'vendorId': 0x2341, 'productId': 0x8057}]
      });
      this.device = device;
      await device.open();
      await device.selectConfiguration(1);
      await device.claimInterface(0);
      await device.selectAlternateInterface(0, 0);
      // Signal the sketch that the host is ready (value 0x01).
      await device.controlTransferOut({
        'requestType': 'class',
        'recipient': 'interface',
        'request': 0x22,
        'value': 0x01,
        'index': 0x00,
      });
      self._loopRead();
    } catch (e) {
      console.log('Failed to Connect: ', e);
    }
  };

  this.disconnect = async function() {
    if (!this.device) {
      return;
    }
    // Signal the sketch that the host is going away (value 0x00).
    await this.device.controlTransferOut({
      'requestType': 'class',
      'recipient': 'interface',
      'request': 0x22,
      'value': 0x00,
      'index': 0x00,
    });
    await this.device.close();
    this.device = null;
  };

  this.init = function() {
    const buttonDiv = document.querySelector('#connect');
    const button = buttonDiv.querySelector('button');
    button.addEventListener('click', this.connect.bind(this));
    // Only reveal the connect button on browsers that support WebUSB.
    if (navigator.usb) {
      buttonDiv.classList.add('available');
    }
  };

  this.init();
};
</pre> This is what the result looks like: <video controls loop poster="/img/2020/02/airhorn.jpg"> <source src="/img/2020/02/airhorn.webm" type="video/webm; codecs=vp8"> <source src="/img/2020/02/airhorn_x264.mp4" type="video/mp4; codecs=h264"> </video> <h2>[Optional] Building a case for our button</h2> To give the project a tidy enclosure, we can 3D print a box to fit our button and the required
electronics inside. The design I used is <a href="https://www.thingiverse.com/thing:4088197">available at Thingiverse</a>, and it contains both the box and a lid where the button can be fitted. Here's a video of the box being printed: <video controls loop muted poster="/img/2020/02/airhorn_box.jpg"> <source src="/img/2020/02/airhorn_box.webm" type="video/webm; codecs=vp8"> <source src="/img/2020/02/airhorn_box.mp4" type="video/mp4; codecs=h264"> </video> <h2>Final Result</h2> Finally, I used blue acrylic paint to give the box a nice color. This is what the final result looks like: <img src="/img/2020/02/airhorn_final.jpg" alt="Complete AirHorn box" title="Complete AirHorn box" />
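The <code>_loopRead</code> method above treats each <code>transferIn()</code> result as exactly one command, and anything other than <code>ON</code> as <code>OFF</code>. A USB transfer isn't guaranteed to line up with the <code>\r\n</code>-terminated messages the sketch writes, so a more defensive reader could buffer text until a full line arrives. Here is a minimal sketch of that idea; <code>LineFramer</code> and <code>dispatchLines</code> are hypothetical helpers, not part of the AirHorn code, and stripping a leading prompt assumes the <code>"&gt; "</code> the sketch prints after its startup banner:

```javascript
// Hypothetical helper: buffers decoded text from transferIn() and returns
// only complete lines, using the "\r\n" terminator the Arduino sketch writes.
class LineFramer {
  constructor() {
    this.buffer = "";
  }

  // Appends a chunk and returns every line it completes (possibly none).
  push(chunk) {
    this.buffer += chunk;
    const parts = this.buffer.split("\r\n");
    // The last part is an unfinished line, or "" if the chunk ended cleanly.
    this.buffer = parts.pop();
    return parts.filter((line) => line.length > 0);
  }
}

// Hypothetical dispatcher: maps complete lines to AirHorn calls. Unknown
// lines (such as the startup banner) are ignored instead of stopping the horn.
function dispatchLines(lines, airhorn) {
  for (const line of lines) {
    // Assumption: the sketch's startup prompt "> " may prefix the first command.
    const command = line.replace(/^>\s*/, "").trim();
    if (command === "ON") {
      airhorn.start({ loop: true });
    } else if (command === "OFF") {
      airhorn.stop();
    }
  }
}
```

Inside <code>_loopRead</code>, the if/else could then become <code>dispatchLines(framer.push(this.decoder.decode(result.data, {stream: true})), airhorn)</code>, with one <code>LineFramer</code> instance created when the device connects.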

Build a WebUSB-powered AirHorn button! This tutorial shows you how to build a physical button to control Paul Kinlan's AirHorn using an Arduino, a momentary button, and WebUSB. The guide includes the Arduino sketch, the JavaScript code, and a link to a 3D-printable case design. Make your own fun, interactive project today!

Fitness with Web Bluetooth 2017-02-20T11:42:00Z https://bandarra.me/posts/Fitness-Tracking-with-Web-Bluetooth <img src="/img/2017/02/monitor.jpg" alt="PM 5 Monitor" title="Connecting to a PM5 Monitor" /> With the launch of Chrome 56, web applications are now able to access Bluetooth Low Energy devices directly from the browser, without the need to install a plugin or a native application. This makes it possible to build kinds of web applications that were previously only possible on native platforms. For a great introduction to building browser applications with Web Bluetooth, check François Beaufort's <a href="https://developers.google.com/web/updates/2015/07/interact-with-ble-devices-on-the-web">"Interact with Bluetooth Devices on the Web"</a> article. <h2>Building a Rowing Monitor</h2> Many fitness tracking applications track exercises by connecting to Bluetooth-enabled devices, such as health monitors and treadmills. Due to the lack of Bluetooth connectivity in the browser, those applications are, in most cases, developed using native platforms. With Web Bluetooth now available, it becomes possible to connect to these devices and track exercises in real time from the browser. So, I decided to give it a try and build such an application for my rowing machine. The machine in question is a Concept2 Model D, but the most important part is its PM5 monitor. The monitor is BLE-enabled, and, in fact, Concept2 offers native applications for both <a href="https://play.google.com/store/apps/details?id=com.concept2.ergdata">Android</a> and <a href="https://itunes.apple.com/gb/app/ergdata/id561716382?mt=8">iOS</a>. A quick Google search led me to the <a href="http://www.concept2.co.uk/files/pdf/us/monitors/PM5_BluetoothSmartInterfaceDefinition.pdf">protocol</a> used by the monitor.
The resulting application is hosted <a href="https://rowing-monitor.bandarra.me/">here</a>, and the source code is publicly available on <a href="https://github.com/GoogleChrome/rowing-monitor/">GitHub</a>. <h3>Connecting to the Monitor</h3> The PM5 specification outlines four services on the device: the Information, Discovery, Control, and Rowing services. The Discovery Service is the one the device advertises, so we pass it as a filter when requesting the device. <pre style="background-color:#282a36;">
const options = {
  filters: [{services: ['ce060000-43e5-11e4-916c-0800200c9a66']}], // Discovery Service.
  optionalServices: [
    'ce060010-43e5-11e4-916c-0800200c9a66', // Information Service.
    'ce060020-43e5-11e4-916c-0800200c9a66', // Control Service.
    'ce060030-43e5-11e4-916c-0800200c9a66'  // Rowing Service.
  ]
};

navigator.bluetooth.requestDevice(options)
  .then(device => {
    // ...
  });
</pre> The characteristics we want to use are accessed through the other services. To access those services later, we need to list them as <code>optionalServices</code> when requesting the device. <h3>Selecting a device</h3> When the application requests the device, the browser shows a native dialog listing the devices that match the requested configuration. This dialog doubles as both a permission prompt for Bluetooth access and a device selection screen! <img src="/img/2017/02/pm5.gif" alt="Choosing a PM5 Monitor" title="Choosing a PM5 Monitor" /> For developers who have built Bluetooth applications on native platforms, this is very welcome news: on most platforms, developers have to build their own interface for scanning for devices and letting the user pick one. <h3>Accessing characteristics</h3> There are two ways to access characteristics.
An application can read them on demand or, more interesting for our purposes, subscribe to notifications whenever a characteristic changes. First, retrieve the service the characteristic belongs to: <pre style="background-color:#282a36;">
device.gatt.connect()
  .then(server => {
    server.getPrimaryService('ce060030-43e5-11e4-916c-0800200c9a66') // Rowing Service.
      .then(service => {
        // ...
      });
  });
</pre> Then, retrieve the characteristic itself. Call <code>characteristic.startNotifications()</code> and then set up an event listener to get updates on the characteristic. <pre style="background-color:#282a36;">
service.getCharacteristic('ce060031-43e5-11e4-916c-0800200c9a66') // General Info.
  .then(characteristic => characteristic.startNotifications())
  .then(characteristic => {
    characteristic.addEventListener('characteristicvaluechanged', e => {
      // Parse characteristic value.
    });
  });
</pre> Some characteristics have string values. The serial number characteristic from the Information Service is one example. Here's how to parse a characteristic like this: <pre style="background-color:#282a36;">
characteristic.addEventListener('characteristicvaluechanged', e => {
  const decoder = new TextDecoder('utf-8');
  // e.target.value is a DataView over the raw bytes.
  const value = decoder.decode(e.target.value);
  // ...
});
</pre> On the Rowing Monitor, most characteristics carry binary data. Its format is specified in the documentation, and it can be accessed from the event object in the form of a DataView. Here's how to parse such data: <pre style="background-color:#282a36;">
characteristic.addEventListener('characteristicvaluechanged', e => {
  const dataView = e.target.value;
  // Byte offsets come from the PM5 specification.
  const avgStrokeRate = dataView.getUint8(10);
  const endingHeartRate = dataView.getUint8(11);
  const averageHeartRate = dataView.getUint8(12);
  // ...
});
</pre> <h2>Always test on multiple devices</h2> The Bluetooth LE stack doesn't support parallel calls to the API.
On some operating systems, the Bluetooth stack implements a queue and serializes calls to the API. But on others, such as Android, the OS does not queue the calls, and it's up to the application developer to manage the queue. For more context, check this GitHub <a href="https://github.com/WebBluetoothCG/web-bluetooth/issues/188#issuecomment-255121220">issue</a>, where it's under active discussion. This leads to an interesting problem when developing with Web Bluetooth: a developer who doesn't serialize calls to the API may create an app that works flawlessly on macOS but fails on Android with a not-so-obvious error message. Developers can implement their own queue to serialize API calls, or carefully design their application so that parallel calls don't happen. It's also important to test the application on multiple operating systems to make sure it behaves properly. <h2>Look mom, no internet!</h2> Many applications are of little use without internet access. The Rowing Monitor, however, takes advantage of features such as <a href="https://developers.google.com/web/fundamentals/getting-started/primers/service-workers">Service Workers</a> and <a href="https://developer.mozilla.org/en/docs/Web/API/IndexedDB_API">IndexedDB</a> to work fully offline: once the user visits the website for the first time, the service worker is installed and caches the entire application for offline use. The Rowing Monitor's service worker is generated at build time using the <a href="https://github.com/GoogleChrome/sw-precache/">sw-precache</a> library.
Here's what the service worker generator task looks like in Gulp: <pre style="background-color:#282a36;">
const gulp = require('gulp');
const path = require('path');
const swPrecache = require('sw-precache');

gulp.task('generate-service-worker', callback => {
  const rootDir = 'dist';
  // Precache every static asset in dist/ and serve it from the site root.
  swPrecache.write(path.join(rootDir, 'service-worker.js'), {
    staticFileGlobs: [rootDir + '/**/*.{js,html,css,png,jpg,gif,svg,eot,ttf,woff}'],
    stripPrefix: rootDir
  }, callback);
});
</pre> The workouts are persisted using IndexedDB, so the application is fully functional even without a connection. The data is only stored locally, but the application could easily be extended to let the user log in and sync workout data to a remote server once online. <h2>Almost there!</h2> The Web Bluetooth API offers the functionality needed to connect to and gather information from a Bluetooth-enabled device, but one thing is still missing to turn the Rowing Monitor into a fully functional fitness tracker. When using the application to track an exercise, the screen goes to sleep after a few seconds and the Bluetooth connection is lost. This is due to the lack of a Wake Lock API. Fortunately, such a Web API is already <a href="https://www.w3.org/TR/wake-lock/">under discussion</a> at the W3C. In fact, Chrome already implements an early version of the API, which can be enabled through Chrome's experimental features: go to chrome://flags/#enable-experimental-web-platform-features and click Enable. <pre style="background-color:#282a36;">
// The PM5 API returns a Promise when the connection is established. The WakeLock
// is acquired once a connection is established.
pm5.connect()
  .then(() => {
    screen.keepAwake = true;
    //...
  });

// The API also provides a disconnect event. The WakeLock is released once the
// application is disconnected from the rowing machine.
pm5.addEventListener('disconnect', () => {
  screen.keepAwake = false;
  //...
});
</pre> <h2>Conclusion</h2> The potential of integrating fitness equipment with Web Bluetooth is amazing. Imagine someone arriving at the gym who, instead of downloading a full app to track the exercise, just receives the URL through a Physical Web beacon, an NFC tag, or a QR code, and the application is available instantly, with no download involved. It also opens the possibility of deeper integrations. The Rowing Monitor application, for instance, could evolve into an online racing application, a game, or a VR experience that lets users row in different parts of the world. Another application worth checking out is <a href="https://kinomap.tv/">https://kinomap.tv/</a>, which lets a user sync a YouTube riding video with a smart trainer using Web Bluetooth.
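The advice above about serializing Web Bluetooth calls can be followed with a small promise-based queue. This is a minimal sketch, not code from the Rowing Monitor repository; <code>OperationQueue</code> is a hypothetical name, and it assumes every GATT call is wrapped in a function that returns a Promise:

```javascript
// Hypothetical helper: runs async operations one at a time, in the order they
// were enqueued, so operations that go through this queue never overlap.
class OperationQueue {
  constructor() {
    // Tail of the chain; each new operation waits for it to settle.
    this.tail = Promise.resolve();
  }

  // `operation` is a function returning a Promise, for example
  // () => characteristic.readValue().
  enqueue(operation) {
    const result = this.tail.then(() => operation());
    // Keep the chain going even if this operation rejects.
    this.tail = result.catch(() => {});
    return result;
  }
}
```

Each call would then go through the same instance, for example <code>queue.enqueue(() => characteristic.startNotifications())</code>, so the next operation only starts after the previous one settles, whether it succeeded or failed.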

Read your Concept2 PM5 rowing monitor's data directly from your web browser! This tutorial shows how to build a web app that uses Web Bluetooth to connect, track real-time workout data (stroke rate, heart rate), and store results locally, all without needing a native app. Learn how to handle device selection, access characteristics, parse data, and implement offline functionality with Service Workers and IndexedDB.

Ignoring Corrupted gzip files in Hadoop 2013-12-09T00:00:00Z https://bandarra.me/posts/Ignoring-Corrupted-gzip-files-in-Hadoop I've been analyzing my website traffic using Hadoop and MapReduce. Our logs are written hourly to gzipped files. But, since the server may be restarted while writing to a file, every now and then a file gets corrupted. When this happens, the default Hadoop implementation aborts the entire job. So, I had to dive into the Hadoop source code and find a way to make it more lenient towards corrupted files. The trick is to create a LineRecordReader that, instead of propagating the EOFException, catches it and reports that there are no more lines to read in the file. As the default TextInputFormat hardcodes its LineRecordReader, it is necessary to extend FileInputFormat and override the createRecordReader method to return my version of LineRecordReader. Since FileInputFormat assumes files can be split, the class also overrides isSplitable, as TextInputFormat does, so that each gzipped file is read by a single reader. Here's what the code looks like: <pre style="background-color:#282a36;">
package org.bandarra.hadoop;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Created by andreban on 12/9/13.
 */
public class LenientTextInputFormat extends FileInputFormat&lt;LongWritable, Text&gt; {

    private static class LenientLineRecordReader extends LineRecordReader {
        public LenientLineRecordReader(byte[] recordDelimiter) {
            super(recordDelimiter);
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            try {
                return super.nextKeyValue();
            } catch (EOFException ex) {
                // Truncated gzip stream: log it and treat it as the end of the file.
                ex.printStackTrace();
                return false;
            }
        }
    }

    @Override
    public RecordReader&lt;LongWritable, Text&gt; createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        // Same custom-delimiter handling as TextInputFormat.
        String delimiter = context.getConfiguration().get("textinputformat.record.delimiter");
        byte[] recordDelimiterBytes = null;
        if (null != delimiter) {
            recordDelimiterBytes = delimiter.getBytes(StandardCharsets.UTF_8);
        }
        return new LenientLineRecordReader(recordDelimiterBytes);
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Compressed files (such as .gz) are not split: each one is read whole
        // by a single reader, otherwise records could be read more than once.
        CompressionCodec codec =
                new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
        return codec == null;
    }
}
</pre>
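The same idea, keeping whatever could be read before the truncation point instead of failing, can be sketched outside Hadoop. This swaps in Node.js's built-in <code>zlib</code> module purely for illustration; <code>readLinesLeniently</code> is a hypothetical helper, and it assumes every log record ends with a newline, so the trailing fragment of a cut-off file is dropped:

```javascript
const zlib = require("zlib");

// Hypothetical helper: gunzips a possibly truncated buffer and returns the
// complete lines, assuming every record ends with "\n".
function readLinesLeniently(gzipped) {
  // With finishFlush set to Z_SYNC_FLUSH, zlib returns the data decoded so far
  // instead of throwing "unexpected end of file" on truncated input.
  const text = zlib
    .gunzipSync(gzipped, { finishFlush: zlib.constants.Z_SYNC_FLUSH })
    .toString("utf8");
  const lines = text.split("\n");
  // The last element is "" for an intact file, or a record cut off mid-write.
  lines.pop();
  return lines;
}
```

A plain <code>zlib.gunzipSync(buffer)</code> on the same truncated file throws, which mirrors the EOFException that aborts the Hadoop job.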

Handle corrupted Hadoop gzipped log files gracefully. This improved `FileInputFormat` extends Hadoop's default functionality, enabling your MapReduce jobs to continue processing even when encountering corrupted hourly log files, preventing job aborts due to `EOFException`. The solution uses a custom `LineRecordReader` to handle exceptions, ensuring data processing continues uninterrupted.

OpenGL SuperBible in Java: Putting things in perspective! 2011-12-19T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-in-Java-Putting-things-in-perspective Until now, all our examples were done using orthographic mode. This means that we haven't seen any example that gives us a sense of depth. Orthographic mode is actually very useful if you want to make 2D games in OpenGL or draw a HUD for your game. But what most developers really want is to make 3D games with an awesome sense of depth. The 2 images below show the same scene rendered in orthographic mode and then in perspective mode. <img src="/img/2011/12/Orthographic.png" alt="Orthographic" title="Orthographic" /> <img src="/img/2011/12/Perspective.png" alt="Perspective" title="Perspective" /> For this example, I've included the same scene rendered in orthographic and perspective mode. Check the examples in the example6 package of the source. I won't detail the orthographic mode example, because it's not much different from previous examples. The only new class in this example is the <code>GLFrustrum</code> class. It's just a simple wrapper that manipulates the projection matrix with some utility methods. The relevant method for this example is the <code>setPerspective</code> method. The image below helps to understand the parameter values. The first parameter is the field of view, followed by the aspect ratio, then the distance to the near plane, and finally the distance to the far plane. So, using the <code>GLFrustrum</code> is the first difference in this example. The other one is that we have another <code>MatrixStack</code> declared, the projection matrix. It is built from the <code>GLFrustrum</code> matrix and should be rebuilt every time the values in the <code>GLFrustrum</code> change. This is what the initGL method looks like now.
<pre style="background-color:#282a36;">
public void initGL() {
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    shader = GLShaderFactory.getFlatShader();
    sideWall = GLBatchFactory.makeCube(0.2f, 0.8f, 1.0f);
    topWall = GLBatchFactory.makeCube(0.8f, 0.2f, 1.0f);
    frustrum = new GLFrustrum();
    modelViewMatrix = new MatrixStack();
}
</pre> And this is what the resizeGL method looks like: <pre style="background-color:#282a36;">
public void resizeGL() {
    glViewport(0, 0, Display.getWidth(), Display.getHeight());
    // Cast to float: integer division would truncate the aspect ratio (e.g. 800/600 == 1).
    frustrum.setPerspective(45f, (float) Display.getWidth() / Display.getHeight(), 1.0f, 10.0f);
    // Rebuild the projection matrix whenever the frustum changes.
    projectionMatrix = new MatrixStack(frustrum.getProjectionMatrix());
}
</pre> The change in the initGL method is straightforward: besides initializing scene-specific objects, it creates the <code>GLFrustrum</code>. The <code>resizeGL</code> method is more interesting. Besides calling <code>glViewport</code>, it configures a <code>45</code> degree field of view for the frustum, with the near plane at <code>1.0</code> and the far plane at <code>10.0</code>, which means that anything closer than the near plane or farther than the far plane won't be drawn. It also passes the window's aspect ratio, computed as a floating-point division so that it isn't truncated to an integer. The code for drawing the scene is also straightforward. This time, we need to combine the <code>ModelView</code> matrix with the <code>ProjectionMatrix</code> to get the projected scene. The other change you should notice is that when translating the objects in the scene, we use <code>-2</code> on the <code>Z</code> axis. That's related to the configuration of the near and far planes mentioned before: the camera looks down the negative <code>Z</code> axis, so objects must sit between <code>z = -1</code> and <code>z = -10</code> to be visible. So, that's it for drawing a projected scene! Again, all the code is available at http://code.google.com/p/opengl-superbible-java/
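The post doesn't show how <code>GLFrustrum</code> builds its matrix, so the following is a sketch assuming the standard <code>gluPerspective</code> formula and OpenGL's column-major layout; <code>perspective</code>, <code>project</code> and <code>isVisible</code> are hypothetical names. It can be used to check which eye-space points survive clipping with the parameters passed to <code>setPerspective</code>, and how the aspect ratio changes the visible width:

```javascript
// Hypothetical sketch: column-major perspective matrix using the same formula
// as gluPerspective. fovYDegrees is the vertical field of view in degrees.
function perspective(fovYDegrees, aspect, near, far) {
  const f = 1 / Math.tan((fovYDegrees * Math.PI) / 360); // cot(fovY / 2)
  const m = new Array(16).fill(0);
  m[0] = f / aspect;
  m[5] = f;
  m[10] = (far + near) / (near - far);
  m[11] = -1;
  m[14] = (2 * far * near) / (near - far);
  return m;
}

// Multiplies an eye-space point (x, y, z, 1) by the matrix and divides by w,
// giving normalized device coordinates.
function project(m, [x, y, z]) {
  const clip = [0, 1, 2, 3].map(
    (row) => m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row]
  );
  return [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]];
}

// A point is kept only if every coordinate lands inside the [-1, 1] cube.
function isVisible(m, point) {
  return project(m, point).every((c) => c >= -1 && c <= 1);
}
```

With <code>perspective(45, 800 / 600, 1, 10)</code>, a point at <code>[1, 0, -2]</code> is inside the view volume, but with the aspect ratio truncated to <code>1</code> it is clipped; points at <code>z = -0.5</code> or <code>z = -11</code> fall outside the near and far planes.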

Learn to create stunning 3D visuals in OpenGL using perspective projection. This tutorial shows you how to switch from orthographic to perspective mode, leveraging the `GLFrustrum` class and matrix manipulation for depth and realism in your game or application. Master the `setPerspective` method and understand the impact of field of view, aspect ratio, near and far planes. Enhance your OpenGL skills now!

OpenGL SuperBible in Java: The GLBatchFactory 2011-12-12T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-in-Java-GLBatchFactory <img src="/img/2011/12/GLBatchFactory.png" alt="Different 3D Shapes" title="Different 3D Shapes" /> The GLBatchFactory is a helper class that helps the developer create models. Besides providing a tool to create more complex models, where the developer may choose to add each vertex of each triangle separately, it provides 5 static methods that create some useful shapes. The first shape is the cube, created with the makeCube method. It takes 3 float parameters: the cube's width, height and depth. The cube is centered on the origin (0, 0, 0). The second shape is the sphere. It also takes 3 float parameters, but with different meanings. The first is the radius of the sphere. The sphere is made of triangles organized in slices and stacks, and the second and third parameters determine the number of slices and stacks. The more slices and stacks, the better the sphere will look, but the slower it will be to build and render. The sphere is also centered on the origin. The third shape is the cylinder (or cone). It takes 5 parameters: the radius of the top, the radius of the bottom, the length, the number of slices and the number of stacks. The fourth shape is the disk, a shape that resembles a CD. It takes 4 parameters: the inner radius (the size of the hole in the middle of the disk), the outer radius, the number of stacks and the number of slices. The fifth shape is the torus, a shape that looks like a donut. It takes 4 parameters: the inner radius, the outer radius, the number of stacks and the number of slices. In the OpenGL SuperBible C++ code, this code is inside the GLBatch class. I decided to put it in a separate class, just to make the code clearer.
Again, the example is available in Example5.java, at http://code.google.com/p/opengl-superbible-java/. If you want to check out the older tutorials, go to http://www.codemansion.com/p/opengl-superbible-in-java-using-lwjgl.html
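The trade-off between slices, stacks and cost can be seen by generating the vertices of a latitude/longitude sphere. This is a hypothetical illustration, not the <code>GLBatchFactory</code> implementation; <code>makeSphereVertices</code> and <code>sphereTriangleCount</code> are made-up names, and the sketch only computes positions (no normals or texture coordinates):

```javascript
// Hypothetical sketch: vertex positions of a sphere centered on the origin,
// built from `stacks` bands of latitude and `slices` wedges of longitude.
function makeSphereVertices(radius, slices, stacks) {
  const vertices = [];
  for (let stack = 0; stack <= stacks; stack++) {
    // phi runs from 0 at the top pole to PI at the bottom pole.
    const phi = (Math.PI * stack) / stacks;
    for (let slice = 0; slice <= slices; slice++) {
      // theta runs once around the Y axis; the last column repeats the first
      // so that each band closes its seam.
      const theta = (2 * Math.PI * slice) / slices;
      vertices.push([
        radius * Math.sin(phi) * Math.cos(theta),
        radius * Math.cos(phi),
        radius * Math.sin(phi) * Math.sin(theta),
      ]);
    }
  }
  return vertices;
}

// Each stack/slice cell is drawn as two triangles (the cells touching the
// poles include a degenerate one), so the cost grows with slices * stacks.
function sphereTriangleCount(slices, stacks) {
  return 2 * slices * stacks;
}
```

Going from 8 slices and 4 stacks to 32 slices and 16 stacks takes this sphere from 64 to 1,024 triangles, which is why finer spheres look smoother but cost more to build and draw.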

Learn to create 3D shapes like cubes, spheres, cylinders, disks, and tori using the GLBatchFactory helper class. This tutorial provides five static methods with parameters to easily generate these common shapes in your OpenGL projects, improving your model-building efficiency. Explore the example code and enhance your 3D graphics development skills.

OpenGL Superbible in Java, the MatrixStack 2011-12-09T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-in-Java-The-MatrixStack In the last example, we learned how to rotate and translate a triangle by manipulating matrices. There was a lot of code for such a small feature. Imagine if you had to rotate various objects in different places on the screen. That's where the <code>MatrixStack</code> class comes in. It encapsulates the old OpenGL 1.1 feature where you could push, pop and manipulate the matrix state, requiring much less code when you want to draw various objects. As the first example, let's modify our RotatingTriangle to use the <code>MatrixStack</code>, in Example3.java. The first step is creating an instance variable to hold our <code>MatrixStack</code>. Let's also move our FloatBuffer from inside the render method to an instance variable: <pre style="background-color:#282a36;">
private MatrixStack matrixStack = new MatrixStack();
private FloatBuffer buff = BufferUtils.createFloatBuffer(16);

public void render() {
    angle += 1f;
    glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
    shader.useShader();
    shader.setUniform4("vColor", 1.0f, 0.0f, 0.0f, 1.0f);
    matrixStack.push();                          // save the current matrix
    matrixStack.rotate(angle, 0.0f, 0.0f, 1.0f); // rotate around the Z axis
    matrixStack.fillBuffer(buff);                // copy the result into buff
    shader.setUniformMatrix4("mvpMatrix", false, buff);
    triangleBatch.draw(shader.getAttributeLocations());
    matrixStack.pop();                           // restore the saved matrix
    Display.update();
}
</pre> Now, let's see what happens in the render method. Instead of declaring and multiplying the matrices by hand, we just push the matrix, manipulate it as we want by calling the <code>matrixStack.translate()</code> and <code>matrixStack.rotate()</code> methods, then call the <code>fillBuffer()</code> method to put the resulting matrix in the buff variable, and <code>draw()</code>. The last important thing is to call the <code>pop()</code> method so that we end up with the same matrix we started with.
Again, the code is available at http://code.google.com/p/opengl-superbible-java/. Just look for Example3.java. Now, to show how simple it is to draw various objects on the screen, each one with its own position and rotation, let's check what happens in Example4.java. The first change is in the <code>initGL()</code> method. We are now using a smaller triangle. <pre style="background-color:#282a36;">
triangleBatch = new SimpleGLBatch(GL11.GL_TRIANGLES,
        new float[]{
             0.0f,  0.3f, 0.0f, 1.0f,
            -0.3f, -0.3f, 0.0f, 1.0f,
             0.3f, -0.3f, 0.0f, 1.0f},
        new short[]{0, 1, 2});
</pre> Now, let's see what happens in the <code>render()</code> method. <pre style="background-color:#282a36;">
public void render() {
    angle += 1f;
    glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
    shader.useShader();

    // Red triangle on the left, spinning around the Z axis.
    matrixStack.push();
    shader.setUniform4("vColor", 1.0f, 0.0f, 0.0f, 1.0f);
    matrixStack.translate(-0.5f, 0, 0);
    matrixStack.rotate(angle, 0.0f, 0.0f, 1.0f);
    matrixStack.fillBuffer(buff);
    shader.setUniformMatrix4("mvpMatrix", false, buff);
    triangleBatch.draw(shader.getAttributeLocations());
    matrixStack.pop();

    // Blue triangle on the right, spinning around the Y axis.
    matrixStack.push();
    shader.setUniform4("vColor", 0.0f, 0.0f, 1.0f, 1.0f);
    matrixStack.translate(0.5f, 0, 0);
    matrixStack.rotate(angle, 0.0f, 1.0f, 0.0f);
    matrixStack.fillBuffer(buff);
    shader.setUniformMatrix4("mvpMatrix", false, buff);
    triangleBatch.draw(shader.getAttributeLocations());
    matrixStack.pop();

    Display.update();
}
</pre> For each triangle we want to draw, we call <code>matrixStack.push()</code> before translating and rotating, fill the buffer, draw the triangle and call <code>matrixStack.pop()</code>. You could easily have a loop in your code to draw hundreds of objects this way!
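To make the push/pop pattern concrete without an OpenGL context, here is a minimal JavaScript stand-in for a matrix stack. It is a hypothetical sketch, not the project's Java <code>MatrixStack</code>: it only supports translation, stores column-major matrices, and <code>layoutRow</code> plays the role of the drawing loop suggested above:

```javascript
// Hypothetical, minimal matrix stack (translation only), using column-major
// 4x4 matrices stored as plain arrays of 16 numbers.
function identity() {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
}

// Returns a * b for column-major 4x4 matrices.
function multiply(a, b) {
  const out = new Array(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
      }
    }
  }
  return out;
}

class MatrixStack {
  constructor() {
    this.stack = [identity()];
  }

  // The matrix that would be uploaded to the shader.
  get top() {
    return this.stack[this.stack.length - 1];
  }

  // Saves a copy of the current matrix.
  push() {
    this.stack.push(this.top.slice());
  }

  // Discards the current matrix, restoring the one saved by push().
  pop() {
    if (this.stack.length === 1) {
      throw new Error("pop() without a matching push()");
    }
    this.stack.pop();
  }

  // Post-multiplies the current matrix by a translation.
  translate(x, y, z) {
    const t = identity();
    t[12] = x;
    t[13] = y;
    t[14] = z;
    this.stack[this.stack.length - 1] = multiply(this.top, t);
  }
}

// Stand-in for a draw loop: each object gets its own matrix, and pop()
// restores the shared matrix before the next object is placed.
function layoutRow(stack, count, spacing) {
  const matrices = [];
  for (let i = 0; i < count; i++) {
    stack.push();
    stack.translate(i * spacing, 0, 0);
    matrices.push(stack.top.slice()); // a renderer would fill the buffer and draw here
    stack.pop();
  }
  return matrices;
}
```

After the loop, the stack holds exactly the matrix it had before, so one shared transform (such as a camera translation) applies to every object without being recomputed.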

Simplify OpenGL rendering with MatrixStack. Learn to rotate and translate multiple objects using push/pop matrix operations with far less code. See examples and code for easy implementation.

OpenGL SuperBible in Java, rotating our triangle! 2011-12-04T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-in-Java-Rotating-our-triangle In the last post of our LWJGL series, we saw how to draw a triangle. Now, let's learn how to make our triangle rotate around the Z axis. The first thing we need is a different shader, so that we can pass the <code>modelView</code> matrix to the shader and use it to transform the vertex positions. While the identity shader just passes the vertex position through, the flat shader multiplies the vertex position by a matrix passed in a uniform called <code>mvpMatrix</code>. That's exactly what we need. Let's change the code in the initGL method to do that: <pre style="background-color:#282a36;">
shader = GLShaderFactory.getFlatShader();
</pre> Next, in the <code>render()</code> method, we need to pass the modelView matrix as a uniform to the shader. The first step is creating the variables to hold the matrices: <pre style="background-color:#282a36;">
float[] modelViewMatrix = new float[16];
float[] translationMatrix = new float[16];
float[] rotationMatrix = new float[16];
</pre> Now, let's fill the matrices. We don't want to move the triangle in the scene at this moment, so let's create a translation matrix with a zero offset on every axis (which is just the identity matrix): <pre style="background-color:#282a36;">
Math3D.translationMatrix44f(translationMatrix, 0.0f, 0.0f, 0.0f);
</pre> For the rotation matrix, let's rotate around the <code>Z</code> axis. <pre style="background-color:#282a36;">
Math3D.rotationMatrix44(rotationMatrix, angle, 0.0f, 0.0f, 1.0f);
</pre> The angle variable is an instance field and is updated every time the <code>render()</code> method runs. Now, we need to multiply our matrices and pass the result to our shader as a uniform.
<pre style="background-color:#282a36;">
Math3D.matrixMultiply44(modelViewMatrix, translationMatrix, rotationMatrix);
// Copy the matrix into a FloatBuffer and flip it so it is read from the start.
FloatBuffer buff = BufferUtils.createFloatBuffer(16);
buff.put(modelViewMatrix);
buff.flip();
shader.setUniformMatrix4("mvpMatrix", false, buff);
</pre> After multiplying, we need to put the matrix in a <code>FloatBuffer</code> to pass it to the shader as a uniform. Then, all we need is to call <code>setUniformMatrix4</code> on the shader instance and we are all set. That's it for rotating a triangle. You can also change this code to rotate it around other axes or move it around the scene. Again, all the code is available at http://code.google.com/p/opengl-superbible-java
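The matrix steps above can be reproduced in a few lines to see what the shader receives. This sketch assumes column-major storage (as OpenGL expects) and angles in degrees; <code>translationMatrix</code>, <code>rotationZMatrix</code>, <code>multiply</code> and <code>transformPoint</code> are hypothetical stand-ins, not the <code>Math3D</code> API:

```javascript
// Column-major 4x4 translation matrix: the offset lives in elements 12-14.
function translationMatrix(x, y, z) {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// Column-major rotation around the Z axis, with the angle in degrees.
function rotationZMatrix(degrees) {
  const radians = (degrees * Math.PI) / 180;
  const c = Math.cos(radians);
  const s = Math.sin(radians);
  return [c, s, 0, 0, -s, c, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
}

// Returns a * b for column-major 4x4 matrices.
function multiply(a, b) {
  const out = new Array(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
      }
    }
  }
  return out;
}

// Applies a matrix to the point (x, y, z, 1) and returns x, y and z.
function transformPoint(m, [x, y, z]) {
  return [0, 1, 2].map(
    (row) => m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row]
  );
}
```

Multiplying translation × rotation rotates a vertex first and then moves it: with a translation of <code>(1, 0, 0)</code> and a 90 degree rotation, the point <code>(1, 0, 0)</code> ends up at about <code>(1, 1, 0)</code>, while reversing the order gives about <code>(0, 2, 0)</code>.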

Learn how to rotate a triangle in Java using LWJGL and shaders. This tutorial shows you how to create and use rotation matrices, pass them to shaders as uniforms, and update vertex positions for smooth rotation around the Z-axis. Improve your 3D graphics skills with this step-by-step guide and example code.

OpenGL SuperBible in Java: Your first triangle 2011-11-11T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-in-Java-Your-First-Triangle In the last tutorials, there was a lot of code that we used to build a framework and encapsulate the complexity of the shaders. Today, that work starts paying off as we get something actually drawn on our screen! Besides the code in OpenGL SuperBible, the code in this article was inspired by the tutorials on <a href="http://lwjgl.org/wiki/index.php?title=Main_Page">LWJGL's wiki page</a> and a few other Google searches, to figure out the LWJGL-specific parts, like initializing the screen. Again, all the code is available on <a href="http://code.google.com/p/opengl-superbible-java/">http://code.google.com/p/opengl-superbible-java/</a> First, I won't cover the details of LWJGL's implementation. The code is fairly simple and their wiki and docs should clear up any doubts. Taking out the LWJGL initialization code, the example turns out to be very short. There are two instance variables in the Triangle class that are important to our example: triangleBatch and shader. The first is a <code>GLBatch</code>, which has the responsibility of drawing our triangle. The second is the shader used to draw this triangle. Those variables are initialized in the <code>initGL</code> method. Here's a transcription of the code: <pre style="background-color:#282a36;">
public void initGL() {
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    shader = GLShaderFactory.getIdentityShader();
    triangleBatch = new SimpleGLBatch(GL11.GL_TRIANGLES,
            new float[]{
                 0.0f,  0.5f, 0.0f, 1.0f,  //vertex 1
                -0.5f, -0.5f, 0.0f, 1.0f,  //vertex 2
                 0.5f, -0.5f, 0.0f, 1.0f}, //vertex 3
            new short[]{0, 1, 2});
}
</pre> The <code>glClearColor</code> call specifies which color to use when clearing the color buffers. You can get more details here. The shader initialization just uses the default Identity Shader from <code>GLShaderFactory</code>.
This shader does not transform the vertices at all. The next line creates the <code>GLBatch</code>, telling it to draw the vertices as <code>GL_TRIANGLES</code> and passing the vertex values and, finally, the indices of the vertices. Each vertex has 4 float values: <code>x</code>, <code>y</code>, <code>z</code> and <code>w</code> (the homogeneous coordinate, which is 1.0 for ordinary positions). The next important piece of code is: <pre style="background-color:#282a36;">
public void resizeGL() {
    glViewport(0, 0, DISPLAY_WIDTH, DISPLAY_HEIGHT);
}
</pre> This code sets the OpenGL viewport to cover the whole window when the window size changes. Now we have to actually draw the triangle on the screen, and that's what the <code>render</code> method does: <pre style="background-color:#282a36;">
public void render() {
    glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
    shader.useShader();
    shader.setUniform4("vColor", 1.0f, 0.0f, 0.0f, 1.0f);
    triangleBatch.draw(shader.getAttributeLocations());
    Display.update();
}
</pre> The first step is clearing the screen with a call to <a href="http://www.opengl.org/sdk/docs/man/xhtml/glClear.xml"><code>glClear</code></a>. Then we tell OpenGL to use our shader with <code>shader.useShader()</code>. Next, we tell the shader which color to paint the triangle: the identity shader uses a uniform called <code>vColor</code> for that. Finally, we draw <code>triangleBatch</code> and ask LWJGL to swap the screen buffers with <code>Display.update()</code>. That's it, we have a triangle on our screen. <img src="/img/2011/11/triangle.png" alt="A red triangle" title="The resulting triangle" /> You may notice that if you change the window size, the triangle changes its shape. That's because the identity shader passes the vertices through unchanged, so their coordinates are used directly as normalized device coordinates between <code>-1.0</code> and <code>1.0</code>, and that range is stretched across the whole viewport, whatever its aspect ratio. In the next tutorials, we will see how to draw our triangle without losing its proportions.
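To see why the triangle stretches, it helps to follow a vertex through the two fixed steps that run after the identity shader: the perspective divide by <code>w</code> and the viewport transform configured by <code>glViewport</code>. The sketch below is plain Java with no LWJGL calls; <code>ViewportMath</code> and <code>toWindow</code> are hypothetical names used only for illustration, and depth is ignored.

```java
/**
 * Hypothetical helper (not part of the project): follows a clip-space vertex
 * through the perspective divide and the viewport transform that
 * glViewport(vx, vy, width, height) sets up. Depth is ignored.
 */
final class ViewportMath {
    private ViewportMath() {}

    /** Returns {windowX, windowY} for a clip-space vertex with the given x, y and w. */
    static float[] toWindow(float x, float y, float w,
                            int vx, int vy, int width, int height) {
        // Perspective divide: clip space -> normalized device coordinates in [-1, 1].
        float ndcX = x / w;
        float ndcY = y / w;
        // Viewport transform: the [-1, 1] range is stretched over the whole viewport.
        float windowX = vx + (ndcX + 1.0f) * width / 2.0f;
        float windowY = vy + (ndcY + 1.0f) * height / 2.0f;
        return new float[] {windowX, windowY};
    }
}
```

For the triangle above, a 600x600 window puts the bottom corners at x = 150 and x = 450 and the apex at y = 450 (the base sits at y = 150), so the triangle is 300 pixels wide and 300 pixels tall. At 1200x600 the corners move to x = 300 and x = 900 while the heights stay the same, so the same vertices now cover a triangle twice as wide.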

Learn to draw a triangle in Java using OpenGL and LWJGL. This tutorial provides a concise code example demonstrating how to set up a basic OpenGL environment, initialize shaders, and render a red triangle to the screen. The code covers clearing the color buffer, using an identity shader, and updating the display. Improve your understanding of OpenGL fundamentals and begin creating your own 2D graphics.

OpenGL SuperBible in Java: The GLBatch class 2011-11-01T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-The-GLBatch In the last tutorials, we looked at the <code>GLShaderManager</code> class. But creating a shader is just the first step towards rendering your scene. The second step is passing geometry data to your shaders, and that's where the GLBatch class helps. As I did with the <code>GLShaderManager</code> class, I split GLBatch into more than one class. The first one, <code>GLBatch.java</code>, is just an interface with a single method, <code>draw</code>, which receives a single parameter: a Map that contains the attribute locations. <code>GLBatch</code> does not know about the <code>GLShader</code> class, which keeps the design decoupled. The implementation class, where the real work happens, is <code>SimpleGLBatch</code>. This class receives the geometry data in its constructors and uses Buffer Objects to hold the data. There are 2 constructors available. The first one takes the minimal data needed to create a <code>SimpleGLBatch</code>: the vertex array, the element index array and the mode used to assemble the triangles (<code>GL_TRIANGLES</code>, <code>GL_TRIANGLE_FAN</code>, <code>GL_TRIANGLE_STRIP</code>, etc). The second constructor takes all the data supported by the class. Besides the mode, vertex array and index array, it can receive the color array, normal array and texture coordinate array. The last 3 may be null. The first constructor is actually a shortcut for the second one. Half of the magic in <code>SimpleGLBatch</code> happens in the constructor; the other half happens inside the <code>draw</code> method. For each array that is not null in the constructor, we have to create an OpenGL buffer using <code>glGenBuffers</code>.
Here's a sample of the code: <pre style="background-color:#282a36;">
if (vColorData != null && vColorData.length > 0) {
    FloatBuffer colorData = BufferUtils.createFloatBuffer(vColorData.length);
    colorData.put(vColorData);
    colorData.flip();
    colorBuffer = GL15.glGenBuffers();
    GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, colorBuffer);
    GL15.glBufferData(GL15.GL_ARRAY_BUFFER, colorData, GL15.GL_STATIC_DRAW);
}
</pre> This code creates a <code>FloatBuffer</code> from the array, generates a GL buffer, binds it and, as the last step, fills it with the data. Repeat this code for <code>vertexData</code>, <code>normalData</code> and <code>textureData</code>. The only difference is the index array, which uses similar code but binds to <code>GL_ELEMENT_ARRAY_BUFFER</code> instead of <code>GL_ARRAY_BUFFER</code>. In the <code>draw</code> method, we draw using the buffers created in the constructor. Here's the code: <pre style="background-color:#282a36;">
if (attributeLocations.containsKey("inColor") && colorBuffer >= 0) {
    GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, colorBuffer);
    int colorLocation = attributeLocations.get("inColor");
    GL20.glVertexAttribPointer(colorLocation, 4, GL11.GL_FLOAT, false, 4 * 4, 0);
    GL20.glEnableVertexAttribArray(colorLocation);
}
</pre> Again, the only difference is for the index array: <pre style="background-color:#282a36;">
GL15.glBindBuffer(GL15.GL_ELEMENT_ARRAY_BUFFER, indexBuffer);
</pre> The last step is drawing the elements with a call to <code>glDrawElements</code>: <pre style="background-color:#282a36;">
GL11.glDrawElements(mode, numElements, GL11.GL_UNSIGNED_SHORT, 0);
</pre> Note that the shaders must use standard names for the attributes: <code>inVertex</code>, <code>inColor</code>, <code>inNormal</code> and <code>inTexCoord</code> for the vertex position, color, normal and texture coordinate.
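The constructor side of this design can be sketched without any GL calls. The class below is hypothetical (<code>BatchData</code> is not part of the project): the short constructor forwards to the full one with <code>null</code> for the optional arrays, and every non-empty array is copied into a direct, native-order buffer, which is the kind of buffer LWJGL's <code>BufferUtils.createFloatBuffer</code> returns. In the real class, each of these buffers would then go through <code>glGenBuffers</code>, <code>glBindBuffer</code> and <code>glBufferData</code> as shown above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

/**
 * Hypothetical, GL-free sketch of a SimpleGLBatch-style constructor layout.
 * Optional arrays (colors, normals, texture coordinates) stay null when absent.
 */
final class BatchData {
    final int mode;              // e.g. the value of GL11.GL_TRIANGLES
    final FloatBuffer vertices;
    final ShortBuffer indices;
    final FloatBuffer colors;    // null when no color data was supplied
    final FloatBuffer normals;   // null when no normal data was supplied
    final FloatBuffer texCoords; // null when no texture data was supplied

    // Shortcut constructor: only the minimal data, delegating to the full one.
    BatchData(int mode, float[] vertexData, short[] indexData) {
        this(mode, vertexData, indexData, null, null, null);
    }

    BatchData(int mode, float[] vertexData, short[] indexData,
              float[] colorData, float[] normalData, float[] textureData) {
        this.mode = mode;
        this.vertices = toBuffer(vertexData);
        this.indices = toBuffer(indexData);
        this.colors = toBuffer(colorData);
        this.normals = toBuffer(normalData);
        this.texCoords = toBuffer(textureData);
    }

    // Direct memory in native byte order, ready to be handed to glBufferData.
    static FloatBuffer toBuffer(float[] data) {
        if (data == null || data.length == 0) {
            return null;
        }
        FloatBuffer buffer = ByteBuffer.allocateDirect(data.length * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buffer.put(data);
        buffer.flip(); // position back to 0, limit = number of values written
        return buffer;
    }

    // Same idea for the index array, which uses shorts (GL_UNSIGNED_SHORT).
    static ShortBuffer toBuffer(short[] data) {
        if (data == null || data.length == 0) {
            return null;
        }
        ShortBuffer buffer = ByteBuffer.allocateDirect(data.length * Short.BYTES)
                .order(ByteOrder.nativeOrder())
                .asShortBuffer();
        buffer.put(data);
        buffer.flip();
        return buffer;
    }
}
```

The <code>flip()</code> call matters: after <code>put</code>, the buffer's position sits at the end of the data, and <code>flip()</code> resets it to 0 and sets the limit to the number of values written, so the upload reads exactly those values. With 4 floats per vertex at 4 bytes each, one vertex takes 16 bytes, which is the <code>4 * 4</code> stride passed to <code>glVertexAttribPointer</code>.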
Again, all the code is available at <a href="http://code.google.com/p/opengl-superbible-java/">http://code.google.com/p/opengl-superbible-java/</a>

Learn how to use the GLBatch class with LWJGL to efficiently pass geometry data to your shaders. This tutorial covers creating vertex, color, normal, and texture buffers, explains the two-constructor approach for flexible data handling, and provides Java code examples demonstrating buffer creation and drawing using `glDrawElements`. Master efficient OpenGL rendering techniques now!

OpenGL SuperBible in Java: The GLShaderManager class - Part2 2011-10-25T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-in-Java-The-GLShaderManager-Part-2 In the <a href="/2011/10/24/OpenGL-Superbible-in-Java-The-GLShaderManager/">last post</a>, we saw how to adapt the <code>GLShaderManager</code> from the C code to Java. But the C code from the book actually has 2 responsibilities: the first is being a shader, and the second is being a factory that creates several default shaders. In the Java implementation, I decided to separate those responsibilities and created a <code>GLShaderFactory</code> class with methods that create the default shaders. There's no need to go into detail; the shaders are the same as in the book. The code for <code>GLShaderFactory.java</code> is available <a href="http://code.google.com/p/opengl-superbible-java/source/browse/OpenGLSuperBible/src/openglsuperbible/glutils/GLShaderFactory.java">here</a>.
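As a rough idea of the split, the sketch below keeps only the factory half: it knows which sources make up a stock shader and nothing about compiling them. The GLSL is an assumption modeled on the book's identity shader, with the position attribute named <code>inVertex</code> to match the attribute names <code>SimpleGLBatch</code> expects and the color in the <code>vColor</code> uniform; <code>ShaderSource</code> and <code>ShaderSources</code> are hypothetical names. In the real <code>GLShaderFactory</code>, a method such as <code>getIdentityShader()</code> would pass the two strings to the <code>GLShader</code> constructor, which takes the vertex and fragment sources.

```java
/** Hypothetical pair of shader sources; compiling them is GLShader's job, not the factory's. */
final class ShaderSource {
    final String vertex;
    final String fragment;

    ShaderSource(String vertex, String fragment) {
        this.vertex = vertex;
        this.fragment = fragment;
    }
}

/** Sketch of the factory side: only knows which sources make up each stock shader. */
final class ShaderSources {
    private ShaderSources() {}

    // Identity shader: passes each vertex through unchanged and paints it with vColor.
    static ShaderSource identity() {
        String vertex =
                "attribute vec4 inVertex;\n"
              + "void main() {\n"
              + "    gl_Position = inVertex;\n"
              + "}\n";
        String fragment =
                "uniform vec4 vColor;\n"
              + "void main() {\n"
              + "    gl_FragColor = vColor;\n"
              + "}\n";
        return new ShaderSource(vertex, fragment);
    }
}
```

Keeping the sources separate from the compiling class means new stock shaders only add factory methods, while <code>GLShader</code> stays unchanged.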

Learn how to separate GLShader and GLShaderFactory responsibilities in Java for efficient OpenGL programming. This tutorial builds upon a previous post adapting C code to Java, offering a cleaner, more organized approach to creating default shaders. Get the source code now!

OpenGL SuperBible in Java: The GLShaderManager class 2011-10-24T00:00:00Z https://bandarra.me/posts/OpenGL-Superbible-in-Java-The-GLShaderManager When I started using the OpenGL SuperBible book to learn OpenGL, I noticed that the first examples in the book used some classes made by the author to encapsulate complexity that would only be explained later. As I was translating the examples to Java, I had to figure out what those classes did before reading and testing the examples from the book. One of those classes is the GLShaderManager, which I actually turned into 2 classes. The first is <code>GLShader.java</code>. Its job is to receive 2 shader programs, a vertex program and a fragment program, compile them, link them, and extract the uniform and attribute ids into a Map on the client side for easier use later. The code in the example is available at: <a href="http://code.google.com/p/opengl-superbible-java/">http://code.google.com/p/opengl-superbible-java/</a> I rewrote the examples using <a href="https://www.lwjgl.org/">LWJGL</a>. The main tricks of the class are in the constructor. It receives 2 strings as parameters, <code>vertexShaderSource</code> and <code>fragmentShaderSource</code>. So, the first step is compiling the shader sources. The code for compiling the vertex shader is as follows: <pre style="background-color:#282a36;">
int vertexShader = GL20.glCreateShader(GL20.GL_VERTEX_SHADER);
GL20.glShaderSource(vertexShader, vertexShaderSource);
GL20.glCompileShader(vertexShader);
String vertexShaderErrorLog = GL20.glGetShaderInfoLog(vertexShader, 65536);
if (vertexShaderErrorLog.length() != 0) {
    System.err.println("Vertex shader compile log: \n" + vertexShaderErrorLog);
}
</pre> First, a shader object is created with <code>glCreateShader</code>, which returns an integer id for it. The parameter indicates which type of shader to create. The second line attaches the shader source to the shader id.
On the third line, the shader is compiled. The rest of the code reads the compile log and prints it if it is not empty. (This may be a good place to throw an exception. Note that a non-empty log can also contain mere warnings; the reliable way to detect a failed compilation is to query <code>GL_COMPILE_STATUS</code>.) Next, we compile the fragment shader. <pre style="background-color:#282a36;">
int fragmentShader = GL20.glCreateShader(GL20.GL_FRAGMENT_SHADER);
GL20.glShaderSource(fragmentShader, fragmentShaderSource);
GL20.glCompileShader(fragmentShader);
String fragmentShaderErrorLog = GL20.glGetShaderInfoLog(fragmentShader, 65536);
if (fragmentShaderErrorLog.length() != 0) {
    System.err.println("Fragment shader compile log: \n" + fragmentShaderErrorLog);
}
</pre> The code is almost the same as for the vertex shader, but passes <code>GL_FRAGMENT_SHADER</code> to <code>glCreateShader</code>. Now that we've compiled both the vertex shader and the fragment shader, we have to link them together into a program. <pre style="background-color:#282a36;">
program = GL20.glCreateProgram();
GL20.glAttachShader(program, vertexShader);
GL20.glAttachShader(program, fragmentShader);
GL20.glLinkProgram(program);
String log = GL20.glGetProgramInfoLog(program, 65536);
if (log.length() != 0) {
    System.err.println("Program link log:\n" + log);
}
</pre> The first line creates the program and returns its id. Then both the vertex and fragment shaders are attached to the program. The last step is to link them. Again, after linking the program we check whether something went wrong. By now, your shaders are compiled, linked and ready to use. But we can do one more thing. Each active uniform and attribute in your vertex and fragment shaders gets a location that the client code uses to refer to it, but you have to find out which location belongs to which name.
Here's the code to identify the attribute ids: <pre style="background-color:#282a36;">
int numAttributes = GL20.glGetProgram(program, GL20.GL_ACTIVE_ATTRIBUTES);
int maxAttributeLength = GL20.glGetProgram(program, GL20.GL_ACTIVE_ATTRIBUTE_MAX_LENGTH);
for (int i = 0; i < numAttributes; i++) {
    String name = GL20.glGetActiveAttrib(program, i, maxAttributeLength);
    int location = GL20.glGetAttribLocation(program, name);
    System.out.println(name + ":" + location);
    attributeLocations.put(name, location);
}
</pre> First, we find out how many attributes we have, then the length of the longest attribute name. Then we loop over the attributes, get each one's name and location, and put them into a map. The code for the uniform ids is almost the same: <pre style="background-color:#282a36;">
int numUniforms = GL20.glGetProgram(program, GL20.GL_ACTIVE_UNIFORMS);
int maxUniformLength = GL20.glGetProgram(program, GL20.GL_ACTIVE_UNIFORM_MAX_LENGTH);
for (int i = 0; i < numUniforms; i++) {
    String name = GL20.glGetActiveUniform(program, i, maxUniformLength);
    int location = GL20.glGetUniformLocation(program, name);
    uniformLocations.put(name, location);
    System.out.println(name + ":" + location);
}
</pre> The code to activate the shader is pretty simple: <pre style="background-color:#282a36;">
public void useShader() {
    // Enable the shader program
    GL20.glUseProgram(program);
}
</pre> And this is how we encapsulate setting a uniform value. Here we can see the map built in the constructor in action. We could instead skip the map and look up the location on the fly in this method. <pre style="background-color:#282a36;">
public void setUniformMatrix4(String uniformName, boolean transpose, FloatBuffer matrixData) {
    int location = uniformLocations.get(uniformName);
    GL20.glUniformMatrix4(location, transpose, matrixData);
}
</pre> That's it. That's the first part of my port of the GLShaderManager class to Java using LWJGL. I've also ported this code to Android (2.2+) with minimal changes.
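The two options mentioned above, filling the map up front or looking the location up on the fly, can also be combined by caching lazily. The class below is a hypothetical sketch, not part of the project: the lookup function is injected so the logic runs without a GL context, and in real code it would be <code>name -> GL20.glGetUniformLocation(program, name)</code>.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical lazy uniform-location cache: each name is looked up the first
 * time it is used and the result is reused afterwards.
 */
final class UniformLocations {
    private final Map<String, Integer> cache = new HashMap<>();
    private final ToIntFunction<String> lookup;

    // In real code: name -> GL20.glGetUniformLocation(program, name)
    UniformLocations(ToIntFunction<String> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached location, asking the lookup function only on the first call per name.
    // Unknown or inactive names come back as whatever the lookup returns (-1 from GL).
    int get(String name) {
        return cache.computeIfAbsent(name, lookup::applyAsInt);
    }
}
```

A side benefit is the missing-name case: <code>uniformLocations.get(uniformName)</code> returns <code>null</code> for an unknown name, and unboxing it into an <code>int</code> throws a <code>NullPointerException</code>, while <code>glGetUniformLocation</code> returns -1, and <code>glUniform*</code> calls silently ignore location -1.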

Learn to create and manage shaders in Java using LWJGL. This tutorial shows you how to compile vertex and fragment shaders, link them into a program, and retrieve attribute and uniform locations for efficient use in your OpenGL applications. The code examples cover shader compilation, linking, and utilizing a map for easy access to shader locations. Improve your OpenGL development skills today!

VBOs on Android 2.2 2011-10-13T00:00:00Z https://bandarra.me/posts/VBOs-on-Android-2-2 Android 2.2 was the first Android version to support OpenGL ES 2.0. The problem is that it shipped with a big flaw: its Java OpenGL ES 2.0 bindings do not support VBOs. Besides improving performance, VBOs have the advantage that the memory used by those objects does not count against the app's heap size, so developers can use more, and more complex, models in their apps. The issue has been fixed in Android 2.3+, but for developers who still want this feature on 2.2, I found this post: <a href="http://code.google.com/p/android/issues/detail?id=8931">http://code.google.com/p/android/issues/detail?id=8931</a> Basically, you can use JNI to access the function that is not available on Android!

Bypass Android 2.2 OpenGL ES 2.0 VBO limitations using JNI for enhanced performance and increased model complexity in your apps. Learn how to overcome this flaw and unlock access to features unavailable in older Android versions.

Books on Java Game Programming 2011-10-09T14:37:00Z https://bandarra.me/posts/Books-on-Java-Game-Programming <h1>My Journey into Game Programming</h1> I started to learn about game programming as most people probably do: by using Google and finding information about it. It was enough for me to put a few games together and be happy with them. But when I decided that I wanted to explore the subject further, I started to search for books. I preferred books written in Java for two main reasons. The first is that I'm well experienced in Java. The second, and most important, is that I wanted to port the knowledge gained from those books to the Android Platform, which also uses Java. I primarily searched on Amazon and read the reviews there. I bought two books on the subject and was very satisfied with them. <img src="https://lh5.googleusercontent.com/proxy/IX4EgILWbQqGXUWe5rZ7QW6ghdRiG_wXFXRilGizyt5wnjfIEoTo9YmUey5XiIUhCT8rSCjONgopFZRPj1U-G-p7VjK5Kas3NsHFMotUPFKNlwG6NQCeqXxKk_D0aaWfqmbGvS4OjN3ANUEYVH4CyXsW_pmXkZeeXLNtOYiBSmpwBtTONHiUZrhQI66zNSJI7uyitixQjg=s0-d" alt="Killer Game Programming in Java" title="Killer Game Programming in Java book cover" /> The first book I got was <a href="http://www.amazon.com/Killer-Game-Programming-Andrew-Davison/dp/0596007302/ref=sr_1_1?ie=UTF8&qid=1318079333&sr=8-1">Killer Game Programming in Java</a>, by Andrew Davison, published by O'Reilly. Although I had the impression that the book is not the best example of OO programming and code reuse, it contains a lot of great examples of how to implement things in Java. The chapter about coding the game loop alone is worth the price of the book. In fact, it was the basis for my own game loop code. There are many cool examples using Java 2D. The downside of the book is that the chapters on 3D rely heavily on Java 3D, which is OK if you want to use that API in your code. But I wanted things at a "lower level" so that I could port them to Android. The chapters on AI and Pathfinding are great, too!
<img src="https://lh6.googleusercontent.com/proxy/VGC7XpKk3SYwe0ErFf92hio6hzB-XZZcSTWyjuIn-4tpxeWfOrnf4ba9pO4UniONbsfqYgXlWGDGZWSEPicV95ioQi2Sj6YD5tLTdGvxx9kR5Oe0uW8_=s0-d" alt="Developing Games in Java" title="Developing Games in Java book cover" /> The second book was <a href="http://www.amazon.com/Developing-Games-Java-David-Brackeen/dp/1592730051/ref=sr_1_1?ie=UTF8&qid=1318079346&sr=8-1">Developing Games in Java</a>, by David Brackeen. This book is quite different from the first one, so I wouldn't recommend choosing one over the other: they actually complement each other. This one is more object-oriented, so it provides better insight into building a reusable game engine, which is, in fact, the book's strongest point. The 3D chapters are awesome. The author teaches the reader how to code a 3D software renderer from scratch in Java. Although the software renderer is not very useful in the real world, it gives the reader a lot of knowledge about how hardware renderers actually work, making it much easier to learn and work with them.

Learn Java game programming with these two recommended books. "Killer Game Programming in Java" excels at practical examples and game loop coding, while "Developing Games in Java" focuses on object-oriented design and building a 3D software renderer from scratch, providing valuable insights into game engine architecture and hardware rendering. Both books are great resources for Android game development.

OpenGL 2011-09-25T10:02:00Z https://bandarra.me/posts/OpenGL <img src="https://www.opengl.org/archives/resources/features/KilgardTechniques/LensFlare/glflare1.gif" alt="OpenGL" title="The OpenGL logo" /> Since I started coding games for Android, I wanted to learn OpenGL. Besides enabling the development of 3D games, OpenGL is supposed to give 2D games a performance boost. Android has bindings for OpenGL ES 1.0 and OpenGL ES 2.0 (if you are on Android 2.2+). Since I didn't know much about OpenGL coding, I searched for good books on the subject. I found <a href="http://www.amazon.com/OpenGL-SuperBible-Comprehensive-Tutorial-Reference/dp/0321712617/ref=sr_1_1?ie=UTF8&qid=1316983191&sr=8-1">OpenGL SuperBible</a>, which covers the OpenGL 3.3 Core Profile and a bit of OpenGL ES 2.0, and the <a href="http://www.amazon.com/OpenGL-ES-2-0-Programming-Guide/dp/0321502795/ref=sr_1_1?ie=UTF8&qid=1316983205&sr=8-1">OpenGL ES 2.0 Programming Guide</a>, which covers OpenGL ES 2.0 in depth. The downside of both books is that neither covers OpenGL coding on Android, and both have code examples in C. So, I decided to break my learning curve into a few parts: <ol> <li>Review my C/C++ knowledge so I can better understand the books.</li> <li>Understand OpenGL programming in general (Core and ES) and apply this knowledge to Java programming.</li> <li>Use the OpenGL + Java programming skills on Android.</li> </ol> The starting place for my research was <a href="http://www.opengl.org/resources/bindings/">this page</a> on <a href="http://opengl.org/">opengl.org</a>. When checking out those links, I found out that many of them had not been updated for OpenGL 3.x, or even OpenGL 2.x. Java 3D is more than an OpenGL binding; it is more like a <a href="http://en.wikipedia.org/wiki/Scene_graph">SceneGraph API</a> filled with lots of bells and whistles. The real contenders were <a href="http://jogamp.org/jogl/www/">JOGL</a> and <a href="http://lwjgl.org/">LWJGL</a>.
So, I researched both APIs in more depth and found that LWJGL is leaner than JOGL and gives me access to all the methods I need to implement the code from the OpenGL SuperBible book. It should also make porting to the Android OpenGL API easier, since that API is nothing more than bindings to OpenGL ES 2. Of course, nothing comes for free. I have had to translate (and am still translating) a lot of math code from the C++ book to Java. My intention is to write some tutorials and help people who have the same goals! Stay tuned for updates soon!

Learn OpenGL for Android game development using LWJGL. This guide covers transitioning from C/C++ OpenGL knowledge to Java, focusing on LWJGL as a leaner binding that makes the OpenGL SuperBible examples easier to port to OpenGL ES 2.0 on Android. Start building high-performance 2D and 3D games today!