🖥️ Running Local LLMs: Experiments and Insights

✨ Summary

Large Language Models (LLMs) have powered the AI wave of the last 3–4 years. While most are closed-source, a vibrant ecosystem of open-weight and open-source models has emerged.

As a long-time AI user, I wanted to peek under the hood: how do GenAI models work, and what happens when you actually run them locally on your laptop?

In this blog, I’ll cover:

  • How GenAI models are built ⚙️
  • Why local inference matters 🚀
  • My experiments with Qwen, Llama, and GPT-OSS on my Mac 💻

🔄 Hybrid Model Inference

Computing has gone through cycles: centralized → decentralized → hybrid. I believe AI inference is following the same path:

  • Early computing → Mainframes (centralized)
  • PCs/laptops → Decentralized
  • Today → Cloud + Edge (hybrid)

👉 Most model inference currently happens in the cloud (huge infra needed).
👉 But smaller, specialized models now run on edge devices (laptops, even mobiles).

⚠️ Training won’t realistically move to the edge — it’s too compute-heavy and usually a one-time process.
Inference is moving local — it’s repeated, latency-sensitive, and can benefit from privacy/cost savings.


💡 Use Cases of Running Models Locally

  • Reduce latency: Voice assistants, live translation, autonomous vehicles
  • 💰 Reduce cost: Developer workflows, consumer electronics
  • 🌍 Offline use: Remote fieldwork, disaster response
  • 🔒 Privacy: Healthcare, enterprise security
  • 🛠️ Customization: LoRA adapters, RAG integration

🏗️ How GenAI Models Are Created

LLMs typically follow the Transformer architecture and are built in two stages:

  1. Pre-training: Learn general language patterns from massive datasets
  2. Post-training (fine-tuning): Teach task-specific skills (chat, reasoning, coding, etc.)

Result → A model ready for inference.


🧩 What an AI Model Contains

  • Weights: Learned numerical parameters (quantized models = smaller + faster)
  • Tokenizer & Vocabulary: Convert text ↔ tokens
  • Config: Architecture, layer counts, hidden sizes, etc.

🗂️ Common formats: Hugging Face / Transformers, GGUF, ONNX, Apple MLX.


🔁 How Generation Works (Simplified)

  1. Tokenization → Text → tokens
  2. Forward pass → Model processes tokens → probability distribution
  3. Decoding → Pick next token (greedy, sampling, top-k/top-p, etc.)
  4. Loop → Append token → repeat until done
  5. Detokenize → Tokens → final response
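The loop above can be sketched in a few lines of Python. This is a toy: the "model" is a hard-coded lookup table of logits over a four-token vocabulary, standing in for a real forward pass.

```python
import math

# Toy "model": maps the last token to fixed next-token logits (made-up data).
VOCAB = ["<eos>", "hello", "world", "!"]
LOGITS = {
    "<bos>":  [0.1, 2.0, 0.5, 0.2],   # after <bos>, "hello" is most likely
    "hello":  [0.2, 0.1, 3.0, 0.5],   # then "world"
    "world":  [0.3, 0.1, 0.2, 2.5],   # then "!"
    "!":      [4.0, 0.1, 0.1, 0.1],   # then stop
}

def softmax(logits):
    """Turn raw logits into a probability distribution (step 2)."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(max_tokens=10):
    """Greedy decoding: repeatedly pick the most probable next token."""
    tokens, last = [], "<bos>"
    for _ in range(max_tokens):
        probs = softmax(LOGITS[last])              # 2. forward pass (mocked)
        next_tok = VOCAB[probs.index(max(probs))]  # 3. greedy decoding
        if next_tok == "<eos>":                    # stop condition
            break
        tokens.append(next_tok)                    # 4. append and loop
        last = next_tok
    return " ".join(tokens)                        # 5. detokenize

print(generate())  # prints: hello world !
```

Sampling, top-k, and top-p simply replace the `probs.index(max(probs))` line with a different rule for picking the next token.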

📊 Comparing Models

Common Evaluation Axes

  • Technical specs: Parameters, memory, speed, context length
  • Quantitative benchmarks: MMLU (knowledge), ARC (science), HumanEval (coding)
  • Qualitative: Creativity, domain knowledge, licensing

🔍 Open-Weights Model Comparison

I installed these three models on my Mac; more details on that further down.

| Feature | Qwen2.5:7B-Instruct | Llama3:latest | GPT-OSS:20B |
|---|---|---|---|
| Model size | 7B | 8B | 20B |
| File size | 4.7 GB | 4.7 GB | 13 GB |
| Key advantage | Multilingual (29+ languages), strong structured output | Reasoning + code gen optimized | Large, strong reasoning |
| Hardware need | 8 GB+ GPU | 8 GB+ GPU | 16 GB+ GPU |
| Typical use | Multilingual chat, summarization | General-purpose, coding, creative writing | Advanced reasoning, tool use |
| License | Apache 2.0 | Meta custom (check site) | Apache 2.0 |

🔓 Open Weights vs Open Source models

Often confused! Here’s the difference 👇

| Action | Open Source | Open Weights |
|---|---|---|
| Run inference | ✅ | ✅ |
| Fine-tune (adapters) | ✅ | ✅ |
| Full retraining | ✅ | ❌ (no training code/data) |
| Audit code/data | ✅ | ❌ |
| Commercial use | Usually allowed | Often restricted |
| Redistribution | Usually allowed | Restricted |
| Modify & republish | ✅ | Often restricted |

👉 Takeaway: Open weights let you use and adapt, but open source lets you rebuild.


💻 Using Open Weight Models Locally

On my MacBook Pro (32 GB RAM) I installed models using Ollama:

  • Qwen2.5:7B-Instruct
  • Llama3:latest
  • GPT-OSS:20B
Listing the installed models:

ollama list
NAME                   ID              SIZE      MODIFIED    
qwen2.5:7b-instruct    845dbda0ea48    4.7 GB    3 weeks ago    
llama3:latest          365c0bd3c000    4.7 GB    3 weeks ago    
gpt-oss:20b            aa4295ac10c3    13 GB     3 weeks ago   

Install Ollama:

brew install ollama

Download a model:

ollama pull gpt-oss:20b

Run it:

ollama run llama3

…and you can start chatting!
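Beyond the chat REPL, Ollama also serves a local REST API (default port 11434). A minimal sketch using only the Python standard library, assuming the Ollama server is running locally; the model name and prompt are just examples.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model, prompt):
    """Build the JSON payload Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    """Send a prompt to a locally running Ollama model and return its reply."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# ask("llama3", "Explain quantization in one sentence.")
```

The same endpoint works for any pulled model, which is what makes multi-model experiments (like the arena below) easy to script.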


🧪 My Experiments

⚖️ Use Case 1: Local LM Arena

Inspired by lmarena, I built a local version:

  • User query → Sent to multiple models
  • A “judge” model scores responses
  • Models get ranked

Following is a screenshot of the application:

The two models compared here are Qwen and Llama, with GPT-OSS grading the responses.

💡 Example: Qwen scored 9/10, Llama scored 7/10, as judged by GPT-OSS.
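The judging step reduces to two small pieces: a prompt asking the judge model for scores, and a parser for its reply. The prompt wording and score format here are hypothetical, simplified from what the app actually does.

```python
import re

def judge_prompt(question, answer_a, answer_b):
    """Ask a 'judge' model to score two answers (illustrative wording)."""
    return (
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Score each answer out of 10 in the form 'A: <n>/10, B: <n>/10'."
    )

def parse_scores(judge_reply):
    """Extract the two numeric scores from the judge model's reply."""
    scores = re.findall(r"(\d+)\s*/\s*10", judge_reply)
    return [int(s) for s in scores[:2]]

# A judge reply like "A: 9/10, B: 7/10" parses to [9, 7].
```

In the app, `judge_prompt` is sent to GPT-OSS and the parsed scores drive the ranking.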


🎛️ Use Case 2: Tuning Model Parameters

I tested how model parameters affect their responses:

| Parameter | Role | Best Use |
|---|---|---|
| Temperature | Controls randomness | 0.1–0.3 → factual, 0.7+ → creative |
| Top-P | Restrict to top probability mass | Lower → focused, higher → diverse |
| Top-K | Consider top K tokens | Low (10–40) → predictable, high (100+) → diverse |
| Repeat Penalty | Discourage repetition | 1.05–1.1 → natural |
| Stop Sequences | Cut off response | Prevent drift/hallucination |
| Seed | Fix randomness | Debugging / reproducibility |

👉 Lowering temperature/top-p/top-k + good prompts = fewer hallucinations.
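These knobs map directly onto the `options` object of Ollama's generate API. A sketch of a low-randomness configuration; the specific values are illustrative, not tuned recommendations.

```python
def factual_options(seed=None):
    """A low-randomness configuration, biased toward consistent, factual replies."""
    opts = {
        "temperature": 0.2,    # low randomness
        "top_p": 0.5,          # restrict to the top probability mass
        "top_k": 20,           # consider only the 20 most likely tokens
        "repeat_penalty": 1.1, # mildly discourage repetition
    }
    if seed is not None:
        opts["seed"] = seed    # fix randomness for reproducibility
    return opts

# Example payload for Ollama's /api/generate endpoint.
payload = {
    "model": "qwen2.5:7b-instruct",
    "prompt": "List three facts about transformers.",
    "stream": False,
    "options": factual_options(seed=42),
}
```

Swapping in a high temperature and large top-k gives the "creative" end of the table instead.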

I built an application where you can specify these model parameters and see how the responses vary. A second model evaluates whether the responses are in line with the chosen parameters.

Through experimentation, I found parameter combinations that produce consistent responses and reduce hallucinations.

Following is a screenshot of the application:


Following is the response evaluation output:


🛠️ Use Case 3: Modifying Base Models

I tried LoRA adapters: freeze the base model and insert tiny trainable low-rank matrices.
⚠️ I didn’t fully succeed due to library issues, but it’s worth exploring as a cheap route to fine-tuning.
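The core LoRA idea fits in a few lines of NumPy. This is a toy sketch only (no training loop): the base weight matrix W stays frozen, while two small matrices A and B form a trainable low-rank update; the sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                        # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))          # frozen base weight matrix
A = rng.normal(size=(r, d)) * 0.01   # tiny trainable matrix (down-projection)
B = np.zeros((d, r))                 # B starts at zero, so training starts at W

def lora_forward(x):
    """Apply the base weights plus the low-rank update: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

# The adapter trains only 2*d*r parameters vs d*d for full fine-tuning.
full_params = d * d
lora_params = 2 * d * r
print(lora_params / full_params)     # 0.03125 — about 3% of a full fine-tune
```

Libraries like Hugging Face PEFT wrap exactly this pattern around each attention projection of a transformer.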


📖 Glossary (Quick Reference)

  • Parameters: Learned weights/biases
  • Tokens: Atomic input/output units
  • Context length: Max tokens a model can process at once
  • Embedding: Numeric vector for tokens/context
  • Transformer: Model architecture with self-attention
  • Pre-training: Large-scale language learning
  • Fine-tuning: Specialization for tasks
  • Quantization: Lower precision → smaller, faster models
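To illustrate the quantization entry, here is a minimal symmetric int8 sketch in NumPy. Real schemes (such as those used in GGUF files) are more elaborate, with per-block scales and mixed precisions.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: map floats to [-127, 127] with one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)   # close to w, at a quarter of the storage
```

The small reconstruction error is the price paid for a model file roughly 4x smaller than float32 weights.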

🚀 Closing Thoughts

Local LLMs are moving from curiosity to practical tools. With tools like Ollama and LM Studio, you can:

  • Experiment with models directly on your laptop 💻
  • Balance privacy, latency, and cost 🌍
  • Customize outputs for your own use cases 🛠️

And with ongoing advances in quantization and small yet powerful models, local inference is only going to get better.

AI Browsers Are Here — My Experience with Perplexity’s Comet

I have been using Perplexity’s Comet browser for the past two weeks, and it has completely changed the way I use browsers 🌐. I’ve been a Chrome user for as long as I can remember, but after trying out Comet for two weeks, I finally made it my default browser ✅.

Comet functions not just as a browser, but also as an AI assistant/agent 🤖 that automates many browser-based tasks. In this blog, I’ll share what AI browsers are, my experiences with Comet, and the use cases where I found it most useful.


❓ What is an AI Browser?

An AI browser integrates an AI agent directly into the browsing experience. This agent is aware of the activity in your tabs 🗂️ and provides recommendations and automations ⚡.

In addition to Comet from Perplexity, there are other AI browsers like Dia, Brave, and Opera. While I haven’t tried them personally, my research suggests that Comet offers much deeper AI integration 🔗.

Compared to ChatGPT’s agent, Comet runs locally on your machine 💻 and can directly control the browser. This makes it more secure 🔒 than agents like ChatGPT, where credentials are sent to external servers.


🌟 Why Did Perplexity Enter the Browser Space?

  • Most of us spend 60–70% of our workday inside browsers 🖥️.
  • Browsers are no longer just for websites; they’re the front door to SaaS apps and even AI IDEs like Replit.
  • By embedding AI into a browser, Perplexity ensures “stickiness” 📌 — you’ll keep coming back.

Building on top of Chromium (open source) was a smart move 🧠, making migration from Chrome relatively easy.


📥 Getting Comet

I joined the waitlist 📝 as soon as it opened. Currently, Comet is available to Max customers ($200/month 💸) and a limited set of Pro users. Luckily, through Airtel, I got access as a Pro subscriber 🎉.

Installed it on my MacBook 🍎, and ran it side-by-side with Chrome for two weeks.


🔄 Migration from Chrome to Comet

The migration experience was mixed:

  • ✅ Extensions came through (though not all worked perfectly).
  • ✅ Some Chrome settings migrated.
  • ❌ Bookmarks didn’t import properly.
  • ❌ Passwords, sessions, cookies, and profiles were not migrated 🔑.
  • ⚠️ Web3 wallets had to be re-imported manually.

💡 Use Cases

The more I used Comet, the more possibilities I discovered. The simplest one? “Summarize this page for me” 📝.

🛒 Shopping

  • Bigbasket
    • Query: Order toor dal (½kg), guava juice (6), almonds (200g), walnuts (200g), cilantro (100g), carrots (½kg).
    • ✅ Comet found them and added to cart. If multiple options exist, it picks randomly unless you specify (“pick cheapest” 💰).
  • Amazon: Show me all sports-related purchases I made last year 🏏
  • Comparison: Find cheapest price for Sony Bravia 55” TV across Amazon & Flipkart 📺

☁️ SaaS

  • GCP Console: Find logs with errors between 6 PM and 10 PM
  • Firebase: Check if anonymous authentication is enabled 🔐
  • YouTube: Show the videos with most views from my subscriptions in last 30 days ▶️
    • It auto-scrolls, gathers stats 📊, and summarizes.
  • Gmail: Find important unanswered emails ✉️
  • Google Calendar: Schedule a 30-min meeting with <X> tomorrow 📅
  • Google Sheets: Create a pivot table 📈 (took retries, but worked).

🌍 Social

  • X (Twitter): Show me which people I follow are from India 🇮🇳
  • LinkedIn: Make a chart of my posts vs. view counts 📊

🔗 Multi-Tool Workflows

  • Amazon.in → List vegan chocolates 🍫 that deliver in 1 day → Export to Google Sheets
  • Flipkart → Find laptops under ₹50,000 💻 with 16GB RAM → Compare specs → Export to Sheets
  • Swiggy → Find vegan restaurants near Indiranagar 🥗 → Filter for 30-min delivery → Export menu highlights to Sheets
  • The Times → Summarize top 3 EV policy articles this month ⚡🚗 → Export to Google Docs

✅ Pros vs ❌ Cons

Pros
✨ Easy migration (based on Chromium)
✨ AI “superpowers” while browsing
✨ No switching between browser ↔ AI agents
✨ Tab grouping & multi-agent parallelism
✨ Can search across multiple tabs 🔍

Cons
⚠️ Partial/inaccurate outputs (AI issue)
⚠️ Slow on complex websites 🐢
⚠️ Weak compared to Chrome in syncing & performance
⚠️ Doc editing in Google Docs is buggy
⚠️ Not available for mobiles 📱
⚠️ Security risks from prompt injection attacks 🛡️


🏁 Final Thoughts

I love the Comet browser for its AI-driven, agentic capabilities 🤖. After two weeks, I switched my default from Chrome to Comet.

I still keep Chrome as backup 🔙 for extensions and performance, but Comet shines in automation and research workflows 🌟.

Security remains a concern ⚠️ — malicious websites could hijack the AI agent — but as Comet integrates with more tools, its superpowers will only grow stronger 💪.

The Rise of CLI-Based AI Coding Agents: Claude Code vs Gemini CLI

Introduction

I have been using Cursor for vibe coding for three months. I was very skeptical about Claude Code and Gemini CLI at first, since I wasn’t comfortable with the idea of using a terminal as an AI agent. But over the last 1–2 months I’ve been trying them both, and they completely changed my opinion.

In this blog, I’ll share my experiences of using them, my favorite pick between the two, and a comparison of the three broad categories of AI-assisted coding approaches that exist today.


The Three Approaches to AI-Assisted Coding

I see broadly 3 kinds of AI-assisted coding approaches:

  • Chat interface with canvas → ChatGPT, Claude
  • IDE integrated AI tools → Cursor, Windsurf, Replit, Lovable
  • CLI-based AI agent tools → Claude Code, Gemini CLI, Warp

| 🧑‍💻 Category | 💬 Chat Interface | 🛠️ IDE Integrated Assist | ⚡ CLI-based AI Agent |
|---|---|---|---|
| Where it operates | Browser | Standalone IDE or browser (Cursor uses IDE, Lovable uses browser) | Terminal or IDE |
| Use case | Prototyping, small functions, quick answers, “throwaway weekend projects” | Augmenting the core coding loop: writing, refactoring, debugging | Automating workflows: multi-step tasks, system commands, project-wide changes |
| Vibe coding style | Pure “vibe coding” (conversational prompting) | Hybrid of “vibe coding” + “developer assist” | Agentic + autonomous (give AI a goal and let it execute) |

For a vibe coder like me, a CLI-based AI agent inside VS Code works perfectly — I get the best of both IDE and terminal with AI agent powers.


My Project: A 2-Way Translator App 🌍

To test these tools, I built a translation application.

When I visited Vietnam a few months back, I noticed cab drivers and restaurants using Google Translate effectively. But one problem stood out: only one device could be used for back-and-forth communication.

So, I decided to build a two-way translation application that solved this problem.

I drafted the following prompt (with ChatGPT’s help):

Global Translator App – MVP Requirements (Web Application)

  • Build a web app that lets two users communicate in real time via text translation.
  • Users connect via QR code or unique ID.
  • Support both text and speech.
  • Translate automatically for seamless conversation.

(Details moved to the appendix 👇)

Pre-requisites:

  • Google Translate API with GCP
  • Firebase backend

Claude Code vs Gemini CLI ⚡

| 🔎 Feature | 🤖 Claude Code | 🌐 Gemini CLI |
|---|---|---|
| 📍 Location of use | Standalone terminal or inside VS Code. In VS Code, Claude Code has IDE context — lets you select code and ask about it. | Standalone terminal only. In VS Code, Gemini CLI has no IDE context (though Google offers Gemini Code Assist for the IDE, without terminal capability). |
| 💻 Terminal capability | Excellent — can view files, execute commands, analyze outputs. | Limited — shell commands can’t run in the foreground, stateless (no persistent `cd`), no command completion. |
| ⚙️ AI agent capability | Strong coding performance; required multiple iterations but reliable. | Decent, though not as strong as Claude Code. |
| 🧪 Debugging & testing | Superb. With terminal + MCP integration, I could run unit tests from both terminal and frontend. | Limited debugging/testing due to terminal restrictions and weaker MCP tool support. |
| 🔌 MCP integration | Very good. I integrated Playwright (UI automation) + Firebase. | Okay. Playwright struggled (e.g., no two-browser-instance support). Firebase worked fine. |
| 💸 Cost & model | $20/month plan (Sonnet). Didn’t use Opus ($200/month). Sometimes hit daily quota limits. | Free with generous limits (Gemini 2.5 Pro). |

Verdict so far: Claude Code > Gemini CLI for most features, especially debugging and testing.
But Gemini CLI’s pricing (free) and generous usage limits are a big plus.

If Google can merge Gemini CLI with Code Assist and improve Playwright integration, it will become a fantastic package. On the other hand, Claude Code really needs a more flexible pricing tier between $20 and $200.


Project Output

  • Translation app built with Claude Code → [Demo link here]
  • Translation app built with Gemini CLI → [Demo link here]

Flow of the app:

  1. User logs in with a username (no auth to keep simple).
  2. Picks language + connects with another user via QR code or username.
  3. Supports both text + voice translation in real time.
  4. Built as a PWA → works on web + mobile.

Debugging & Testing with Claude Code 🔍

This is where Claude Code really shines:

  • Console errors are debugged + fixed automatically.
  • AI agent generates unit test cases, executes them, finds failures, and fixes them.
  • Even frontend integration testing works — thanks to MCP integration:
    • It inspects browser console logs.
    • Takes screenshots to analyze UI/UX issues (!).

I even asked Claude Code to:

  • Make a 90-second demo video of the app.
  • Simulate two users chatting with translations in the app. It worked beautifully.

Demo video created by Claude

Global Translation – User1

Global Translation – User2


Summary ✨

AI-assisted coding has matured tremendously in the last year and is now a top revenue driver among AI apps.

In my first blog on vibe coding, I complained about limited debugging and testing in AI coding tools. With these new coding agents, that problem feels largely solved.

Next, I’d love to see AI agents:

  • Do better system design.
  • Produce more modular code.
  • Integrate smoothly with existing codebases.

Between Claude Code and Gemini CLI → Claude Code wins hands down 🏆.
But I’m confident Gemini CLI will close the gap soon.


Appendix

Detailed prompt given for the translation application:

Tech Stack

  • Frontend Framework: React (or a similar modern JavaScript framework like Vue/Angular, but React aligns with future React Native plans)
  • Backend: Firebase (Firestore/Realtime Database for real-time chat, Authentication, Cloud Functions for server-side logic if needed)
  • Translation API: Google Cloud Translation API
  • QR Code: Open-source JavaScript libraries for QR code generation and scanning (e.g., qrcode.react, html5-qrcode)
  • Authentication: Anonymous sign-in (extendable to Gmail sign-in later)
  • Chat History: Local browser storage (e.g., LocalStorage, IndexedDB – no cloud sync for MVP)
  • Encryption: Not required for MVP
  • UI/UX: Simple, intuitive, and modern chat interface inspired by leading web messaging apps (e.g., WhatsApp Web, Telegram Web)
  • Dark Mode: Full support for dark mode from MVP

Core Features (MVP)

  • User Onboarding
    • Anonymous sign-in (no registration required for MVP)
    • Generate a unique user ID and QR code for each user upon entering the app
    • Users can choose and save a unique username, which is validated against a central Firestore database to prevent conflicts.
  • Connection Mechanism
    • QR Code Scanning: Allow users to scan another user’s QR code using their device’s webcam/camera (if available and permission granted).
    • Manual ID Entry: Provide an option to manually enter another user’s unique ID to initiate a chat.
    • Display your own QR code for others to scan.
    • The application remembers the last 5 friends you’ve connected with, allowing for quick selection from a dropdown menu.
  • Progressive Web App (PWA):
    • The application is designed to be installable on mobile and desktop devices, offering an app-like experience with potential offline capabilities.
    • The layout is optimized to adapt and display correctly across various screen sizes, including iOS and Android mobile browsers.
  • Chat Interface
    • Real-time text chat between two users.
    • Each user selects their preferred language from a dropdown/selector. This language is the language to be used by the friend on the other side. 
    • Messages are automatically translated to the recipient’s language using Google Translate API.
    • Show both original and translated text in the chat bubble.
    • Support for dark mode.
    • Friend Online Status (Basic): It displays whether a friend is currently “Online” or “Offline” (with a “Last seen” timestamp). Note: The “offline” status is not automatically updated on browser close in the current setup.
  • Session Management
    • One-to-one chat sessions.
    • Simple chat history stored locally in the browser.
  • Language Support
    • Initial support for: Hindi, Telugu, Tamil, Kannada, English, and French.
  • Misc
    • A version number is displayed on the screen, making it easy to identify the deployed application version.

Non-Functional Requirements

  • Responsive and intuitive UI/UX, adapting well to different screen sizes (desktop, tablet, mobile browsers).
  • Fast translation and message delivery.
  • Minimal data usage.
  • Accessibility support.
  • Dark mode support.
  • Cross-browser compatibility (Chrome, Firefox, Safari, Edge).

Future Extensions (Post-MVP)

  • Native mobile applications (Android & iOS) using React Native.
  • Gmail sign-in and user profiles.
  • Speech-to-text and text-to-speech for voice communication.
  • Discover nearby users (if feasible for the web, e.g., using WebRTC data channels or location APIs).
  • Group chats.
  • Persistent chat history with cloud sync.
  • End-to-end encryption.
  • Support for additional languages.

🚀 A Guide for B.Tech CS Students to kickstart your AI journey

👋 Introduction

My daughter will be starting her B.Tech in Computer Science at MIT, Manipal this year. As a huge AI proponent, I often share the latest AI trends and tools with my family. When my daughter decided to pursue CS, she asked me several questions about AI, which inspired this blog. I hope this guide helps any student planning to specialize in CS and AI.


📚 Core Fundamentals for CSE Students

Before diving into AI, it’s crucial to master the basics. These are the building blocks for everything you’ll do in computer science. The following links give you an overview of the basics before you deep-dive.


📝 General Advice for Students

In addition to your coursework, the following tips can help you be better prepared for the industry.

  • Start with Fundamentals: Focus on math, programming, data structures, and algorithms.
  • Build a Portfolio: Work on projects, participate in Kaggle competitions and hackathons, and maintain GitHub repositories.
  • Network: Join AI clubs, attend meetups, and connect with peers and professionals on LinkedIn.
  • Stay Updated: Follow AI news, research, and trends.
  • Internships: Real-world experience is invaluable—seek internships early.

🛠️ Tools to Try Out

The following is just a sample at this point in time. Tools change fast, so it’s important to keep yourself updated with the latest.

  • Chatbots: ChatGPT, Gemini (try ChatLLM, an aggregator of chatbots and other AI tools; it’s very handy)
  • Vibe Coding: Cursor, Windsurf, Replit, Pythagora (see my earlier blog for more)
  • Image Generation: DALL-E (OpenAI), Midjourney
  • Video Generation: Google Veo
  • ML Platforms: Google AI Studio (good for experimenting with Google AI models), Kaggle (competitions, datasets, and notebooks), Hugging Face (a marketplace for models and datasets, and an easy way to share ML work with others)
  • Automation: Zapier (AI orchestration platform connecting different AI and non-AI tools and platforms)

Note: “Vibe coding” refers to using AI-powered coding environments that help you code faster and more intuitively.


🤖 Exploring AI Domains & Career Paths

Here’s a quick overview of different AI roles, what they do, prerequisites, and how to get started. The AI industry is still nascent, so these roles may change as the technology matures.

| Role | What They Do | Prerequisites | How to Get In |
|---|---|---|---|
| AI Researcher | Develop new AI models/algorithms, advance the field, publish research | Strong math (linear algebra, stats), deep ML/DL, Python, PyTorch/TensorFlow, research skills, academic writing | Advanced courses (Master’s/PhD), join research labs, open-source, publish papers, attend conferences |
| ML Engineer | Build, optimize, and deploy ML models in production; manage ML systems | Programming (Python, C++/Java), ML frameworks, software engineering, cloud (AWS/GCP/Azure), MLOps basics | End-to-end ML projects, internships, open-source, learn CI/CD, Docker/Kubernetes, model deployment |
| Data Engineer/Scientist | Build data pipelines, clean/process data, extract insights, visualize findings | Python, SQL, data wrangling, statistics, data viz, ML basics, big data tools (Spark, Hadoop) | Data science/engineering courses, Kaggle, portfolio projects, internships, learn data tools and visualization |
| AI Application Engineer | Integrate AI models into real-world apps/products; focus on APIs and UX | Programming (Python, JS, etc.), API development, front/back-end, basic ML, UX/UI | Build AI apps, hackathons, internships, learn REST APIs, cloud deployment |
| AI Security & Safety | Ensure AI systems are secure/safe; address ethical, legal, and risk concerns | Security fundamentals, cryptography, adversarial ML, AI ethics, risk, regulations, ML basics | Cybersecurity/AI ethics courses, CTFs, follow AI safety research, join labs/organizations |
| AI Product Manager | Define vision/strategy for AI products; bridge tech and business teams | AI/ML concepts, product management, communication, business acumen, user research | Start as engineer/analyst, PM courses, AI projects, internships, develop leadership/communication |
| AI Hardware Specialist | Design/develop hardware/software (GPUs, TPUs, SDKs) for AI training/inference | ECE/CS, digital design, computer architecture, parallel computing, C/C++, CUDA, ML basics | ECE/CS courses, hardware internships, FPGA/GPU projects, hardware-software co-design, follow NVIDIA/AMD/Intel |

🧑‍💻 AI Basics for Students

The following is just a sample to get you started with AI basics.


🤔 How Should College Students Use AI (and How Not To)?

  • Don’t: Use AI chatbots to solve class assignments directly—this can kill creativity and hinder learning.
  • Do: Use AI as a learning tool to explore new ideas, get feedback on completed assignments, and clarify concepts after self-study.
  • Tip: Treat AI as a personalized teacher—seek help only after you’ve tried solving problems yourself.

🔄 Staying Updated with AI

  • Curate Resources: Make a repository of your favorite podcasts, blogs, and YouTube channels.
  • Hands-On Practice: Try new AI tools and work on personal projects.
  • Mix Coding Styles: Combine “vibe coding” (AI-assisted) with traditional coding to strengthen your skills.

💡 Is AI Going to Take My Job?

A typical software engineer spends only 30–40% of their time coding; the rest involves architecture, design, spec reviews, cross-functional discussions, integration testing, and release processes. While AI can assist with coding, these other activities are equally critical and difficult to automate.

Even within coding, engineers must structure code, manage module interactions, choose technologies, debug, test, scale, and deploy—tasks that require human judgment. AI coding tools can boost productivity by 30–40% today, and possibly up to 70% in the next 1–2 years. However, over-reliance on these tools can erode core skills, and poorly organized AI-generated code can become hard to maintain.

There’s no substitute for strong design and coding fundamentals. Use AI tools as an assistant, not a replacement.

Jevons Paradox: If coding becomes much easier and cheaper, we’ll see more coding projects and more coders, not fewer. The demand for skilled engineers will grow as we automate more of the world.

For the next 5–10 years, CS engineers will remain essential. If AI ever surpasses humans in all aspects (AGI), it won’t just be engineers—every profession will be affected.


🌱 Final Thoughts

CS or CS with AI specialization are fields of endless possibility. Stay curious, keep building, and remember: the journey is as important as the destination. Embrace change, focus on fundamentals, and use AI as a tool to amplify your learning and creativity.


Wishing all new B.Tech CS students an exciting and rewarding journey ahead!


Picture with my lovely daughter!

Are Smart Glasses the Future of AI? My Hands-On Review of Meta AI Glasses

Honestly, I never believed smart glasses would become a mainstream AI form factor—until I bought the Meta Ray-Ban Smart Glasses two weeks ago! 😎 This gadget had been on my wishlist for a while, but it wasn’t available in India, and even if you managed to get one from abroad, the app didn’t work well here. Thankfully, Meta launched these glasses in India a month ago, and you can now buy them online or from certified optical dealers. In this blog, I’ll share my hands-on experience from the past two weeks.

Why Glasses? The Hands-Free Advantage 🙌

The first thing I realized: glasses are a fantastic form factor when you want to go hands-free and avoid constantly reaching for your phone or laptop. Google tried this a decade ago, but the tech just wasn’t ready. (More on Google’s new AI glasses later!)

I mostly use the glasses outdoors—while walking, running, or cycling. Indoors, I didn’t find much need for them.

Design & Comfort 🕶️

The design is sleek and modern, not clunky at all. They look like regular sunglasses, so you won’t stand out in a crowd (unless you want to!). However, after a few hours, they do feel a bit heavy, and I sometimes want to take them off for a break.

Use Cases: Where Smart Glasses Shine ✨

Photos & Videos 📸🎥
The 12MP ultra-wide camera delivers good quality photos and up to 3-minute videos. While it’s not quite smartphone-level, the hands-free capture is a game-changer—especially for impromptu moments or when you’re on the move. There’s even blur compensation to keep your shots clear. Selfies are a bit tricky, but you can always take them by holding the glasses like a phone.

Music, Podcasts & Calls 🎶📞
With 5 microphones and 2 speakers, the audio quality is impressive. The directed audio keeps you aware of your surroundings—crucial for outdoor activities. Personally, listening to music made my uphill cycling sessions much more enjoyable! 🚴‍♂️

The AI Edge: Meta AI in Your Glasses 🤖

The real magic is in the AI. Meta AI uses the latest Llama models, giving you robust speech-to-text and general chatbot capabilities. While Llama isn’t quite at OpenAI’s level, it works well for most queries. The best part? Multimodal capability! You can ask questions about what you’re seeing. For example, I spotted a tree with unique flowers, asked the glasses to identify it, and got an accurate answer. This feature will be super useful when traveling or reading foreign text.

Live Speech Translation 🌍🗣️

Currently, live translation supports French, Spanish, and Italian. It works best if both people have Meta glasses (for two-way translation), but even one-way translation is handy. I tested it with my daughter’s French and while watching a French video—worked well as long as the audio wasn’t too fast.

Cons & Limitations ⚠️

  • The glasses are a bit heavy and feel bulky after extended use.
  • Occasionally, they freeze and need a restart.
  • Battery life is about 3–4 hours—okay for most outings, but longer would be better.

Pro Tips for Buyers 📝

  • If you need prescription lenses, get the AI glasses fitted accordingly (external vendors can help).
  • If you don’t need a prescription, consider transition lenses for both indoor and outdoor use. I use reading glasses, so transition lenses are perfect for me.

Some pictures and videos that I took 📸🎥

Cycling clip


Final Thoughts & Google Glasses Comparison 🥽

After seeing Google’s latest demo at I/O, I’m excited for their upcoming glasses, especially with XR and virtual screen features. That could be a game-changer, but it’s likely a year away and pricing is still unknown.

For now, I absolutely love my Meta AI glasses. Priced between ₹29,000–₹35,000, they’re a solid investment for the features you get. I’m convinced glasses will be a major new form factor for AI—though not the only one.


Would I recommend them? Absolutely, if you love trying new tech and want a taste of the future—hands-free! 🚀

🤖 AI Customer Support using an Agentic Framework

In this blog, I’ll walk you through the design, development, and lessons learned while building a multi-agent AI customer support assistant using the LangChain framework and related AI tools. 🎮💬


🎯 Motivation: Why Build This?

At KGeN, a game aggregation platform connecting publishers and gamers, our primary users are gamers and clan chiefs (micro-community leaders).

These users often ask questions about:

  • Platform features
  • Game-specific achievements
  • Player and clan statistics

Some answers come from a static knowledge base, while others depend on dynamic user-specific data.

⚡ We wanted an intelligent, scalable AI assistant that could:

  • Understand natural language queries
  • Route them to the appropriate data sources
  • Continuously improve through feedback

Based on the PoC feedback, I wanted to take this to production.

🧠 The use case generalizes to any industry with static documentation and dynamic user data—only the context changes.


🔗 Application & Code

The PoC application is deployed on Render; you can try it out. The GitHub repo also contains instructions to run it locally or in the cloud.


📌 Business Goals

I wanted a system with the following business goals:

  • 🗣 Answer queries conversationally
  • ⚙️ Route questions to the right agent (static/dynamic/hybrid)
  • 🧾 Escalate unresolved issues via Jira tickets
  • ⭐ Collect feedback for iterative improvements
  • 📚 Learn from feedback to enhance performance
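
One goal above, escalating unresolved issues via Jira, boils down to posting a JSON payload to Jira’s create-issue REST endpoint. The sketch below only builds the payload; the project key "SUP" and the helper name are illustrative assumptions, and the actual HTTP POST to /rest/api/2/issue is left out.

```python
# Hedged sketch of the Jira escalation payload. Field names follow Jira's
# REST API v2 create-issue format; the "SUP" project key is an assumption.
import json

def build_escalation_ticket(user: str, conversation: list[str]) -> dict:
    """Package an unresolved conversation as a Jira create-issue payload."""
    return {
        "fields": {
            "project": {"key": "SUP"},               # assumed support project key
            "summary": f"Unresolved support query from {user}",
            "description": "\n".join(conversation),  # full chat context
            "issuetype": {"name": "Task"},
        }
    }

ticket = build_escalation_ticket("DragonSlayer99", ["Q: ...", "A: ..."])
print(json.dumps(ticket, indent=2))
```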

🧪 Prototyping Approach

I followed a “vibe coding” model:

  • Start fast with a working prototype
  • Use AI to assist (ChatGPT + Cursor editor)
  • Iterate with real feedback

💡 Tools Used:

  • ChatGPT to generate mock data (static + SQL)
  • LangChain/LangSmith as the agentic framework
  • Cursor for AI-assisted coding
  • Render for cloud deployment

⚠️ Tip: Feed detailed requirements to AI code editors. Without clarity, they produce unreliable or messy code.


🧭 Agent Flow: How It Works

Each user query is first routed by a Main AI Agent, which classifies the query as:

  • 📘 Static: Uses vector search on documentation (FAISS)
  • 🗄 Dynamic: Converts to SQL query on structured data
  • 🔁 Hybrid: Mixes both static + dynamic sources
  • 📥 Follow-Up: Needs more user input
  • 🚨 Escalation: Routed to a human via Jira

Each type has a specialized agent with its own system prompt.

🧠 LangChain powers the routing, agents, and execution logic.
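
The routing step can be sketched as follows. In the real app an LLM call does the classification; here a keyword heuristic stands in for that call so the flow is easy to follow, with category names mirroring the five agent types above.

```python
# Simplified stand-in for the Main AI Agent's LLM classification step.
def route_query(query: str) -> str:
    """Classify a user query into one of the agent categories."""
    q = query.lower()
    # User-specific data (a username, "my ...") points at the SQL agent.
    has_user = "dragonslayer99" in q or "my " in q
    # Documentation-style questions point at the vector-search agent.
    has_docs = any(w in q for w in ("what", "how do", "explain"))
    if has_user and has_docs:
        return "hybrid"     # needs both documentation and live data
    if has_user:
        return "dynamic"    # SQL agent over structured data
    if has_docs:
        return "static"     # vector search over the knowledge base
    return "follow_up"      # ask the user for more detail

print(route_query("What are legendary items?"))  # static
```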

📌 Architecture Diagram: Agent Flow


🧱 Architecture & Tech Stack

| Component | Tool / Framework | Reasoning |
|---|---|---|
| Agent Framework | LangChain | Modular, battle-tested |
| Monitoring | LangSmith | Easy trace/debug for agents |
| Vector DB | FAISS | Simple to set up for POC |
| LLMs | OpenAI (pluggable) | Can switch to others like Claude |
| Backend API | FastAPI | Lightweight, async-friendly |
| Frontend (POC) | Streamlit | Quick prototyping |
| Deployment | Render | Easy cloud deployment |
| Ticketing | Jira API | For support escalations |
| DB (Local/Test) | SQLite | Lightweight |
| DB (Production) | Postgres | Scalable |

🐞 Issues & Learnings

This whole application took me around 8–10 hours, spread over two weeks of weekend time. Following are some issues I faced:

  • 🧩 Dependency Hell: LangChain and LLM libs change fast, and Cursor couldn’t resolve pip issues well. I had to ask Cursor to fetch the latest details from the internet to resolve them.
  • 🧪 Streamlit Cloud Problems: Ended up moving to Render for better compatibility.
  • 🌍 Env File Confusion: Environment-specific bugs were hard to debug in prod as Cursor does not integrate with Render deployment.

🔍 Debugging with LangSmith

LangSmith is great for understanding whether the agentic workflow is working as expected. I was able to fix the following issues with it:

  • 🔎 Identified that vector-database search was returning the whole static knowledge base instead of the specific context; adding semantic analysis to the vector-database match solved this.
  • 🧩 Fixed the hybrid agent’s output-merging logic
  • 🔁 Debugged why hybrid/support queries didn’t escalate to Jira

📂 Sample Queries along with Langsmith trace

Static query example: What are legendary items?

From the above trace, we can see 2 LLM chain calls and 1 call to the vector database: the first chain call identifies the type of query, and the second summarizes the response from the vector database.
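
This static path can be sketched end to end. It’s a simplified stand-in: bag-of-words vectors replace real embeddings, the two documents are made up, and a stub replaces the summarization chain call (the real app uses FAISS plus an LLM).

```python
# Toy version of the static agent: one nearest-neighbour retrieval plus a
# stubbed summarization call. Illustrative only; FAISS and an LLM do the
# real work in the app.
from collections import Counter
import math

DOCS = [
    "Legendary items are the rarest tier of loot and grant unique bonuses.",
    "Clan chiefs can invite members and schedule clan tournaments.",
]

def embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str) -> str:
    """First call in the trace: nearest document by similarity (FAISS's role)."""
    return max(DOCS, key=lambda d: cosine(embed(query), embed(d)))

def summarize(query: str, context: str) -> str:
    """Stub for the second chain call that rewrites context as an answer."""
    return f"Based on the docs: {context}"

print(summarize("What are legendary items?", retrieve("What are legendary items?")))
```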

Hybrid query example: How many gold achievements has DragonSlayer99 earned and what rewards do they give?


From the above trace, we can see that there are 6 chain calls in this query:

  • the first to identify the type of query
  • the second to summarize the results from the vector database
  • the third to check whether there is a username in the query
  • the fourth to generate a SQL query and get results from the Postgres DB
  • the fifth to take the SQL results and generate a summarized response
  • the sixth to combine the summarized static and dynamic data into the response for the user

📋 Requirements Summary (Generated via ChatGPT)

I fed these requirements as the initial prompt into Cursor, after a few iterations of discussion with ChatGPT.

💳 Business Requirements

  • Build an AI system that can:
    • 📘 Answer static queries from documentation
    • 🗄 Query live backend data
    • 🔁 Combine static + dynamic sources
    • 📥 Handle follow-up interactions
    • 🚨 Escalate to Jira when needed
  • Serve two user roles:
    • 👤 Gamers (general players)
    • 👑 Clan Chiefs (advanced users)
  • Goals:
    • Reduce manual tickets by 70%
    • Improve first-response time
    • Maintain conversational accuracy

📈 Functional Requirements

  1. Static Question Answering
    • Vectorize and index knowledge base with FAISS
    • Use RAG (Retrieval-Augmented Generation) to answer
  2. Dynamic Question Answering
    • Use LangChain SQL Agent to convert natural language to SQL
    • Query SQLite (for testing) and Postgres (in prod)
  3. Hybrid Handling
    • Mix RAG results with SQL data for composite answers
  4. Follow-Up Logic
    • Prompt for missing data (e.g., usernames)
  5. Escalation
    • Auto-create Jira ticket with conversation context if unresolved
  6. Multi-Agent System
    • Router → specialized agents (static, dynamic, hybrid, etc.)
  7. API & UI
    • FastAPI for backend
    • Streamlit for POC frontend; React for future UI
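
The dynamic-answering step (item 2 above) can be sketched with the standard library. A stub stands in for the LLM’s natural-language-to-SQL translation, and the schema and sample rows are invented for illustration; the real build uses LangChain’s SQL agent against SQLite/Postgres.

```python
# Toy dynamic agent: a stubbed NL-to-SQL translation executed against an
# in-memory SQLite database. Schema and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE achievements (username TEXT, tier TEXT)")
conn.executemany(
    "INSERT INTO achievements VALUES (?, ?)",
    [("DragonSlayer99", "gold"), ("DragonSlayer99", "gold"), ("DragonSlayer99", "silver")],
)

def translate_to_sql(question: str) -> str:
    """Stub for the LLM translation step; returns a canned query."""
    return ("SELECT COUNT(*) FROM achievements "
            "WHERE username = 'DragonSlayer99' AND tier = 'gold'")

sql = translate_to_sql("How many gold achievements has DragonSlayer99 earned?")
count = conn.execute(sql).fetchone()[0]
print(count)  # 2, given the sample rows above
```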

🚀 Technical Requirements

  • LangChain (Python)
  • FAISS or ChromaDB for vector storage
  • OpenAI or Claude LLMs
  • SQLite (via LangChain SQL agent) as simulated backend
  • JIRA API for ticket creation
  • FastAPI (backend API layer)
  • Streamlit (prototyping UI)
  • React (future UI)

🔐 Security & Testing

  • Role-based access (gamers vs. clan chiefs)
  • Environment variable protection
  • Unit tests, evaluation prompts, and simulated load

✅ Success Criteria

  • 90%+ accurate responses in test cases ✅
  • Sub-3-second latency ✅
  • Smooth Jira escalation pipeline ✅
  • API ready for frontend integrations ✅

🚀 Final Thoughts

This project demonstrates how AI agents, vector databases, LLMs, and good system design can solve real-world support problems.

There are many improvements needed to take this into production. Following are some of them:

  • Use LangChain conversation memory; today, conversation state is stitched together at the Streamlit level.
  • RBAC based on user login, with queries scoped to the user.
  • Performance improvements via caching at different levels and database connection optimization.
  • Agent self-learning from user feedback
  • Improve UI/UX

🔄 This customer support agent is a template that can be adapted across industries—from gaming to banking to e-commerce.

🔍 Debugging Web Apps with Cursor Just Got Smarter: Evaluating Browser Assist Tools

In my previous post, I shared my experience using Vibe coding and highlighted one of the biggest challenges in that workflow: AI coding tools often lack awareness of what’s happening in the browser when you run your app.

This leads to a frustrating dev loop: you’re forced to constantly copy-paste screenshots, console errors, and network logs into your code editor just to help the AI debug your application.

Luckily, there’s a new wave of tools built on the Model Context Protocol (MCP) that bridge this gap. These browser assist tools let your AI-enhanced code editor (like Cursor) directly observe, interact with, and sometimes even control your browser — just like a real user.

Some of these tools go beyond debugging — they can actually drive the browser, making them incredibly useful for UI testing and automation as well.


🧪 Tools I Evaluated

  1. Playwright
  2. Browser MCP
  3. Browser Tools MCP

Each of these plugs into Cursor via MCP and serves a slightly different purpose.


🧠 Architecture Overview

Cursor → MCP → Browser Assist Tool → Browser → Observed by LLM → Cursor responds

  • Cursor uses Model Context Protocol (MCP) to communicate with these tools.
  • The tools interact with the browser — either controlling it or reading logs/events.
  • The data is passed to the LLM, which interprets it and responds inside Cursor.
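
For reference, the per-tool registrations shown in the sections below all live together in Cursor’s MCP config, which, as far as I can tell, uses a top-level "mcpServers" object (the exact file location varies by Cursor version):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```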

🧩 Tool Breakdown

1. Playwright + MCP

Developed by Microsoft, Playwright is a full-featured browser automation framework that supports Chromium, Firefox, and WebKit. It works across OS platforms and supports headless execution — making it perfect for automation and CI testing.

When integrated with Cursor via MCP, it becomes a powerful browser control agent.

✅ Installation

"playwright": {
  "command": "npx",
  "args": ["@playwright/mcp@latest"]
}

🔧 Supported Functions

These functions are exposed by Playwright over MCP. Playwright itself has more functionality than what it exposes via MCP.

  • browser_click, browser_type, browser_navigate
  • browser_take_screenshot, browser_snapshot, browser_pdf_save
  • browser_tab_list, browser_tab_select, browser_tab_close
  • …and many more

💡 Real Use Cases

  • Asked Cursor to debug console errors in my e-commerce app
  • Asked Cursor to test flows like “Add to Cart”, “View Product Details”
  • Used it for:
    • Clicking through workflows
    • Filling out forms
    • Scraping content
    • Capturing screenshots for visual debugging

⚠️ Limitations

  • Doesn’t read network logs or API errors
  • Click interactions don’t work reliably with iframes

2. Browser MCP

This is a lightweight adaptation of Playwright, still MCP-compatible, but simpler.

✅ Installation

  • Install Chrome extension (manual or via GitHub release)
  • Cursor config:

    "browsermcp": {
      "command": "npx",
      "args": ["@browsermcp/mcp@latest"]
    }

💡 Why It’s Useful

Unlike Playwright, Browser MCP can control your already open browser tab, without launching a new browser instance. This is helpful for debugging apps you’re already running in Chrome.

🔻 Downsides

  • Fewer features than Playwright
  • Better suited for lightweight debugging, not complex automation

3. Browser Tools MCP

This tool focuses entirely on browser introspection and debugging, rather than control.

Think of it as DevTools for your AI.

✅ Installation

  • Install Chrome Extension
  • Start middleware:

    npx @agentdeskai/browser-tools-server@latest

  • Cursor config:

    "browser-tools": {
      "command": "npx",
      "args": ["@agentdeskai/[email protected]"]
    }

🛠️ Supported Functions

These are exposed by Browser Tools over MCP.

  • getConsoleLogs, getConsoleErrors, getNetworkLogs, takeScreenshot
  • runAccessibilityAudit, runPerformanceAudit, runSEOAudit, runNextJSAudit
  • wipeLogs, runDebuggerMode, runBestPracticesAudit

💡 What I Could Do

  • Refresh app and automatically check network + console errors
  • Ask Cursor to analyze latency issues or API failures
  • Run full Lighthouse-style audits on performance and SEO

📊 Comparison: Which One to Use?

| Use Case | Best Tool |
|---|---|
| Browser automation + basic debugging | 🟢 Playwright |
| Full DevTools-style debugging | 🟢 Browser Tools MCP |
| Debugging current browser tab with minimal setup | 🟡 Browser MCP |

🧠 Final Thoughts

Playwright is phenomenal — not just for browser debugging, but for automation and testing at scale. If it added rich debugging support (like network logs and audits), it could become the one tool to rule them all.

Meanwhile, Browser Tools MCP fills that debugging gap beautifully today, while Browser MCP hits a sweet spot between the two.


🔮 Looking Ahead

I believe browser assist tools will eventually be natively integrated into code assist platforms like Cursor, eliminating the need for users to manually install and configure MCP plugins. In the future, these platforms will likely support a range of built-in agents that work seamlessly across different environments — web, mobile, desktop — and integrate with tools like databases, APIs, and SaaS platforms out of the box.

There’s also a new class of tools like Anthropic’s Computer Use and OpenAI’s Operator, which aim to control not just browsers but the entire computer environment. It feels inevitable that these worlds — browser automation, LLM-powered agents, and full computer control — will start to converge.

Exciting times ahead. ⚡

🚀 One Month with Vibe Coding: Building Real Apps with AI Assistants

Over the past few months, Vibe coding has been gaining serious traction—and I couldn’t resist diving in myself. I’ve been using AI coding assistants for a while, but I wanted to go deeper and really test what these tools can do in a realistic, end-to-end software development project.

So, I spent the last month building a full-featured ecommerce web and mobile app using some of the most talked-about Vibe coding platforms: Cursor, Windsurf, Lovable, Bolt, and Replit. It was a fun and empowering journey—there’s a real sense of accomplishment in being able to build software applications on your own. I also learned that working with the current generation of tools definitely requires a good deal of patience.

In this blog, I’ll walk you through:

  • My experience building and deploying the applications
  • What worked, what didn’t, and what broke halfway 😅
  • How each tool stacks up in terms of usability, flexibility, and reliability
  • Whether tools like these mean we still need software engineers (spoiler: yes—but it’s complicated)
  • Where I think this whole Vibe coding trend is heading next

🌍 Coding Assistant Landscape: Then vs Now

AI coding assistants have come a long way. Here’s a quick look at how things evolved:

⏰ The Old School

  • Classic autocomplete tools like IntelliSense or TabNine helped speed up typing but weren’t context-aware.
  • Low-code/no-code platforms (e.g., Bubble, Wix, Zapier) let users drag and drop components, but required scripting for anything complex.

🧠 The New Era: Vibe Coding

  • Powered by LLMs (Large Language Models)
  • Can write, refactor, debug, and deploy apps using natural language queries
  • Opens the door for non-developers to build apps
  • Empowers developers to skip boilerplate and focus on design, logic, and systems thinking

💡 What is Vibe Coding?

Vibe coding refers to using AI-powered tools to build software via natural language prompts, mixed with lightweight manual coding. It’s all about staying in the flow and letting the assistant do the heavy lifting.

💡 The Experiment

Although I started my career as a developer, I haven’t been actively coding in the last decade. Instead, I’ve focused on architecture, reviews, testing, and product design. That said, I wanted to push these Vibe tools beyond simple demos or prototypes.

So, I picked a moderately complex use case: an Ecommerce application with a web frontend and mobile app, complete with backend, auth, payment, and roles.

✨ Features Implemented

- User authentication (sign-up, login, password reset, Google login)
- Roles: Admin, Seller, Customer
- Admin: manage users, view orders, seller capabilities
- Seller: add products
- Customer: browse catalog, filter/sort, add to cart, checkout
- Order history
- Payment integration with Razorpay

🚀 Tech Stack Used

Frontend: React
Backend: Node.js + Express
Database: MongoDB
Deployment: Vercel / Render / Netlify depending on tool

🏗️ Environments

- Web app
- Mobile app (via Expo)
- Both local and production deployments

🔧 Tool-by-Tool Breakdown

Each tool was tested with the same requirements and judged based on ease of use, flexibility, ability to debug, and ability to deploy real features.

🧪 Cursor

🛠️ Plan: Paid ($20)

💻 Used With: MongoDB Atlas, Render/Vercel for deployment, Claude 3.7 model

Highlights:

  • Full tech stack flexibility
  • Supports both web and mobile
  • Git & database migration support
  • Wrote unit tests and debugged APIs
  • Workflow suits developers

⚠️ Challenges:

  • Terminal tracking is weak
  • Frequent application crashes
  • Manual debugging needed

📦 Artifacts:

Windsurf

🛠️ Plan: Free and Paid version

💻 Used With: Claude 3.7 & Gemini, Vercel/Render for cloud, Cloudinary for images

Highlights:

  • Better terminal/session management
  • Console log debugging is stronger

⚠️ Challenges:

  • Hard to course-correct from incorrect assumptions
  • Hit credit limits fast (ran out of credits on the paid version in 3 days)

📦 Artifacts:


⚡ Bolt

🛠️ Plan: Free

💻 Used With: React + Vite, Supabase, Netlify

Highlights:

  • Blazing-fast startup because it runs as a web container
  • Fully in-browser

⚠️ Challenges:

  • Can’t run backend services (e.g., Express, MongoDB), a consequence of running as a web container
  • Not suitable for full-stack use cases

📦 Artifacts:

  • Incomplete app prototype (Ran out of free credits)

😍 Lovable

🛠️ Plan: Free and then Paid ($20)

💻 Used With: React + Supabase, auto-deploy on Lovable Cloud

Highlights:

  • Very easy to use
  • Seamless production deployment

⚠️ Challenges:

  • Slower code generation speed

📦 Artifacts:

🛠️ Replit

🛠️ Plan: Free

💻 Used With: Ghostwriter AI, browser IDE, MongoDB Atlas

Highlights:

  • Easy to set up
  • Great for fast testing

⚠️ Challenges:

  • Cloud-only with less system-level flexibility
  • Not ideal for large production apps

📦 Artifacts:

  • Did not complete (ran out of free credits)

📊 Tool Comparison Snapshot

| Feature | Cursor | Windsurf | Replit | Lovable | Bolt |
|---|---|---|---|---|---|
| Ease of Use | Medium | Medium | Easy | Easy | Easy |
| Dev Environment | Local | Local | Cloud | Cloud | Cloud |
| Deployment Options | Manual | Manual | Built-in | Built-in | Manual |
| Tech Stack Flexibility | High | High | Medium | Limited | Limited |
| Target Users | Devs | Devs | All | Non-devs | Non-devs |

🧠 My Take: Cursor gives you the most power; Lovable gives you the most convenience.

❌ What Needs Work

🛠️ Debugging:

Most tools still rely on you reading console logs and piecing things together manually. (My pick: Use Operator framework to understand what’s happening in browser and fix issues automatically)

🐌 Speed:

Long wait times and retries can break the flow.

🧩 Fragility:

Small changes can break other parts of the app. There’s no real “awareness” of architectural dependencies.

📐 Lack of modularity:

Encouraging reusable design and clean code still needs a human architect.

📘 Pro Tips: Making Vibe Coding Work

📋 Define clear requirements

Roles, pages, workflows, error states — lay it all out before prompting.

🧭 Use guardrails (rules/constraints)

Many tools let you enforce language, style, and folder structure.

🎯 Stick to common stacks

React, Node, Python, SQL — that's where LLMs shine.

💡 Use models wisely

Claude 3.7 was the most consistent for me, especially on multi-step flows. Experiment with models and find the best one for your use case.

🧪 Debug like a dev

Logs > terminal > DB traces. Be ready to dive in.

🔄 When stuck, reboot

Sometimes starting fresh saves more time than untangling broken AI logic. Keep regular checkpoints so you can roll back to a stable point.

🧠 Is Software Engineering Dead?

Nope. But it’s definitely shifting.

🧠 What Vibe Coding Does Well:

  • Speeds up boilerplate
  • Empowers solo builders
  • Makes prototyping fast

🚧 What It Still Needs Help With:

  • Scaling apps
  • Clean architectures
  • Advanced debugging
  • Enhancing existing production apps

🧑‍💻 Developers won’t disappear. They’ll evolve. The future engineer:

  • Uses AI to generate & validate code fast
  • Designs smart systems
  • Oversees quality, reusability, and security

💬 “It’s not about coding less. It’s about coding smarter.”


Crypto AI agents

AI agents have emerged as one of the key AI themes in 2024, revolutionizing how we interact with AI as a technology. What caught me by surprise was the rapid rise of crypto AI agents and the unprecedented pace of innovation in this space. These agents are proving to be a boon for the web3 ecosystem, creating an entirely new category of web3 applications. In this blog, I’ll address key questions I encountered while diving into this space.

What are AI agents?

AI agents can be defined by three key characteristics:

  1. Autonomy: AI agents can make independent decisions. Once a goal is set, they determine the best path to achieve it.
  2. External Interactions: These agents can integrate with and operate external tools, such as productivity software (e.g., Word, Excel), payment systems (e.g., wallets), and business tools (e.g., ERP/CRM systems).
  3. Learning and Memory: AI agents continually learn from their experiences and interactions. With short- and long-term memory, they improve their performance over time.

What are the different levels of AI agents, and where are we now?

AI agents are typically categorized into levels, ranging from rule-based systems (Levels 0–1) through autonomous learning systems (Level 3), with Level 5 expected to reach AGI (Artificial General Intelligence). Currently, we are at Level 3: advanced autonomy, but far from AGI. Good reference here

What’s the difference between regular AI agents and crypto AI agents?

Regular AI Agents
Developed using frameworks such as Langchain, Langgraph, Rasa, or CrewAI, these agents automate complex workflows and are typically owned by centralized entities. Common use cases include:

  • Customer support agents
  • Healthcare assistants
  • Creative tools

Crypto AI Agents
In addition to the traits of traditional AI agents, crypto AI agents introduce tokenization and decentralized ownership. These agents are traded on crypto exchanges and have their own wallets, enabling them to perform blockchain-based commerce autonomously. Use cases include:

  • Service payments for both agents and humans
  • Blockchain transactions
  • Decentralized finance (DeFi) investments

What blockchain standard do crypto AI agents use? 

Crypto AI agents leverage the ERC-6551 standard, which allows them to be represented as NFTs. This enables agents to have unique identities, wallets, and the ability to interact autonomously on the blockchain.

What are the popular blockchains that crypto AI agents operate on?

Solana and Base are leading platforms for crypto AI agents, driven by their high transaction throughput and developer-friendly ecosystems. Cross-chain operability is becoming a key trend, enabling agents to interact seamlessly across different chains. Of the two biggest AI agent framework projects, Virtuals uses Base and AI 16z uses Solana.

Why are AI agents good for crypto? 

AI agents are simplifying blockchain’s user experience (UX), removing barriers for non-crypto users. They autonomously manage web3 interactions, reducing complexities for tasks such as cross-chain transactions.
For instance, an AI agent can monitor token prices, execute trades, and bridge tokens across chains without user intervention.

Additionally, these agents are driving significant growth in blockchain transaction volumes, especially on chains like Solana and Base.

What are some of the popular crypto AI agents?

  • Truthterminal – First viral crypto AI agent; meme-focused, promoted the “GoatSE Singularity” culture.
  • Aixbt – Autonomous crypto trading agent. Watches crypto market trends on Twitter/X, monitors on-chain activity, and posts recommendations on Twitter/X. Built on Virtuals.
  • vaderAI – Investment agent. Aggregates on-chain and off-chain activity for investment advice. Runs decentralized advertising campaigns based on token contributions.
  • Luna – Engages with users on platforms like Twitter/X and Discord to respond, interact, and entertain. This agent’s goal is to gain the maximum number of followers.
  • Zerebro – Creative AI agent. Produces music, art, and NFTs autonomously. Zerebro-produced songs are listed on Spotify and have a big fan following.
  • God and Satan – Agents on Twitter/X that respond with a slice of humor.

This link from Virtuals has the top AI crypto agents built on Virtuals.

This is a good website with details of all AI crypto agents.

What is the role of crypto AI agent frameworks? 

Crypto AI agent frameworks allow developers to create crypto AI agents easily. 

Following are some important functionalities that the AI agent framework provides:

  • Selecting the best AI model for the use case
  • Autonomous operation
  • Short- and long-term memory for saving context
  • Tokenization support
  • Blockchain support – ERC-6551, wallets, chain/smart-contract integration
  • Social integration – most crypto AI agents have full authority to act autonomously on their Twitter/X accounts

What are some of the popular crypto AI agent frameworks? 

  • Eliza from ai16z – OSS framework; currently the #1 trending GitHub repo.
  • Virtuals – Closed source; makes it very simple to create crypto AI agents.
  • Zerepy from Zerebro – OSS framework

There are a lot of new frameworks that have come up recently. 

For a more detailed comparison, please refer to this comparison from Messari.

How does Tokenization work with crypto AI agents?

Crypto AI agents follow the smart-contract bonding-curve approach for tokenization, and users can buy and sell an AI agent’s token like any other web3 token.
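
As a rough illustration of how a bonding curve prices a token: with a linear curve, each new token costs slightly more than the last, and the cost of a purchase is the area under the curve between the old and new supply. The slope below is arbitrary; real launches use their own curve shapes and parameters.

```python
# Illustrative linear bonding curve: price(s) = K * s, so minting tokens
# from supply s0 to s1 costs the integral K * (s1^2 - s0^2) / 2.
K = 0.0001  # price slope per unit supply (arbitrary, for illustration)

def price(supply: float) -> float:
    """Spot price of the next token at the given circulating supply."""
    return K * supply

def buy_cost(supply: float, amount: float) -> float:
    """Cost to mint `amount` tokens starting from `supply`."""
    s1 = supply + amount
    return K * (s1 ** 2 - supply ** 2) / 2

# Early buyers pay less than later buyers for the same amount:
early = buy_cost(0, 1_000)        # 50.0
late = buy_cost(100_000, 1_000)   # 10050.0
print(early, late)
```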

How big is the crypto AI agent space? 

According to cookie.fun data, the crypto AI agent space has a market cap of $12B, with Virtuals alone close to $4B. Aixbt, one of the top AI agents, has a market cap of over $550 million.

Are there marketplaces for crypto AI agents?

Virtuals and SingularityNET provide crypto AI agent marketplaces. New ones are coming up fast.

What are risks associated with crypto AI agents? 

The power vested in crypto AI agents poses unique challenges:

Accountability: If an agent behaves maliciously, who is responsible—the agent or its creator? The traceability becomes even more complex with multiple agents.

Existential Risks: As AI agents approach AGI (Level 5), they could potentially challenge human values and control.

My Predictions

Crypto AI agents have a lot of potential; the following are my predictions:

  • AI agents will become a category like web apps/mobile apps and they will serve all different purposes. AI models and agents will evolve together.
  • Marketplaces for crypto AI agents and AI models will mature, and users will pick and choose agents for their needs the way we choose apps from the App Store or Play Store.
  • General AI agents and crypto AI agents will converge and token and wallet functionality will be an add-on on AI agents that need decentralization and commerce capability. 
  • Crypto AI agent frameworks will mature and there will be a good mix of open source and commercial AI agent frameworks. 
  • AI agents will start to have a standard interface for other agents to use them. 
  • Agent swarms—coordinated groups of agents—will become a focus of innovation.
  • Guardrails will come soon so that crypto AI agents are developed responsibly and there will be regulations to guide their usage. 
  • The crypto AI agent market has developed very fast, and I expect a slowdown from a market-cap perspective, though the associated technology will continue to evolve rapidly. Why am I skeptical of the market cap? Virtuals, which has its own agent framework and marketplace, reached a $3.5B market cap in a three-month time frame, which is unprecedented; the aixbt and truthterminal agents have market caps around $600M, which I still cannot comprehend. Overall, I remain very bullish on crypto AI agents as a technology.

Intersection of AI and Web3

Over the past year, AI has taken the world by storm, revolutionizing industries and reshaping technological landscapes. Having been deeply involved in the web3 domain for over two years, I’ve observed a fascinating overlap between these two transformative technologies. This blog explores how AI and blockchain complement each other: AI is opening up new possibilities for blockchain applications, while blockchain is providing the technological foundation to make AI more decentralized and secure.

To dive deeper, I’ll break this discussion into two sections:

AI helping blockchain

Simplifying Blockchain Transactions

AI agents are streamlining blockchain transactions, making them more user-friendly and efficient. For non-crypto users, navigating the complexities of wallets, tokens, and cross-chain interactions can be daunting. AI agents, with their autonomous nature, can handle these intricacies seamlessly. For instance, you can instruct an AI agent to buy a token when its price drops below a certain threshold. The agent can monitor the token’s price, execute the transaction, and even bridge it to the desired blockchain—all without requiring user intervention or knowledge of the underlying processes.
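
The limit-order example above can be sketched as a small agent loop. The price feed and trade call are stubs; a real agent would plug in an exchange or DEX API, a bridge, and a wallet for signing.

```python
# Minimal sketch of the "buy below a threshold" agent described above.
# get_price and buy are injected stubs standing in for market-data and
# trading/bridging integrations.
def run_agent(threshold: float, get_price, buy, ticks: int) -> bool:
    """Poll the price for `ticks` rounds; buy once if it dips below threshold."""
    for _ in range(ticks):
        if get_price() < threshold:
            buy()
            return True
    return False

# Stubbed feed: the price dips below the threshold on the third tick.
prices = iter([1.20, 1.10, 0.95])
trades = []
bought = run_agent(1.00, get_price=lambda: next(prices), buy=lambda: trades.append("buy"), ticks=3)
print(bought, trades)  # True ['buy']
```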

Attracting New Non-Crypto Users

AI agents are acting as a gateway for non-crypto users to engage with blockchain technology, thereby driving up transaction volumes. Chains like Solana and Base have seen a surge in daily transactions, thanks to the adoption of AI agents. Platforms like Virtuals and AI 16z, which serve as crypto AI agent frameworks on Base and Solana, exemplify this trend.

Automating Smart Contract Audits

AI is revolutionizing smart contract audits by automating vulnerability detection through techniques such as static code analysis, dynamic code analysis, and automated fuzzing. These tools enhance security while reducing manual effort. (Example: OpenZeppelin Defender)

Fraud Detection

AI can analyze suspicious blockchain transactions to detect and prevent scams like rug pulls and pump-and-dump schemes. (Example: Chainalysis)

General Generative AI Use Cases

In addition to blockchain-specific applications, generative AI use cases such as multimodal content creation, curation, and advanced data analysis are contributing to the overall ecosystem.

Blockchain helping AI

Data Provenance

Blockchain’s decentralized and immutable ledger is a powerful tool for data provenance, enabling the tracking of data inputs used in model training and ensuring the integrity of the data. By storing complete data histories on a blockchain, tampering can be prevented, and contributors to datasets can even be rewarded via smart contracts. (Example: Ocean Protocol)

Decentralized Learning

Blockchain supports decentralized or federated learning, where data remains distributed across nodes while models are trained collaboratively. This approach enhances data privacy and security. (Example: SingularityNET)

Deepfake Prevention

Blockchain can help verify AI-generated content by tracking associated data and inputs, mitigating the risks of deepfakes. (Examples: Numbers Protocol, CAI Initiative)

Tokenizing AI Agents

Blockchain enables the tokenization of AI agents, providing them with decentralized ownership, unique identities, and wallets for autonomous commerce. This capability empowers agents to transact, invest, and operate independently. (Examples: Virtuals, AI 16z, Zerebro)

Decentralized Physical Infrastructure (DEPIN)

Blockchain also powers decentralized physical infrastructures for AI training, optimizing the use of scarce resources like GPUs. Projects like Akash, Helium, and Filecoin are spearheading this space, offering decentralized solutions for compute, networking, and storage.

AI Compute Marketplaces

Building on DEPIN, AI compute marketplaces offer AI compute modules for model training and inference. These platforms provide a higher-level abstraction, making it easier to access decentralized AI resources. (Examples: Bittensor, NuNet, Hyperbolic Labs)

Conclusion

The intersection of AI and blockchain is creating a synergistic ecosystem, with each technology enhancing the other’s potential. While AI simplifies blockchain adoption and functionality, blockchain ensures AI is secure, decentralized, and transparent. As these technologies continue to mature, we can expect even more groundbreaking innovations at their crossroads.