Tech Insights 2026 Week 8

    Over the past two weeks you have probably read about the very popular AI agent “Clawdbot”, which was later renamed “OpenClaw”. OpenClaw is a viral, open-source AI agent platform that runs locally on your computer, allowing it to control your machine, manage files, and interact via chat apps like WhatsApp and Slack.

    Looking at the number of GitHub stars, the openclaw repository is on its way to becoming one of the most popular repos on GitHub ever, in absolute record time. Just look at the graph below (updated on February 14). The red line on the right is not a border; that is close-to-vertical growth.

    Last week Lex Fridman posted his interview with OpenClaw creator Peter Steinberger. The interview is long, over 3 hours, but if you have the time I really recommend you listen to it. We are truly witnessing history in the making here. Both Lex and Peter are very smart people, and the interview is packed with great insights into how AI agents are evolving and where we are going in this area. OpenClaw was built 100% through prompting with OpenAI Codex; not a single line of code was written by hand.

    Here are some of my favorite quotes from the interview:

    “I always thought I liked coding, but really I like building. And whenever you don’t understand something, just ask. You have an infinitely patient answering machine that can explain you anything at any level of complexity.”

    “Consider how Codex or Claude sees your code base. They start a new session and they know nothing about your project. So you gotta help those agents a little bit… You have to learn the language of the agent a little bit, understand where they are good and where they need help.”

    Why did you win?
    “Because [all the other companies] all take themselves too serious. It’s hard to compete against someone who’s just there to have fun. I wanted it to be fun, I wanted it to be weird.”

    Thank you for being a Tech Insights subscriber!

    Listen to Tech Insights on Spotify: Tech Insights 2026 Week 8 on Spotify

    1. Google Releases Gemini 3 Deep Think with 84.6% ARC-AGI-2 Score
    2. OpenAI Introduces GPT-5.3-Codex-Spark for Real-Time Coding
    3. Seedance 2.0 Launches With Multimodal Video Generation
    4. MiniMax M2.5 Targets Cost Reduction for AI Agents
    5. GLM-5 Released as Open-Source MoE Model
    6. Alibaba Releases Qwen-Image-2.0
    7. Anthropic Raises $30B in Series G Funding
    8. UN General Assembly Approves 40-Member Global AI Scientific Panel

    Google Releases Gemini 3 Deep Think with 84.6% ARC-AGI-2 Score

    https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-deep-think

    The News:

    • Google released a major upgrade to Gemini 3 Deep Think, a specialized reasoning mode designed for scientific research, engineering problems, and complex mathematical tasks.
    • The model scored 84.6% on ARC-AGI-2, a visual reasoning benchmark designed to test novel problem-solving without relying on memorized training data. Humans average 60% on this test.
    • Deep Think achieved 48.4% on Humanity’s Last Exam without tools, a benchmark consisting of PhD-level questions across specialized fields.
    • The model reached 3455 Elo on Codeforces competitive programming challenges and achieved gold-medal level performance on the 2025 International Math Olympiad, Physics Olympiad, and Chemistry Olympiad.
    • Early testers include a Rutgers mathematician who used Deep Think to identify a subtle logical flaw in a peer-reviewed mathematics paper, and Duke University’s Wang Lab, which optimized crystal growth fabrication methods for semiconductor materials.
    • Google AI Ultra subscribers ($249.99/month) can access Deep Think in the Gemini app, and select researchers, engineers, and enterprises can apply for early API access.

    “We omitted GPT-5.2 (xhigh) results for CodeForces due to persistent CloudFlare timeouts via the OpenAI API” From Model Evaluation – Approach, Methodology & Results

    My take: It’s hard to overstate just how far we have come with these “next token prediction models”. On a visual problem-solving test where humans score 60% on average, this model scores 84.6%. We are quickly moving into superhuman intelligence with the latest AI models, which is both fascinating and scary. Pretty soon the difference between the best AI model and the smartest person alive will be greater than the difference between the dumbest person alive and the smartest person alive. Think about that for a while. How would it feel to work with someone so much smarter than yourself on a daily basis? Gemini 3 Deep Think is only available through the Gemini app for Ultra subscribers at $250 per month, which suggests it is an expensive, unoptimized model to run, but expect these improvements to trickle down to the next Gemini model release later in the spring.

    As for the benchmarks, one of the more interesting details is that Google did not test this model against GPT-5.2 (xhigh) on competitive programming (Codeforces) due to “persistent CloudFlare timeouts”.

    Read more:

    OpenAI Introduces GPT-5.3-Codex-Spark for Real-Time Coding

    https://openai.com/index/introducing-gpt-5-3-codex-spark

    The News:

    • GPT-5.3-Codex-Spark is a smaller, speed-optimized version of GPT-5.3-Codex designed for real-time coding with over 1,000 tokens per second output, running on Cerebras Wafer Scale Engine 3 hardware.
    • The model launched February 12, 2026 as a research preview for ChatGPT Pro users in the Codex app, CLI, and VS Code extension, with limited API access for design partners.
    • Codex-Spark represents the first release from OpenAI’s multi-year, $10 billion partnership with Cerebras announced in January 2026.
    • The model features a 128k context window (compared to 400k for full GPT-5.3-Codex) and is text-only at launch.
    • OpenAI positions Spark as a “daily productivity driver” for rapid prototyping and quick edits, while the full GPT-5.3-Codex handles longer, complex tasks requiring deeper reasoning.
    • Usage is governed by separate rate limits that may adjust based on demand during the research preview period.

    My take: The keyword for 5.3-Codex-Spark is “real-time coding”, and this is new. Of course it’s not fully real-time; it’s more like you ask it to do something and it finishes that task in a second or two. However, the feeling when you have an application running and can make nearly instant changes to it is very different from anything you have ever experienced in front of a computer. So if you have a Pro subscription I highly recommend you try it out. It gives you a glimpse of a possible future where software programs build and adapt themselves continuously based on your current demands. Think of computer games where enemy algorithms update continuously as they adapt to your play style.
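
    If you want a feel for what 1,000+ tokens per second actually means, here is a minimal sketch that streams a completion and measures the throughput on your end. It uses the standard streaming interface in the OpenAI Python SDK; note that the model identifier below is my own guess, and API access to Spark is currently limited to design partners, so treat this purely as an illustration.

```python
# Minimal throughput sketch: stream a completion and estimate tokens/second.
# Assumptions: the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY
# in the environment. The model name "gpt-5.3-codex-spark" is a guess; Spark's
# API access is limited to design partners at the time of writing.
import time
from openai import OpenAI

client = OpenAI()
prompt = "Write a Python function that parses an ISO 8601 date string."

start = time.perf_counter()
text_parts = []

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # hypothetical identifier
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        text_parts.append(delta)

elapsed = time.perf_counter() - start
text = "".join(text_parts)
# Rough estimate: ~4 characters per token is a common approximation.
approx_tokens = len(text) / 4
print(f"~{approx_tokens:.0f} tokens in {elapsed:.2f}s "
      f"= ~{approx_tokens / elapsed:.0f} tokens/s")
```

    At the advertised 1,000 tokens per second, a roughly 2,000-token file (a couple of hundred lines of code) streams in about two seconds, which matches the “ask and it’s done” feeling described above.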

    I am convinced that the time needed to write software is going to decrease rapidly in the next few years. Quality-wise, current AI models already write code as good as we will ever need; we just need better steering and guidance. This means the next natural step is performance improvement, and I think we will see a lot of that in the next year. This in turn will affect how we see software in the future. Give the AI a URL to a CRM system and allow it to browse it, and it will then create a working copy in minutes. The rapid quality and speed improvements of AI models are the biggest challenge most SaaS companies will face going forward.

    Seedance 2.0 Launches With Multimodal Video Generation

    https://seed.bytedance.com/en/blog/official-launch-of-seedance-2-0

    The News:

    • ByteDance launched Seedance 2.0 on February 10, 2026, an AI video generator that accepts text, images, videos, and audio as simultaneous inputs through a 12-file multimodal system.
    • The model generates 2K video at 30 fps with native dual-channel audio synchronized at the millisecond level, producing 4 to 15-second clips in approximately 60 seconds.
    • Users can reference up to 9 images, 3 video clips (15s max total), and 3 audio files (15s max total) using an “@mention” syntax to specify how each asset controls composition, motion, camera work, or rhythm.
    • The system maintains character consistency across multi-shot sequences, allowing one prompt to generate multiple coherent shots where faces and clothing remain stable across different camera angles.
    • Output includes automatically generated sound effects, background music, and lip-synced dialogue in six languages, with audio conditioning the diffusion process rather than being added post-generation.
    • The model supports video extension and editing operations such as character replacement, element insertion or removal, and narrative changes without full regeneration.
    • Camera replication capabilities include complex choreography, Hitchcock zooms, dolly shots, tracking movements, and one-take continuous sequences across multiple reference images.

    My take: Last week I reported on Kling 3.0 from China’s Kling AI, and now a week later we have Seedance 2.0 from China’s ByteDance. Both models generate up to 15 seconds of video at 2K resolution, and both support native audio in multiple languages. So which model is the best? If you need the best visual quality and visual continuity, Seedance 2.0 seems to be the strongest. If you prioritize speed and motion, it’s Kling 3.0. That said, if you have a minute, go watch the three video clips below. They are really impressive; then think about where we will be one year from now.

    Read more:

    MiniMax M2.5 Targets Cost Reduction for AI Agents

    https://www.minimax.io/news/minimax-m25

    The News:

    • MiniMax M2.5 is an AI model trained through reinforcement learning in hundreds of thousands of real-world environments, designed to handle coding, agentic tool use, search, and office work tasks.
    • The model achieves 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp with context management.
    • M2.5 completes SWE-Bench Verified tasks 37% faster than its predecessor M2.1, consuming an average of 3.52 million tokens per task compared to 3.72 million for M2.1, with average runtime of 22.8 minutes.
    • Two versions are available: M2.5-Lightning at 100 tokens per second costs $0.30 per million input tokens and $2.40 per million output tokens, while M2.5 at 50 tokens per second costs half that rate.
    • The model was trained on over 10 programming languages including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby across more than 200,000 real-world environments.
    • MiniMax reports that M2.5 autonomously completes 30% of daily tasks within their company, with M2.5-generated code accounting for 80% of newly committed code.
    • The model demonstrates “Spec-writing tendency”, decomposing and planning project features, structure, and UI design before writing code.

    My take: This model looks like a great candidate for agentic workflows. Very high scores on benchmarks like SWE-Bench, 1.5x-3x faster than Opus 4.6, and 33x cheaper on input tokens and 42x cheaper on output tokens compared to Opus 4.6. MiniMax M2.5 shows two things: first, that models with the performance of Opus 4.6 will probably drop in price significantly in the next few months, and second, that it’s possible to get this kind of performance with just 230 billion parameters.
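
    To make the pricing concrete, here is a quick back-of-the-envelope sketch using only the numbers above: an average of 3.52 million tokens per SWE-Bench Verified task and the published per-million-token rates. MiniMax does not publish the input/output split per task, so the sketch simply brackets the cost between the all-input and all-output extremes; treat it as a rough bound, not an official figure.

```python
# Back-of-the-envelope cost bounds per SWE-Bench Verified task for MiniMax M2.5.
# Figures from the announcement: ~3.52M tokens per task on average;
# M2.5-Lightning: $0.30 / $2.40 per million input/output tokens;
# M2.5 (standard): half of those rates. The per-task input/output split is not
# published, so we bracket the cost between all-input and all-output.

TOKENS_PER_TASK_M = 3.52  # millions of tokens per task (average)

tiers = {
    "M2.5-Lightning (100 tok/s)": (0.30, 2.40),
    "M2.5 (50 tok/s)":            (0.15, 1.20),  # half the Lightning rate
}

for name, (input_price, output_price) in tiers.items():
    low = TOKENS_PER_TASK_M * input_price    # if every token were input
    high = TOKENS_PER_TASK_M * output_price  # if every token were output
    print(f"{name}: ${low:.2f} - ${high:.2f} per task")
```

    The real number will land somewhere in between (agentic traces are typically dominated by input and context tokens), but either way we are talking single-digit dollars per resolved SWE-Bench task.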

    Read more:

    GLM-5 Released as Open-Source MoE Model

    https://glm5.net

    The News:

    • Z.ai released GLM-5 on February 11, 2026, an open-source model under MIT license with 744 billion parameters (40 billion active per token) designed for complex systems engineering and long-horizon agentic tasks.
    • The model expands from GLM-4.5’s 355B parameters (32B active) with pre-training data increasing from 23T to 28.5T tokens.
    • GLM-5 scored 77.8 on SWE-bench Verified and 56.2 on Terminal Bench 2.0, ranking as the highest-scoring open-weight model on these benchmarks and surpassing Gemini 3.0 Pro in overall performance.
    • The model scored 50 on the Artificial Analysis Intelligence Index and 63 on the Agentic Index, ranking as the leading open-weights model.
    • GLM-5 uses an asynchronous reinforcement learning framework called “slime” that allows multiple training attempts to run independently rather than waiting on the slowest task.
    • The model integrates DeepSeek Sparse Attention and supports a 200K token context window with 128K maximum output.
    • GLM-5 was trained entirely on Huawei chips.

    “In a strategically significant move, GLM-5 has been trained entirely on Huawei Ascend chips using the MindSpore framework, achieving full independence from US-manufactured hardware.”

    My take: According to Z.ai this model should be on par with Opus 4.5 (the previous version), but user feedback so far has not been that good. Like MiniMax M2.5 above, it uses a MoE architecture with only 40B parameters active per token. Now, what makes this model truly interesting is not its performance (or lack of it) but the fact that it is 100% trained on Huawei chips. According to Z.ai, “This positions GLM-5 not only as a technical achievement but as a milestone in China’s drive toward self-reliant AI infrastructure”, and maybe they are right. I still haven’t seen anything from Huawei that comes close to NVIDIA’s Vera Rubin though.
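
    For readers less familiar with Mixture-of-Experts: “744 billion parameters, 40 billion active per token” means a router selects a small subset of expert networks for each token, so only a fraction of the weights participate in any single forward pass. Here is a minimal, generic top-k routing sketch in plain NumPy to illustrate the idea; it is not GLM-5’s implementation, and the expert count and k values are made up for the example.

```python
# Generic top-k Mixture-of-Experts routing sketch (illustrative only, not GLM-5).
# A router scores all experts per token, keeps the top-k, and mixes only those
# experts' outputs -- which is why "active" parameters are far fewer than total.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2  # toy sizes, not GLM-5's real config
tokens = rng.standard_normal((5, d_model))            # 5 tokens in a batch
router_w = rng.standard_normal((d_model, n_experts))  # router projection
# Each "expert" here is just a random linear layer for illustration.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

logits = tokens @ router_w                          # (5, n_experts) router scores
topk_idx = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the k best experts
outputs = np.zeros_like(tokens)

for t in range(tokens.shape[0]):
    chosen = topk_idx[t]
    weights = softmax(logits[t, chosen])  # renormalize over the chosen experts
    for w, e in zip(weights, chosen):
        outputs[t] += w * (tokens[t] @ experts[e])  # only k experts run per token

print(f"Each token activates {top_k}/{n_experts} experts "
      f"(~{top_k / n_experts:.0%} of expert parameters).")
```

    For GLM-5 the ratio is roughly 40/744, so only about 5% of the parameters do work for any given token, which is how a model this large can still be served at a reasonable cost.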

    Read more:

    Alibaba Releases Qwen-Image-2.0

    https://qwen.ai/blog?id=qwen-image-2.0

    The News:

    • Qwen-Image-2.0 launched February 10, 2026 as a 7B parameter model that combines text-to-image generation and image editing in one architecture, down from 20B parameters in version 1.0.
    • The model supports prompts up to 1,000 tokens and generates images at native 2048×2048 resolution without upscaling.
    • Professional typography rendering handles dense text blocks, multi-panel layouts, and mixed-language content across Chinese and English, addressing placement in structured formats like slides, posters, comics, and infographics.
    • The model achieves a GenEval score of 0.91 and a DPG-Bench score of 88.32, outperforming FLUX.1 at 83.84 and GPT Image 1 at 85.15.
    • Qwen-Image-2.0 tops the AI Arena Elo leaderboard for text-to-image generation and ranks second for image editing, behind Google’s Nano Banana Pro and ahead of ByteDance’s Seedream 4.5.

    My take: Just looking at the images on the web page, this model looks extremely close to Google Nano Banana in performance. Alibaba managed to achieve this with progressive training: starting with non-text rendering, moving to simple text, and then gradually scaling to paragraph-level descriptions. I am really impressed with this model, especially considering its size. If you want to try it, it’s free at Qwen Chat.
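
    The progressive training recipe is essentially curriculum learning: start on the easiest objective and only move on once the model handles it. Below is a minimal, hypothetical sketch of that staging logic; the stage names mirror Alibaba’s description, but the training step itself is just a placeholder, not their actual pipeline.

```python
# Minimal curriculum / progressive-training loop (illustrative sketch only).
# Stage names mirror the Qwen-Image-2.0 description; train_one_stage() is a
# placeholder for a real training loop, not Alibaba's pipeline.

stages = [
    ("non-text rendering",   "general images with no text to render"),
    ("simple text",          "short words and labels"),
    ("paragraph-level text", "dense text blocks, slides, posters, infographics"),
]

def train_one_stage(model_state: dict, stage_name: str, epochs: int = 1) -> dict:
    # Placeholder: a real implementation would sample data for this stage and
    # run the usual training loop, starting from the previous stage's weights.
    print(f"  training on '{stage_name}' for {epochs} epoch(s)")
    return model_state

model_state: dict = {}  # stands in for model weights / optimizer state
for name, description in stages:
    print(f"stage: {name} ({description})")
    model_state = train_one_stage(model_state, name)
```

    The idea behind the ordering is that precise text rendering is easier to learn once general image generation is already solid, and the same weights carry over from one stage to the next.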

    Read more:

    Anthropic Raises $30B in Series G Funding

    https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation

    The News:

    • Anthropic closed a $30 billion Series G funding round on February 11, 2026, valuing the company at $380 billion post-money.
    • The round was led by Singapore sovereign wealth fund GIC and Coatue, with co-leadership from D.E. Shaw Ventures, Founders Fund, ICONIQ, and MGX.
    • Additional investors include Accel, General Catalyst, Jane Street, Qatar Investment Authority, Goldman Sachs Alternatives, JPMorganChase, Lightspeed, Menlo Ventures, Morgan Stanley, Sequoia Capital, and Temasek.
    • The funding includes portions of previously announced investments from Microsoft and NVIDIA, which committed up to $5 billion and $10 billion respectively in November 2025.
    • This marks the second-largest venture funding round in history after OpenAI’s $40 billion raise in 2025, and represents more than double Anthropic’s September 2025 Series F valuation of $183 billion.
    • The funds will support frontier research, product development, and infrastructure expansions, with Anthropic positioning itself as the market leader in enterprise AI and coding.
    • Anthropic CFO Krishna Rao stated, “Claude is increasingly becoming more critical to how businesses work”, noting that the fundraising reflects customer demand.

    “A huge part of this raise is Claude Code. Weekly active users doubled since January. People who’ve never written a line of code are building with it. Humbled to work on this every day with our team.” Boris Cherny, Claude Code creator.

    My take: In September 2025 Anthropic was valued at $183 billion. Now they are valued at $380 billion. The main reason for the massive increase, according to Boris Cherny at Anthropic, is programming. Claude Code is quickly becoming the de facto standard AI programming tool at many companies, and if you consider that every programmer in the world will work with agentic engineering within a few years, it’s easy to understand this valuation. Every company in the world will depend on products like this.

    OpenAI still has no team plan that allows decent company-wide access to Codex. The current Team and Enterprise plans are only enough for light prototyping (slower, with far fewer tokens available per week), and very few companies allow individuals to purchase the individual ChatGPT Pro plan, which is required for heavy Codex usage. OpenAI really needs to push out a premium team license that allows for more Codex usage fairly quickly; otherwise Claude Code will become the #1 choice at most software development companies in 2026.

    Read more:

    UN General Assembly Approves 40-Member Global AI Scientific Panel

    https://press.un.org/en/2026/ga12751.doc.htm

    The News:

    • The UN General Assembly approved the Independent International Scientific Panel on Artificial Intelligence on February 11, 2026, with a vote of 117-2. The United States and Paraguay opposed the measure, while Tunisia and Ukraine abstained.
    • The 40 panel members were selected from more than 2,600 candidates through an independent review process led by the International Telecommunication Union, the UN Office for Digital and Emerging Technologies, and UNESCO. Members serve three-year terms in personal capacities.
    • Panel members include AI researcher Yoshua Bengio, Joelle Barral of Google DeepMind, Filipino journalist and 2021 Nobel Peace Prize laureate Maria Ressa, two American experts (Vipin Kumar from University of Minnesota and Martha Palmer from University of Colorado), and two Chinese specialists (Song Haitao from Shanghai Jiao Tong University and Wang Jian from the Chinese Academy of Engineering).
    • The panel issues one annual policy-relevant summary report synthesizing research on AI’s opportunities, risks, and impacts, and provides updates to the General Assembly twice yearly through interactive dialogues with its co-chairs.
    • The panel functions as an independent scientific body designed to bridge the AI knowledge gap and assess economic, ethical, and social implications of artificial intelligence across all member states regardless of their technological capacity.
    • UN Secretary-General António Guterres stated: “In a world where AI is racing ahead, this panel will provide what’s been missing: rigorous, independent scientific insight that enables all member states, regardless of their technological capacity, to engage on an equal footing.”
    • The US representative Lauren Lovelace called the panel “a significant overreach of the UN’s mandate and competence” and stated “We will not cede authority over AI to international bodies that may be influenced by authoritarian regimes seeking to impose their vision of controlled surveillance societies”.

    “We will not cede authority over AI to international bodies that may be influenced by authoritarian regimes seeking to impose their vision of controlled surveillance societies”. Lauren Lovelace

    My take: 117 countries voted in favor of this panel and 2 voted against: the US and Paraguay. The main critique from the US is the risk that China, Russia, and other countries on the panel could shape the narrative around AI risks and opportunities in ways that conflict with US interests.

    Another critique of this panel is that the representatives only meet once per year, while AI evolves monthly. A year ago Claude Sonnet 3.5 could generate a few hundred lines of simple code at best; today we have tools like Codex and Claude Code that can build some of the most complex applications you can think of. By the time this UN panel meets again in 2027, the entire field will have moved so quickly that many topics will no longer be relevant.

    That said, I believe there is a benefit of this panel for countries that lack the technical capacity to independently assess AI risks, opportunities, and economic impacts. For countries leading the development of AI, the benefits of this panel are perhaps less clear, especially with both Chinese and American experts on the same panel.