This is a blog post written for my future self to look back on. There are two reasons for saying this: first, looking back from the future at how rapidly a technology has developed is quite interesting*. Second, uh… these days everyone is chatting with AI, and hardly anyone reads personal blogs anymore (maybe AI will read them…).
* Just like how this site documented in 2013 the “arduous” journey of deploying HTTPS, which has now become completely routine.
As everyone knows, the AI boom began with the release of ChatGPT at the end of 2022. However, this article will start a little earlier.
June 2021: A Useful Code Completion Tool
GitHub Copilot in 2021 was a code completion tool. For example, when a user typed DB::, the AI would “guess” based on context which database the user wanted to query and what data they needed, then present the complete database query code for the user to accept with a simple Tab press.
Back then, you could see users who got early access to Copilot sharing their experiences on Twitter. Some discovered that Copilot would generate code containing hidden bugs, while others questioned whether the code it produced closely matched copyrighted public code, raising potential legal issues. Interestingly, people found that Copilot seemed capable of doing just about anything.



For someone like me who used to have dozens of browser tabs open just to write a piece of code, Copilot was incredibly useful. Beyond saving me from the repetitive work of writing CRUD operations, it also helped with algorithms I wasn’t familiar with or API definitions I didn’t know. Now, I could simply write a comment and let Copilot “guess” the implementation. Coding became a much more relaxed process, and my efficiency improved significantly.
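To make this concrete, here is a toy illustration of that comment-driven workflow (my own example, not actual Copilot output):

```python
# Illustrative only: the kind of completion Copilot would suggest.
users = [
    {"name": "ada", "signup_date": "2021-03-01"},
    {"name": "lin", "signup_date": "2021-06-15"},
]

# The developer types only the comment below; Copilot "guesses" the
# implementation on the next line, accepted with a single Tab press:
# sort users by signup date, newest first
users.sort(key=lambda u: u["signup_date"], reverse=True)

print([u["name"] for u in users])  # ['lin', 'ada']
```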
So how was this achieved? Under the hood, GitHub Copilot used a large language model (LLM) to complete code based on the surrounding context.
The Transformer-based LLM used by Copilot can be thought of as a highly sophisticated “text completion” machine. It doesn’t possess human-like consciousness, but by pre-training on massive amounts of text and code, it has learned statistical patterns between words and code logic. When given a sequence of text, the model’s neural network performs enormous matrix calculations to compute the probability of each token in the vocabulary appearing next, then selects the most likely one, and repeats the process.
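That generation loop can be sketched in a few lines. This is a deliberately tiny toy: the probabilities are hard-coded and conditioned only on the previous token, whereas a real LLM computes them with a neural network over the entire context and a vocabulary of tens of thousands of tokens.

```python
# Toy next-token probabilities, keyed by the previous token only.
# A real LLM derives these from huge matrix calculations over the full context.
next_token_probs = {
    "<start>": {"SELECT": 0.90, "*": 0.04, "FROM": 0.03, "users": 0.02, "<end>": 0.01},
    "SELECT":  {"SELECT": 0.01, "*": 0.85, "FROM": 0.05, "users": 0.08, "<end>": 0.01},
    "*":       {"SELECT": 0.01, "*": 0.01, "FROM": 0.95, "users": 0.02, "<end>": 0.01},
    "FROM":    {"SELECT": 0.01, "*": 0.01, "FROM": 0.01, "users": 0.95, "<end>": 0.02},
    "users":   {"SELECT": 0.01, "*": 0.01, "FROM": 0.01, "users": 0.01, "<end>": 0.96},
}

def generate(max_tokens=10):
    context = ["<start>"]
    for _ in range(max_tokens):
        probs = next_token_probs[context[-1]]
        token = max(probs, key=probs.get)  # greedy: pick the most probable token
        if token == "<end>":
            break
        context.append(token)
    return " ".join(context[1:])

print(generate())  # SELECT * FROM users
```

Score every candidate token, pick one, append it, repeat: that loop is all "text completion" means here.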
By the way, the GitHub Copilot at that time used OpenAI’s Codex model (not the one from 2025). And that Codex model was fine-tuned from GPT-3.
Hmm? GPT?
Late 2022: The Explosive Rise of the AI Chat Website
About a year and a half after GitHub Copilot’s public release (late 2022), OpenAI decided to spend a few weeks turning their GPT-3.5 model into a simple AI Q&A website — ChatGPT.
The principle was straightforward: just like Copilot used GPT-3 to “guess” the next piece of code, ChatGPT used GPT-3.5 to “guess” the next piece of text. This time, however, the user’s message was placed into a special prompt template (along with some ChatGPT-specific fine-tuning).
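The template might have looked roughly like this. To be clear, this is a hypothetical sketch; OpenAI has never published the exact template:

```python
def build_prompt(user_message: str) -> str:
    # Hypothetical chat template: wrap the user's message so that
    # "completing the text" naturally produces an assistant reply.
    return (
        "The following is a conversation between a helpful AI assistant "
        "and a user.\n\n"
        f"User: {user_message}\n"
        "Assistant:"
    )

print(build_prompt("Why is the sky blue?"))
```

Whatever the model generates after `Assistant:` is shown to the user as the chatbot's answer.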
Thus, anyone could visit a website and chat directly with GPT-3.5. A model that had previously been popular only in programmer circles suddenly exploded globally. In just two months, ChatGPT reached 100 million users.
However, just as code completion AI could generate buggy code, LLM answers could also be wrong. In niche domains or on detailed questions not well-covered in the training data, LLMs often confidently fabricated completely nonexistent things.
Returning to programming: the same GPT-3 LLM that could complete code a year earlier naturally could also write code inside ChatGPT.

That said, while it could generate code snippets and suggest solutions for errors, the “snippets” might be incorrect and the “solutions” might be entirely made up.

Remember: an LLM is merely a sophisticated “text completion” machine. It doesn’t actually understand what the user is saying; it only predicts the most probable next token based on context. When asked about unfamiliar topics, it will very likely start hallucinating based on patterns in its training data.
So, is there a way to make AI behave more like a human, such as actively searching the internet when it encounters unfamiliar territory?
Early 2023: LLMs Start Using Tools
As mentioned earlier, ChatGPT gained 100 million users in just two months. This was terrifying! For Google, it threatened their search engine traffic; for Microsoft Bing, it was a rare chance to fight back (spoiler: it didn’t really work out).
So companies started pouring resources into AI. For ChatGPT users, the first few months of 2023 brought almost monthly new features. One of the most important was that it could now “search” the internet. Or, more precisely, the LLM learned how to call tools.
How does an LLM, which is essentially a “text completion” machine, call tools? Very simply: the system includes instructions in the prompt telling the LLM how to issue tool calls (e.g. { "tool": "search", "query": "weather" }). When the LLM generates such a tool call command, generation is interrupted. The server then executes the requested tool (e.g. web search), inserts the result back into the conversation, and continues generation.
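The whole loop fits in a few lines. Below is a minimal sketch, assuming the JSON tool-call format above; `fake_llm` and `search` are stand-ins for the real model and a real search backend:

```python
import json

def search(query):
    # Stand-in for a real web search.
    return f"Results for '{query}': sunny, 25°C"

TOOLS = {"search": search}

def run_conversation(llm, user_message):
    transcript = f"User: {user_message}\nAssistant:"
    while True:
        output = llm(transcript)
        try:
            call = json.loads(output)   # did the model emit a tool call?
        except ValueError:
            return output               # plain text: this is the final answer
        result = TOOLS[call["tool"]](call["query"])
        # Insert the tool result into the context and let the model continue.
        transcript += f" {output}\n[tool result] {result}\nAssistant:"

def fake_llm(transcript):
    # Stand-in model: call the tool once, then answer using its result.
    if "[tool result]" not in transcript:
        return '{"tool": "search", "query": "weather"}'
    return "It's sunny and 25°C today."

print(run_conversation(fake_llm, "What's the weather?"))
```

Real systems use more robust formats (structured function-calling APIs rather than bare JSON in the text), but the principle is exactly this interrupt-execute-resume loop.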
Great, AI can now call tools to search. But then what?
September 2024: LLMs Start Thinking (Reasoning)
In September 2024, OpenAI released the o1 model. It was trained to “waste” a large number of tokens before answering — performing self-reflection and chain-of-thought reasoning — and only then present the final answer to the user. This type of model is called a reasoning model, but it is still an LLM.
This feels a bit like a “cheat” to improve model performance: If the LLM can’t get it right the first time, just let it think multiple times and self-correct. But for other companies, once someone else did it, they couldn’t afford to fall behind, so reasoning models started appearing everywhere.
Note that what users see as the “thinking process” from closed-source models is not the raw chain-of-thought, but a summary. This is partly because the chain-of-thought is hard to read, and partly because the raw chain-of-thought could be extremely valuable training data that competitors could use.
If you want to see the raw thinking process, you can use open-source reasoning models. I’ve also personally encountered cases where closed-source models or their systems glitched and accidentally leaked the raw chain-of-thought.
2025: Coding Agents Changed Programming
By 2025, LLMs can chat, use tools, and have learned to reason.
What happens if we give an LLM a set of special tools that allow it to operate on the user’s computer (search/browse/modify files, run commands, etc.)? In 2025, Coding Agents equipped with all these capabilities began to appear everywhere.
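Such a tool set is conceptually just the earlier tool-calling loop with more dangerous tools plugged in. A hypothetical sketch of the dispatch side (the tool names and call format here are my own, not any particular agent's API):

```python
import json
import pathlib
import subprocess

def read_file(path):
    return pathlib.Path(path).read_text()

def write_file(path, content):
    pathlib.Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"

def run_command(cmd):
    # Run a shell command and return its combined output to the model.
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.stdout + proc.stderr

TOOLS = {"read_file": read_file, "write_file": write_file, "run_command": run_command}

def dispatch(tool_call_json):
    # The agent parses the LLM's tool call, executes it, and feeds the
    # result back into the conversation (generation loop omitted here).
    call = json.loads(tool_call_json)
    return TOOLS[call["tool"]](**call.get("args", {}))
```

Give the model these tools plus the loop from earlier, and "fix the failing test in this repo" becomes something it can attempt end to end.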
Now, I only need to install any Coding Agent (Cursor, Windsurf, Antigravity, etc.), open my project in the IDE, tell the LLM what I want, and it will automatically find relevant code, look up documentation online, generate code, modify files, run tests, analyze errors, fix bugs, improve the code, and update documentation — all by itself.
So now I can sip coffee, lean back in my chair, and comfortably watch the AI write code and debug for me. When it finishes a task, I just need to review and approve the changes (or even set it to auto-approve), do a quick acceptance check, and then give it the next requirement.
I found myself much more willing to experiment with things I wouldn’t have dared to try before. When thinking of new features, I can just let the Agent spend a few minutes implementing a rough version, and if it doesn’t work, I click Reject. When learning a new programming language, I no longer need to constantly look up documentation or hunt for best practices; the AI can write code directly according to my requirements, and I can ask it questions when I don’t understand.
This has also given birth to a new development style: Vibe Coding. Since AI can understand code and fulfill requirements, why should humans still learn programming? Just tell the AI what you want and do only the user testing yourself: reject if unsatisfied, accept and move on to the next requirement. Repeat forever. Humans don’t even need to write a single line of code! What could possibly go wrong?

But there’s no denying that Coding Agents have dramatically increased software development efficiency. It’s hard not to marvel: programmers have evolved from code movers into code curators.
The Future
Will AI replace programmers in the future?
This isn’t a black-and-white question: AI will replace programmers in the sense that higher development efficiency means fewer people are needed; but AI will not completely replace programmers, because even with reasoning and tool use, LLMs remain fundamentally stochastic text generators that frequently make subtle, hard-to-spot mistakes, so programmers who understand architecture and code are still needed to guide and correct them.
However, one thing is certain: AI programming has given humans many more ways to be lazy, and humans are naturally inclined toward laziness. For large projects, if people become lazy and stop carefully reviewing AI-generated code, a few occasional mistakes might be tolerable, but over time the accumulated errors will inevitably cause the project to spiral out of control.
In my view, 2025 was the year when software quality dropped sharply. The three major cloud providers each had at least one major outage, not to mention Windows 11 being filled with bugs everywhere. Even Chrome, which used to be fast and stable, started showing all kinds of big and small issues.
Ten years from now, will AI programming still be based on LLMs? Will Windows still be a buggy mess?
Coxxs
This article (https://dev.moe/en/3334) is an original work by Coxxs. Unauthorized reproduction is prohibited.






