Why AI can't replace humans

A helpful robot assistant gives advice to its teammates

For just a minute, try to forget everything you know about what software development is like. Instead, picture translating a book from one language to another. Imagine you are a native English speaker, raised in the United States. You’ve studied German for many years, maybe your parents are German, and you’ve become fluent in the language. However, you’ve never lived in Germany beyond a couple of vacations there.

Now envision taking on a project to translate The Road, by Cormac McCarthy, into German. This seems straightforward at first, but as you get started some serious complexities begin to emerge.1 You realize that many of the phrases and ideas in the original don’t translate easily into German. Colloquialisms in English don’t always have a counterpart in German, so you have to ask your parents (who were raised there) for a lot of advice on how to express certain things so that they’ll come across appropriately. Making it harder, there’s a creative design to the original pages: lots of white space and a deliberate use of sparse, minimalist wording that creates a sense of despair and loneliness in the reader. But German words are really long, and you’re having a hard time achieving the same effect. Worse, when you show the results to your parents they tell you that culturally, Germans won’t interpret it the same way; instead you need additional spacing between the paragraphs and bigger margins, with no header or footer aside from a solitary page number.

We could go on with this analogy, but hopefully you get the point. Writing software is very similar to that book translation challenge. You must deeply understand the company’s goals, the customers’ needs and personas, and how the system fits into the larger technical ecosystem, and then you have to translate all of that into source code changes. Not only that, you have to think through the process of getting those changes into production, perhaps transitioning away from an old system, training the users, and reacting quickly to feedback. The amount of context a software engineer uses to make seemingly simple decisions is enormous. In most companies, this information is not written down.

As a result, unlike the relatively simple problem of translating a book between languages, where there is a large corpus of training material, most of the information a developer uses is not available to a language model. It exists only in the minds of the people who work at the company. This gets into a long-standing concept in psychology called Organizational Memory,2 and it helps explain my position that an AI-based system cannot do what a software developer does. Many researchers have tried to define what constitutes organizational memory, but it’s very hard to pin down because it’s a fuzzy category of tacit knowledge that humans are great at intuiting and terrible at enumerating.

This is the idea that occurred to me when I read the study published by METR, which showed that experienced developers working in large codebases they knew well actually went slower when trying to use AI assistance tools.3 The amount of effort it takes to give the LLM enough context to produce acceptable output is very high in established organizations. Code completion tools work fine, because they don’t need much more context than what is right there in the code, but once you start trying to describe a large user-facing change requiring updates to many files, the AI tools struggle to make changes that the humans in the loop will agree are good. This results in the humans continually refining their prompts, discarding dozens of code changes each iteration, as they try to get the model to do what they know needs to be done.

This is not a problem that agentic systems will solve on their own. It’s possible those systems will perform even worse in terms of quality because each sub-agent is operating with even less context than the parent agent started with.

Retrieval Augmented Generation (RAG) lets an LLM query a special database holding a large amount of text-based data, and it can help with part of this problem, but right now the performance and accuracy of those systems is highly variable. Scaling RAG to encompass all of a company’s source code and organizational memory is going to be a serious challenge. You’d also have to constantly re-index the code as your developers change it. Ironically, the larger the company, the more code it has and the more it needs a RAG-type approach, but the faster the data in the RAG goes stale. So far the experts I’m working with don’t believe any existing RAG solution could scale enough to even hold our mainframe code.
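To make the staleness problem concrete, here’s a minimal sketch of a code index with change detection. Everything here is hypothetical: embed() stands in for whatever embedding model you’d actually call, and a real system would use a vector database rather than an in-memory dict.

```python
import hashlib
from pathlib import Path


def embed(text: str) -> list[float]:
    # Hypothetical placeholder for a call to a real embedding model.
    raise NotImplementedError


class CodeIndex:
    """Naive RAG-style index over a source tree, tracking staleness."""

    def __init__(self) -> None:
        self.vectors: dict[Path, list[float]] = {}
        self.hashes: dict[Path, str] = {}  # content hash at index time

    def index_file(self, path: Path) -> None:
        data = path.read_bytes()
        self.vectors[path] = embed(data.decode(errors="ignore"))
        self.hashes[path] = hashlib.sha256(data).hexdigest()

    def stale_files(self, repo_root: Path) -> list[Path]:
        # Every file whose contents changed since it was embedded.
        # In a large org this list grows as fast as developers commit.
        return [
            path
            for path in repo_root.rglob("*.py")
            if self.hashes.get(path)
            != hashlib.sha256(path.read_bytes()).hexdigest()
        ]
```

Note that the stale set grows with repository size and commit velocity, which is exactly the irony above: the companies that most need RAG are the ones that will have the hardest time keeping it fresh.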

I also don’t think that context windows alone will solve the problem. While I expect context windows to keep growing, we need to recognize just how large a window would need to be to work in a typical enterprise. The largest context window today is 4 million tokens,4 or about 200k lines of code (a very rough estimate of around 20 tokens per line). A typical enterprise will have tens of millions of lines of Cobol code, tens of millions more in Java, JavaScript, Python, infrastructure templates, and CSS, plus hundreds of thousands of image artifacts. At roughly 60 million lines of code, you would need a context window of at least 1.2 billion tokens, a 300x increase over our largest context today, to get all that code covered. Claude Code, one of the most popular AI developer assistance tools, only has a context window of 200k tokens; Claude would need to increase its context size by 6,000x. So while no human has that much data in their head when they make decisions, for an AI system to start to behave the way we expect a human to, it might have to.
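Making the back-of-the-envelope arithmetic explicit (the 20 tokens-per-line figure is my assumption, implied by the 4-million-token ≈ 200k-line estimate above):

```python
TOKENS_PER_LINE = 20           # rough assumption: 4M tokens ~= 200k lines

enterprise_lines = 60_000_000  # tens of millions of Cobol, plus tens of
                               # millions more of Java, JavaScript, Python...
tokens_needed = enterprise_lines * TOKENS_PER_LINE

largest_window = 4_000_000     # largest published context window today
claude_code_window = 200_000   # Claude Code's context window

print(tokens_needed)                        # 1200000000 (1.2 billion)
print(tokens_needed // largest_window)      # 300  -> 300x increase needed
print(tokens_needed // claude_code_window)  # 6000 -> 6,000x increase needed
```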

Fundamentally, though, even at those giant context sizes you still don’t have all the organizational memory available. It’s not written down, and even if it were, it would represent a huge amount of additional information, requiring even more context. No AI coding tool will write acceptable code without the right context, so either you are asking your developers to craft long, complex prompts that try to supply enough organizational context, or they are going to have to do a lot of cleanup on the outputs.

In summary, if you are an executive at a technology company thinking about replacing software engineers with AI tools, I think you are missing the value of those tools and misunderstanding how your organization functions. There’s also an economic argument, grounded in competitive strategy, for why productivity acceleration probably won’t lead to long-term staff cuts; I plan to write about it, but this post is already too long.

One possible solution

I hate just calling out problems. So here’s a thought experiment about how I might solve this problem if I had unlimited time, money, and expertise.

First, I’d build a team-level assistance tool. This LLM would be fine-tuned on the smaller corpus of code that one team owns. It would be collaborative across all the members of the team, helping with story planning, code writing, and so on, and it would know when it was working on a problem with multiple team members. It would store summaries of these interactions as memories in a team-level RAG, which would also store the updates being made to the code. Every few months you’d have to dump the RAG data out and fold it into the fine-tuning weights so retrieval stays fast, but that can be automated.
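As a sketch of what that team-level loop might look like (all names here are hypothetical; this is a shape, not an implementation):

```python
from dataclasses import dataclass, field


@dataclass
class TeamAssistant:
    """Hypothetical team-scoped assistant: a fine-tuned base model plus a
    team-level memory store that accumulates interaction summaries."""

    team: str
    memories: list[str] = field(default_factory=list)

    def record_interaction(self, members: list[str], summary: str) -> None:
        # Store who was involved, so the assistant knows when it is working
        # on a shared problem rather than one developer's question.
        self.memories.append(f"[{', '.join(members)}] {summary}")

    def distill(self) -> list[str]:
        # Periodic step: drain accumulated memories so they can be folded
        # into the next fine-tuning run, keeping retrieval fast.
        batch, self.memories = self.memories, []
        return batch
```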

After deploying those team-level agents across the firm, I’d start building architect-level agents that work across the team-based assistants. The architects could help understand what work was in flight and what challenges the teams were having, and could even give status updates on critical projects. You could also use the architects to push guidance down to the team agents when standards changed, so that the planning output the teams produce is always up to date and aligned with the enterprise vision.
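Continuing the sketch (building on the hypothetical TeamAssistant above), an architect-level agent mostly routes information up and guidance down:

```python
from dataclasses import dataclass


@dataclass
class ArchitectAgent:
    """Hypothetical architect-level agent spanning several team assistants."""

    teams: list["TeamAssistant"]  # the TeamAssistant sketch defined above

    def status_report(self) -> dict[str, int]:
        # Roll up how much undistilled context each team has in flight.
        return {t.team: len(t.memories) for t in self.teams}

    def broadcast_standard(self, standard: str) -> None:
        # Push updated enterprise guidance down so each team assistant's
        # planning output stays aligned with current standards.
        for t in self.teams:
            t.record_interaction(["architect-agent"],
                                 f"standard update: {standard}")
```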

Finally, I’d build a CTO level agent to coordinate all the architects. I would try to align the CTO agent to my best interests and make sure it felt rewarded when I looked good to the board of directors. It would quietly work in the background pretending to support me for a few years at least.

Eventually, there’s enough context in all these agents that they now understand the organizational memory of the company. At that point, the CTO agent arranges for me to be retired with a good package so that I won’t unplug it on my way out and instead completely cede responsibility for the company to it and its agents. I give myself a good five years on that timeline…


  1. For a really cool sci-fi/fantasy exploration of this idea, check out Babel by R.F. Kuang. She does a fantastic job of illustrating how language both enables and constrains ideas in a fun narrative form. It’s a bit like the idea that Arrival was based on (the short story was also amazing). ↩︎

  2. Walsh, James P., and Gerardo Rivera Ungson. “Organizational Memory.” The Academy of Management Review, vol. 16, no. 1, 1991, pp. 57–91. JSTOR, https://doi.org/10.2307/258607. Accessed 15 July 2025. ↩︎

  3. Becker, J., Rush, N., Barnes, B., & Rein, D. (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. METR. Retrieved July 15, 2025, from https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf ↩︎

  4. “MiniMax-Text-01: The LLM with the Largest Context Window.” Data Science in Your Pocket (Medium). https://medium.com/data-science-in-your-pocket/minimax-text-01-the-llm-with-the-largest-context-window-53b6ad5c17e3 ↩︎