The Industrialization of Intelligence
From Software to Digital Labor
Introduction
[1] Analysis of the AI industry seems to persistently oscillate between frameworks that see it as a high-capex chatbot platform or as “humanity’s last invention” in the form of AGI (artificial general intelligence) or ASI (artificial superintelligence). The former framework has the advantage of aligning with modern tech industry rubrics (from SaaS and other API business models), but I also believe it tends to devolve into “chatbot myopia” and often misses the appropriate unit of economic analysis for AI. The latter, superintelligence view is more exciting but also somewhat ethereal since AGI and ASI are still very poorly defined, shifting any business analysis away from hard facts and toward unmeasurable metaphysical debates.
In this article, I argue that AI is not merely another software wave. Instead, I view modern AI as a capital-intensive industrial engine with the potential to produce digital labor that can reshape global economies. To analyze this revolution, this article leverages a labor lens that accounts for the “jagged frontier” of capabilities and the implementation gaps inherent in adopting agentic architectures. From the architecture of autonomy to the flywheel of “digital workers,” the following sections outline some of the critical vectors shaping the future of AI.
Escaping Chatbot Myopia: From Software to Digital Labor. The AI industry’s dominant analytical framework often mistakes the chatbot for the destination rather than the on-ramp. This article reframes AI not as another software category, but as the emergence of digital labor. This represents a shift with economic implications closer to industrial revolutions than software waves.
The Algorithm of Statecraft: The Superpower AI Race. AI has the potential to transform not only the lives of consumers and the competitive landscape for enterprises but also the scientific, economic and military standings between nation-states. When the competitiveness of nation-states, particularly superpowers, is at stake, government influence on a rising technology can drastically change the pace of improvement and societal pervasiveness.
Agentic AI’s Challenge: Turning “Intelligence” into Execution. LLMs excel at natural language and interpreting ambiguity, but businesses and consumers demand deterministic precision. This article explores why agentic autonomy emerges from architecture, not model capabilities alone, and how agentic AI reconciles creativity with control.
The Jagged Frontier and the Hidden Shape of AI Progress. The Jagged Frontier explains why AI can outperform a PhD on one task and fail a grade-school logic puzzle on another. This uneven capability profile is not anecdotal noise; it’s a structural feature of probabilistic models and perhaps the single most important concept for understanding AI’s economic impact.
Tokens, Labor and the Agentic Flywheel. This article argues token deflation is not a bug or a worrying sign of commoditization; it’s the growth engine. Rapid declines in inference costs expand the feasible task space for agents, accelerating the transition from AI as a tool to AI as an autonomous economic actor.
Sources of Competitive Advantage in AI. While much of the analysis of the AI industry is centered on the competitive differentiation between the frontier model labs, this article argues the primary sources of competitive advantage are forming higher up the stack. Agentic architectures represent the key mechanism here as they capture proprietary context, “learn on the job”, and integrate deeply into enterprise and consumer workflows.
From App Stores to Intent Markets: AI’s Impact on Apps. Icons, menus and dashboards are artifacts of human bottlenecks, not optimal compute interfaces. As agents take over task execution, the winning products will be those optimized for machine consumption, not human attention.
From Power Laws to Phase Transitions: Rethinking AI’s Progress. Benchmarks alone cannot explain why AI demands ever-larger investments in compute and energy. For finance and business leaders, the key question is not whether models improve, but whether the mechanism of improvement itself is changing.
LLMs Are Dominant, But Not Alone. The excitement around LLM scaling has not resolved older academic disagreements about how intelligence should be constructed. Instead, those debates have re-entered the spotlight as enterprises push AI beyond conversation and into autonomous execution.
A Conceptual Test-Drive of Agentic: Autonomous Vehicles. This article argues autonomous and semi-autonomous vehicles represent some of the most advanced real-world deployments of agentic AI today. And the industry is operating under safety, latency, and reliability constraints that enterprise and consumer software has yet to face.
Escaping Chatbot Myopia: From Software to Digital Labor
The evolution of the AI industry is still nascent. Consequently, many business leaders are still searching for a framework to understand its impact. It is natural that most attempt to shoehorn AI into the shape of the PC, internet, or cloud revolutions—primarily because these are the shifts many of today’s executives have experienced firsthand. Some analysts even argue this is “just software,” implying it is ultimately a subset of the SaaS market.
This article argues this view is a result of chatbot myopia: a fallacy where the first widely adopted application (the chatbot) is mistaken for the end state of the technology. This view can be dangerous because the conversational chatbot was merely a stepping stone to agentic AI, and agents will represent the primary application space for the industry. While frontier labs frequently discuss AGI (Artificial General Intelligence) and ASI (Artificial Superintelligence), these goals are not mutually exclusive with agents; indeed, “Agents” are reportedly defined as Level 3 in OpenAI’s progression toward AGI (Figure 1).
Figure 1: OpenAI’s Levels of AI, 2024 [2]
First principles of AI agents
If you search for news articles on agents, you will likely spend hours wading through conflicting definitions. Is it the tool Perplexity uses to interact with websites? Is a Nest thermostat an agent? Is it software that runs a corporation? While all of these fit a broad technical definition, we must home in on the definition relevant to the economic application of modern AI.
To do so, we can define an agent as a system that operates in a continuous loop of three stages [3]:
Perceives: It senses the environment (via data, text, or visual inputs).
Decides: It reasons about what to do next to achieve a specific goal.
Acts: It executes an action that has a real-world impact (e.g., sending an email, buying inventory, or pushing code).
The most important takeaway from this definition is the sharp contrast with early chatbots. When ChatGPT launched in 2022, it was a passive oracle. It could answer questions based on training data, but it could not browse the live web (Perceive), it could not pause to reason through a plan (Decide), and it could not execute tools (Act). Since then, chatbots have taken on many of these capabilities as they have become more agentic.
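The perceive-decide-act loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Warehouse` environment, the reorder rule, and all function names are hypothetical, and a simple threshold rule stands in for the reasoning an LLM would provide in the Decide stage.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical toy environment: one product with a stock level."""
    stock: int
    orders_placed: list = field(default_factory=list)


def perceive(env: Warehouse) -> dict:
    # Perceive: read the current state of the environment.
    return {"stock": env.stock}


def decide(observation: dict, reorder_point: int, target: int) -> int:
    # Decide: choose the action that serves the goal (keep stock at or
    # above the reorder point). A fixed rule stands in for LLM reasoning.
    if observation["stock"] < reorder_point:
        return target - observation["stock"]
    return 0


def act(env: Warehouse, quantity: int) -> None:
    # Act: change the environment (here, "buying inventory").
    if quantity > 0:
        env.orders_placed.append(quantity)
        env.stock += quantity


def run_agent(env: Warehouse, daily_demand, reorder_point=20, target=50):
    # The continuous loop: the world changes, then the agent
    # perceives, decides, and acts, once per day.
    for demand in daily_demand:
        env.stock = max(env.stock - demand, 0)
        observation = perceive(env)
        quantity = decide(observation, reorder_point, target)
        act(env, quantity)
    return env
```

Calling `run_agent(Warehouse(stock=30), [5, 10, 5, 30])` places two restocking orders without any human prompt in between. A passive oracle, by contrast, would correspond to calling `decide` once on a question and never touching the environment.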
The Three Levels of Agent Capabilities
Industry analysis is further complicated by the use of the term “agent” to describe systems with vastly different capabilities. To understand the labor implications, we can unpack this into three distinct levels of capability [4].
Level 1: The Reflex Agent (Model-Based). This agent leverages an LLM to understand user intent and route it to the appropriate tool. It manages the “control flow” of traditional applications, replacing the pointing and clicking humans usually do.
The Labor Impact: Low. It potentially creates a smoother UI (often called headless apps), and it is an “AI workflow”, but it is still fundamentally a software tool, not a labor substitute. This represents the tool and copilot stage of agents.
Level 2: The Goal-Based Agent (YOU ARE HERE). We are currently entering this era. These agents do not just follow a route; they are given a high-level goal and must form a plan to achieve it. The agent breaks the goal into steps, executes them, and, crucially, observes the result. If the first attempt fails, a Level 2 agent loops back, adjusts its plan, and tries again.
The Labor Impact: High. Task Length becomes the key metric at this stage. As these agents become capable of handling longer, multi-step tasks without human intervention, they begin to substitute for junior-level cognitive labor in workflows like data analysis, basic coding, and research.
Level 3: The Learning Agent (The Gold Standard). This is the future state for enterprise and consumer AI. While Level 2 agents can plan, they generally reset after the task is done. A Level 3 agent possesses episodic memory and learns from experience. If you correct a Level 3 agent on Monday, it incorporates that feedback into its behavior on Tuesday.
The Labor Impact: Transformative. In the enterprise, this is an agent that “learns on the job,” increasing its economic value over time and creating a defensible moat of proprietary process knowledge (i.e., context).
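The difference between Level 2 and Level 3 can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the planner, the step names, and the `memory` dictionary are hypothetical, and a hard-coded planner stands in for an LLM. The loop plans, executes, observes failures, and replans (Level 2); passing in a persistent `memory` lets a later run start from what an earlier run learned (Level 3).

```python
def run_goal_agent(goal, make_plan, execute_step, memory=None, max_attempts=3):
    """Plan, execute, observe, and replan until the goal is met."""
    memory = {} if memory is None else memory
    # Level 3 behavior: start from lessons recorded on earlier runs.
    failed_steps = list(memory.get(goal, []))
    for attempt in range(1, max_attempts + 1):
        plan = make_plan(goal, failed_steps)
        for step in plan:
            if not execute_step(step):
                # Observe the failure, then loop back and replan.
                failed_steps.append(step)
                break
        else:
            # Every step succeeded; persist what was learned.
            memory[goal] = failed_steps
            return {"success": True, "attempts": attempt, "plan": plan}
    memory[goal] = failed_steps
    return {"success": False, "attempts": max_attempts, "plan": None}


def make_plan(goal, avoid):
    # Hypothetical planner: prefer the API, fall back to a CSV export
    # if the API step is known to have failed.
    fetch = "fetch_via_api" if "fetch_via_api" not in avoid else "fetch_via_csv"
    return [fetch, "summarize"]


def execute_step(step):
    # Hypothetical executor: pretend the API is down.
    return step != "fetch_via_api"
```

Without shared memory, every run repeats the same failed first attempt before recovering. With a shared `memory` dictionary, the second run succeeds on its first attempt, which is the "corrected on Monday, better on Tuesday" behavior in miniature.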
The myopic chatbot lens, mistaking the mechanism for utility
The explosive success of ChatGPT in late 2022 led many to see chatbots as the primary application of AI. In contrast, we believe the primary application space will ultimately be agents acting as an enhancement or substitute for labor. The myopia of seeing the first primitive application as the end state of the technology’s evolution is an extremely common hurdle that plagues the analysis of early technological revolutions:
The steam engine is just a water pump. The first steam engines were used for pumping water from mines in the early 1700s [5]. Early views of the industry’s potential as just a water pump completely missed the primary transformative effect of steam as a revolutionary technology for the transportation and manufacturing sectors.
Electricity as the electric candle market. The first widespread commercial application was as a replacement for gas lighting. As a result, an early industry metric was the “cost per lumen.” This metric faded as it became more apparent electricity was a general-purpose technology that would revolutionize industrial and consumer life and eventually enable the rise of the modern information technology industry.
Cell phones as telephones without cords. In 1980, AT&T hired McKinsey to forecast the size of the cellular phone market in 2000. The consulting firm came back with an estimate of 900,000 subscribers [6]. By 2000, the actual number was 109 million subscribers in the US. The cell phone became much more than a telephone, eventually evolving into a personal communication and computing device.
The internet is as significant as the fax machine. In 1998, Nobel laureate Paul Krugman issued his now-famous forecast about the internet: “By 2005, it will become clear that the Internet’s impact on the economy has been no greater than the fax machine’s” [7].
The chatbot is an amazing application, but it is just one component of the burgeoning AI ecosystem. By basing AI frameworks on this early component, industry participants risk missing the forest for the trees. Indeed, building industry analysis from chatbot economics (chatbot DAUs, paid vs premium, user retention, etc.) is useful for understanding this particular application, but it does very little to explain the ultimate economic impact of AI.
The labor lens: agentic as the primary application space for AI
Perhaps more importantly, I also believe it is likely that much of the non-AI software application space will become “agentified” over time; this occurs as agents slowly displace traditional UIs and business logic, becoming the interface through which we use all other software.
“AI agents will become the primary way we interact with computers in the future. They will be able to understand our needs and preferences, and proactively help us with tasks and decision making”
—Satya Nadella
To unpack this, let’s think through the definition of traditional software. Software is a tool to augment human actions and to accomplish things in the real world. For much of the history of the computer industry, this has held true with each revolution bringing new software capabilities and usually a new UI. This led to widespread use of software in ways that could hardly be imagined a generation ago, and it is best embodied in Marc Andreessen’s famous 2011 quote: “software is eating the world” [8]. In that essay, Andreessen notes that the decades-long evolution of the computer industry has produced “a global economy that for the first time will be digitally wired.” With software as a tool, humans can augment everything from ordering transportation to analyzing real-time business performance.
In my opinion, modern AI is not a continuation of this trend; it’s a disruptive event leading to a new path. AI enables agents, leveraging foundation models for intent and reasoning, to capitalize on this global digital wiring to produce digital labor. The potential of agentic AI is not to create another tool; it’s to become a worker itself. Working alongside humans, agents potentially offer a far greater productivity impact than software ever could. Alex Rampell at a16z recently extended part of the original Andreessen essay by explaining how “software is eating labor” [9].
Modern generative AI was first introduced as a chatbot, and early chatbots were a passive way to retrieve information from LLMs. With the introduction of reasoning models in 2024, this passive use case began to enter the realm of agentic AI as models began to use tools and planning (although early tool use was exemplified by chatbot plugins in 2023). This led to the first widely used, early form of agents: deep research. We’re now moving beyond this era with early implementations of basic agents for consumer, business and government tasks.
Table 1: Chatbot Versus Labor Lens of AI
When focusing on the labor lens, I believe the evolution of AI has more commonality with prior industrial cycles, and more specifically, that of general-purpose technologies. The ultimate TAM is the global labor market, and a framework that recognizes this will best serve the interests of those monitoring the industry’s progress. Any framework should also be unbiased and balanced, so bears and bulls can utilize it freely. I dive into this framework throughout this article.
The Algorithm of Statecraft: The Superpower AI Race
AI has the potential to transform not only the lives of consumers and the competitive landscape for enterprises but also the scientific, economic, and military standings between nation-states. Just as the US government was the key source of demand early in the microelectronics industry’s development (e.g., Space Race, Cold War, Vietnam War), it was also the core driver of industrial modernization during World War II (e.g., The Willow Run Miracle). When the competitiveness of nation-states, particularly superpowers, is at stake, government influence on a rising technology can drastically change the pace of improvement and societal pervasiveness.
“If AI surpasses human intelligence and acquires the ability to improve itself, it could confer an unshakable scientific, economic and military superiority on the country that controls it.”
—The Wall Street Journal, November 10, 2025
In their 1995 analysis of general-purpose technologies, Bresnahan and Trajtenberg note the importance of “…exogenous forces that shift the rate of return to GPT [general purpose technology] technology…the onset of the Cold War resulted in a government procurement policy which may have played a similar role” [10]. As a result, any AI analysis should consider the degree to which global governments will provide a “sovereign trigger” that changes the pace and degree of AI adoption.
In this section, I attempt to provide a framework for understanding government involvement in critical technology, and how that framework can help us understand recent actions and the future actions we may encounter.
The US AI moat is shallower than meets the eye
When ChatGPT first launched in 2022, the reaction from China was tempered. Indeed, the Chinese government’s first moves included a flurry of regulations, restrictions on deepfakes, algorithm disclosure and heavy censorship of chatbots [11]. This cautious reaction began to loosen in 2023 as US frontier labs began to show rapid capability improvements, and the US began to place significant export restrictions on GPUs. Soon thereafter, Beijing was spurring domestic chip production, subsidizing compute costs for model developers, and building shared data repositories for domestic model training. This shift was first widely recognized by the media with the “DeepSeek Moment” in early 2025, where China shocked the world with the model’s capabilities and apparent infrastructure efficiencies; nevertheless, the progress had been ramping well before then, and it continues at a rapid pace (Figure 2).
Figure 2: China’s Rapid Narrowing of the U.S.’s AI Lead

China’s initiatives for AI dominance
China’s strategy for AI has been very different from that of the US. China’s policies are centrally coordinated and defined by the following characteristics:
A relentless focus on open models. China’s LLM strategy has been centered on “open-weight models”, which means the core LLM weights are open for anyone to copy. This, coupled with public research from AI researchers and openness with training data, has helped the Chinese AI ecosystem to flourish as companies can leverage each other’s breakthroughs and avoid competitive and duplicative investment [12]. While China’s penchant for openness seems to have originated from a desire to close the gap with the US more quickly, its strategic consequence is to exacerbate price competition and token commoditization.
Sovereign compute and the energy advantage (“The China Stack”). With the 2023 restrictions on advanced AI chip sales to China, the government quickly mobilized its domestic chip producers (notably Huawei) to ramp up production of their own chips for Chinese model developers. In 2025, this initiative received a demand-side push as the government offered discounts as steep as 50% on electric power for companies that switched from US chips to domestic alternatives [13]. This is a highly strategic move to reduce the cost of one critical input for AI models (energy) to compensate for relative weakness in another (GPUs). Indeed, China has a deep advantage over the US when it comes to energy infrastructure. The country has 32 nuclear reactors in development, versus 2 in the US. Furthermore, China has pushed aggressively to expand coal and oil production, and it has led the world in clean energy deployment. The US has existing coal capacity of 174 gigawatts, and China currently has 1,591 gigawatts in permitting, construction, and active deployment [14]. Indeed, between 2010 and 2024, China’s power production increased by more than the entire rest of the world, and China generated twice as much electricity as the US in 2024 [15].
Public funding where private capital falls short. In addition to indirect subsidies for energy, the Chinese government is providing a host of other benefits and subsidies to spur domestic AI innovation. The local governments of Shanghai, Beijing and Shenzhen are offering vouchers to AI startups to assist with compute rental costs for training new models or to license pre-trained base models [16]. In addition, government-backed venture funds have injected capital into the country’s most promising AI startups like Zhipu, Moonshot, MiniMax and DeepSeek, and the government has boosted the availability of data and compute resources with the National Data Resource Platform and the National Integrated Computing Network [17].
Good enough tech, mass deployment. China’s AI Plus initiative was launched in 2024, and its aim is to rapidly integrate AI into domestic industry [18]. While it’s unlikely China will be able to outpace the U.S.’s semiconductor might, it has focused on leveraging “good enough tech” with subsidized costs and an ambitious initiative for mass deployment. The initiative, published by the State Council, is targeting greater than 70% penetration of AI-powered intelligent terminals and AI agents across key sectors by 2027 [19]. The “implementation gap” between early agents and agentic AI is significant in the US as companies must rearchitect workflows and data fabrics to capitalize on agentic AI. China’s policy seems to be targeting this gap as a key advantage versus the US.
China’s approach versus that of the US is emblematic of a superpower race between different governmental and economic systems, with increasingly divergent technical stacks. Interestingly, the national borders have been porous on the Chinese side when it comes to deployment. In fact, a partner at a16z recently noted that US entrepreneurs are increasingly using open models from China [20].
US government actions across two administrations
The US government has not been standing idly by as China ramps up its governmental AI initiatives. Its response reflects the private-capital advantages and orientation of the US. Indeed, while the policies have changed markedly with the transition between the Biden and Trump administrations, both seemed to understand the national security implications of losing to China early on. The following key events highlight the evolving government actions on AI:
IP Guardrails and AI Safety (2022-2024). Under the Biden administration, the US enacted the CHIPS and Science Act in 2022, which earmarked $280bn over 10 years to expand domestic semiconductor fabrication and research. The administration also focused early on AI safety with Executive Order 14110. In addition, at the tail end of the administration’s term, the government introduced significant updates to the semiconductor export controls first introduced in 2022. The restrictions covered HBM (high-bandwidth memory), advanced GPUs, and semiconductor manufacturing equipment [21]. They applied not only to China but also to many other countries worldwide under a tiered restriction system.
Easing the Safety Focus (early 2025). The Trump administration rescinded EO 14110 on day one. The Commerce Department also began renegotiating previously allocated grants under the CHIPS Act, believing the previous administration’s grants were “overly generous” [22]. Restrictions were eased for most countries, though those on China persisted.
Full speed ahead and “Build Baby Build” (Mid 2025-Present). While the US government previously allowed China to purchase the less powerful H20 processor from Nvidia, a December 2025 executive order allowed for sales of the powerful H200 chips with a 25% surcharge, according to Reuters. The Trump administration released its AI Action Plan in July of 2025 with an explicit focus on winning the AI race versus China, and this included substantial easing of regulatory constraints, federally driven regulations with limits on state regulation, heavy internal government adoption, and fast-track permitting for infrastructure construction [23]. In addition, the plan introduced explicit diplomatic goals to boost exports and partnerships with allies. In November of 2025, the White House further amplified its AI focus with the launch of “The Genesis Mission”, which included several executive orders focused on leveraging proprietary US government data to accelerate scientific advancement.
“…from this day forward, it’ll be a policy of the United States to do whatever it takes to lead in the world of artificial intelligence.”
—President Donald J. Trump, July 23, 2025
We’re still in the early stages of this race for AI supremacy, but the vectors of government involvement are becoming clearer. This is unlike anything we have seen with recent technological revolutions, such as the dotcom, mobile, or cloud markets, yet many continue to rely on those frameworks to understand AI’s ramp. I view this as a mistake, as governmental involvement can significantly change the calculus around capital, adoption, and input-cost economics.
Figure 3: President Trump at White House AI Summit, July 2025
Historical government-spurred technological and industrial waves
AI investment currently represents a significant share of GDP and an even greater share of 2025 GDP growth. As a percentage of GDP (approx. 1.6% in 2025), however, it is dwarfed by many past cycles, particularly those spurred by government initiatives. I believe looking at two extremes of government-supported technology and industrial waves can provide useful context for analyzing the potential geopolitical factors driving the AI race. On the high end of the spectrum, I look at WWII and its whopping 37.8% of US GDP at peak, with the Willow Run Miracle as the headline-grabbing event of the era. To be clear, WWII spending was the result of an all-out war stance, and I wouldn’t expect AI to approach these levels anytime soon; nevertheless, it serves as a powerful reminder of government’s role in industrial and technological development. At the more conservative end of the spectrum, I touch on the 1960s Space Race, which directly incubated and launched the integrated circuit industry from its infancy.
WWII and The Willow Run Miracle
“We must become the great arsenal of democracy.”
President Franklin D. Roosevelt, 1940
World War II represented one of the most extensive government-driven industrial and technological scale-ups in human history, with war-related activities accounting for nearly 38% of US GDP by 1944. One of the most enduring industrial events of the war was centered on the Ford Motor Company’s Willow Run plant. Ford was asked to trade in its auto manufacturing prowess for aeronautics, tasked with building heavy bombers for the nation. In the first year, Ford produced only 56 planes, and the quality was so low that the first planes were used for non-combat operations. But under the watchful eye of the plant architect, Charles Sorensen, the plant ramped quickly. By its peak, it was producing a bomber every 63 minutes, and the plant became a symbol of America’s industrial might [24].
Figure 4: Willow Run Bomber Manufacturing Plant
The Space Race launched the modern semiconductor industry
The Space Race found its inspiration in the surprise launch of the Sputnik satellite by the Soviet Union on October 4, 1957, though many date the formal launch of the race to John F. Kennedy’s famous 1961 speech in front of Congress. While the integrated circuit (IC) had been developed in Silicon Valley in the late 1950s, the government became the critical first mega-customer for the sector, launching the industry from infancy [25].
“Now it is time to take longer strides—time for a great new American enterprise—time for this nation to take a clearly leading role in space achievement, which in many ways may hold the key to our future on Earth.”
—President John F. Kennedy, May 25, 1961
NASA and the Air Force needed computers that were lightweight and power-efficient to run planned missiles (Minuteman II) and spacecraft (Apollo), a hefty ask in an era still dominated by room-sized computers and vacuum tubes. Having a guaranteed customer like the US government spurred private financing that was sorely needed for a nascent, capital-intensive industry. By the time government spending on the Space Race began to ease in the late 1960s, the cost of ICs had declined precipitously and was now in the sweet spot for private sector demand. At its peak in 1966, the Apollo program accounted for 0.7% of GDP.
Figure 5: President Kennedy with Cosmonaut German Titov and Astronaut John Glenn, 1962
The AI race in historical context
The US is currently spending approximately 1.6% of GDP on AI infrastructure, and that’s mainly from the private sector. This is dwarfed by government-stimulated spending on both world wars, the New Deal, and even the national highway system. But it’s close to the telecom spending in the dotcom era and above that of the 1960s Space Race. Historical context suggests AI’s trajectory can vary widely depending on government sponsorship, as shown in Figure 6.
Figure 6: Illustrative AI Spend Share of 2030 GDP Using Past Tech and Industrial Waves [26]
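To put the GDP shares quoted in this section side by side, a few lines of arithmetic are enough. This is a rough sketch rather than a reconstruction of the figure: the percentages are the ones cited in the text, while the function names and the GDP level passed to `implied_spend` are hypothetical placeholders that a reader would replace with their own forecast.

```python
# Shares of US GDP quoted in the text, in percent.
SHARES = {
    "WWII (1944 peak)": 37.8,
    "AI infrastructure (2025)": 1.6,
    "Apollo (1966 peak)": 0.7,
}


def relative_to(shares, baseline="AI infrastructure (2025)"):
    # Express each wave as a multiple of the baseline share.
    base = shares[baseline]
    return {name: round(value / base, 1) for name, value in shares.items()}


def implied_spend(share_pct, gdp):
    # Convert a share of GDP into spend, in the same units as `gdp`.
    # The GDP level is an input the reader supplies, not a forecast.
    return gdp * share_pct / 100


multiples = relative_to(SHARES)
for name, multiple in multiples.items():
    print(f"{name}: {multiple}x the 2025 AI share")
```

On these figures, the WWII peak is roughly 24 times today's AI share, while the Apollo peak is a bit under half of it, which is the gap between the two extremes discussed above.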

The TERFF framework
Analyzing the impact of government involvement in technological or industrial development is further complicated when it involves the competition between two superpowers. Whether it’s China versus the U.S. today, the U.S.S.R. versus the U.S. during the Cold War, or the Allies versus the Axis powers during WWII, we must view the international competition through a framework that allows for different economic and political systems. I believe this can be accomplished with the TERFF framework:
Technological Productivity. How quickly can a nation turn its assets and innovation potential into usable levers to drive a technological or industrial shift? During the Cold War, the US, as a primary technology buyer, was able to stimulate integrated circuit and software dominance while the U.S.S.R. had to reverse engineer U.S. technology with a 5- to 9-year lag.
Elasticity of Mobilization. The inherent effectiveness of the governance and incentive machinery of a nation-state and its allies to reallocate labor, data, capital, and IP at scale. During the Cold War, the Pentagon/DARPA and decentralized capitalism added considerable pressure to the U.S.S.R.’s inflexible and centralized Gosplan.
Resource Security. Supply line resilience and availability. Amid the Cold War, the US dominated sea-lane control and had diversified mineral imports, while the U.S.S.R. had persistent quality control issues.
Finance Leverage. Private funding, fiscal investment and monetary policy gave the US a far more resilient financial system than the U.S.S.R.’s quota-driven capital system. In WWII, the Fed pegged the yield curve to 0.375%-2.5%, and the dollar was the Allies’ reserve currency (e.g., Bretton Woods, 1944).
Force multipliers of alliances. Establishing and fortifying alliances with other nations for the common interest of winning the race is a powerful accelerant for technological and industrial change. During the Cold War, this was manifested in NATO’s pooled R&D and “Five Eyes” intelligence sharing, versus the far less efficient Warsaw Pact led by the Soviet Union. During WWII, it was the Lend-Lease program and particularly close diplomatic ties between the US and the U.K.
Figure 7: Early TERFF Actions in AI for the US and China
The early signs are that governments around the world are taking a keen interest in national AI progress, with alliances and resource pools centering around the US and China. Even the regulation-heavy EU has recently begun to postpone and soften the AI Act, which it launched only in 2024 and which will now see certain compliance deadlines delayed to 2027 [27].
It’s unclear how far government promotion of AI will progress across the globe, but early signs suggest the pace is quickening. In corporate boardrooms, expect companies to spend time determining how their actions can make them “National Champions” of their home country’s initiatives.
Agentic AI’s Challenge: Turning “Intelligence” into Execution
Although 2025 was a remarkable year for AI’s reasoning capabilities and early agent frameworks, a significant “implementation gap” persists between the raw capabilities of foundation models and their actual impact on corporate P&Ls and consumer computing. This gap exists because raw LLM outputs are probabilistic, while our bank accounts, calendars, and business processes require determinism and safety.
Whether you are a CTO deploying an autonomous coding agent or a consumer trusting an AI to negotiate your landscaping bill, the requirement is the same: the system must be reliable, not just creative. In order to bridge this gap, we need to look beyond model weights and chatbots and focus on system architecture for agentic AI. The road to autonomous agents is not paved by foundation models alone, but by agent-centered systems and architecture.
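One way to picture how architecture, rather than the model alone, closes this gap is a deterministic guardrail wrapped around a probabilistic model. The sketch below is a simplified illustration under stated assumptions: the allowed actions, the spending limit, and the `scripted_model` stand-in (which replays canned outputs instead of calling a real LLM) are all hypothetical.

```python
import json

ALLOWED_ACTIONS = {"pay_invoice", "schedule_meeting"}  # hypothetical whitelist
MAX_PAYMENT = 500.00                                   # hypothetical limit


def validate(raw: str) -> dict:
    """Deterministic check: parse the output and enforce hard rules."""
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(proposal, dict):
        raise ValueError("output is not a JSON object")
    if proposal.get("action") not in ALLOWED_ACTIONS:
        raise ValueError("action is not on the allowed list")
    if proposal["action"] == "pay_invoice":
        amount = proposal.get("amount")
        is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
        if not is_number or not 0 < amount <= MAX_PAYMENT:
            raise ValueError("payment amount is outside the allowed range")
    return proposal


def guarded_call(model, prompt: str, max_retries: int = 3):
    """Ask the model, but only let validated proposals through."""
    for _ in range(max_retries):
        raw = model(prompt)
        try:
            return validate(raw)
        except ValueError:
            continue  # reject and ask again
    return None  # nothing passed the checks: escalate to a human


def scripted_model(outputs):
    # Stand-in for an LLM: replays canned outputs, one per call.
    remaining = iter(outputs)
    return lambda prompt: next(remaining)
```

With a model that first answers in free text, then proposes a 9,000 payment, then proposes a 120.50 payment, `guarded_call` returns only the third proposal. The model remains free to be creative, but nothing reaches the bank account unless it passes rules that behave the same way every time.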
In this section, we will explore:
The Historical Context: The 70-year battle between “Symbolic AI” (rules-based) and Neural Networks (probabilistic) is potentially evolving into a hybrid approach necessary for reliable agents.
The Anatomy of an Agent: A deep dive into the LOCE Stack (LLM, Orchestration, Context, and Execution), which is a framework for understanding the components required to build reliable agentic systems, whether they are running on a corporate server farm or a smartphone.
The Infrastructure Implications: I also discuss why “agency” is not just a software upgrade, but a potentially massive multiplier for compute, networking, and energy consumption.
A 70-year journey from symbolic systems to neural networks
Before deep learning, the transformer, and LLMs dominated the AI industry, there was a long period of AI booms and winters as researchers and practitioners focused on Symbolic AI and Expert Systems. Symbolic AI emerged from the research of Marvin Minsky and others, which focused the industry on symbolic systems after criticizing early artificial neuron theories. This era appeared to dominate the industry from the 1960s through the 1990s. The focus of symbolic AI was to teach computers to follow explicit human-generated rules to imitate human cognitive processes. While this approach enjoyed a mini-economic boom in the 1980s [28], it eventually faltered because explicit rule-based frameworks were often brittle and rarely captured real-world edge cases.
Neural networks began to recapture the attention of researchers in the late 1980s into the early 1990s after breakthroughs such as Yann LeCun’s and Bell Labs’ work on recognizing human handwriting using backpropagation and convolutional neural networks [29]. These early efforts had limited commercial impact because compute power was not yet efficient and digital data was not yet plentiful. The industry needed a few more decades of Moore’s Law and the digital data explosion enabled by the internet to reach the modern era. The shocking leap in image recognition performance from AlexNet in 2012 and Google’s invention of the transformer in 2017 [30] were among the primary accelerants.
As can be common with major revolutions, there is now an increasing amount of discussion about how the merger of these two fields with “Neuro-Symbolic” approaches could be the key to unlocking AGI and ASI [31]. Indeed, Google DeepMind recently noted that its groundbreaking AlphaGeometry system is taking this approach: “AlphaGeometry is a neuro-symbolic system made up of a neural language model and a symbolic deduction engine, which work together to find proofs for complex geometry theorems. Akin to the idea of ‘thinking, fast and slow’ [Daniel Kahneman], one system provides fast, ‘intuitive’ ideas, and the other, more deliberate, rational decision-making” [32]. This is a dominant theme throughout DeepMind’s AlphaX programs.
Table 2: Is Neuro-Symbolic the Destination?
Interestingly, this hybrid approach will potentially be the dominant theme as AI enters the agentic era, and a recent paper dives deep into this concept for agents: Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions [33]. Separately, a recent paper by Stanford professor Edward Y. Chang added mathematical rigor to explain the need for adding System 2 level thinking to the System 1 probabilistic and fast thinking of an LLM to achieve specific real-world goals. He even argues that this may allow LLMs to take us to AGI [34].
Digging into the Components of Agentic Systems
As AI solutions evolve from chatbots and isolated copilots into autonomous agents capable of complex, multi-step workflows, it’s helpful to understand the stack that enables this. I believe a useful mental model for this is the “LOCE” stack, which stands for LLM (or any foundation model architecture), Orchestration, Context, and Execution.
Figure 8: The LOCE Stack
These four layers, coupled with core infrastructure and governance, describe the core anatomy of an agentic AI system. Understanding these layers provides grounding for any analysis of agentic AI and helps to monitor the closing of the critically important implementation gap.
The agent’s reasoning engine: the foundation model (LLM)
LLMs that can “reason” by iterating through an answer were the key stepping stones to agentic AI systems. For a standard chatbot, the LLM’s job is to have a conversation. In an agentic system, the LLM’s goal is task completion. The LLM’s role is to understand the intent of the task, and it acts as a creative decider. Think of it as a brilliant chef in a restaurant. A decent cook doesn’t deviate from a recipe, but a great chef can improvise and add some randomness to create magic.
“LLMs are not sufficient for AGI, but they are a necessary substrate. The practical question is not whether to discard pattern models, but how to organize them into reliable, constraint-following, long-horizon reasoning.”
—Professor Edward Y. Chang, Stanford University, 2025
Unlike traditional deterministic software, an LLM is probabilistic. And while this probabilistic nature can sometimes be a liability, it can also be very powerful when the proper guardrails are in place. Unlike traditional software, LLMs are very good at handling ambiguity and can keep pushing forward even if the task is not clearly defined or the goal is dynamic. LLMs also have a remarkable ability to understand natural language and route natural language instructions to specific technical functions without hardcoding. But there are LLM weaknesses that add risks to agentic systems.
Returning to our kitchen analogy, this LLM chef has chronic amnesia and sometimes goes off the rails. The LLM has no persistent memory across sessions (modern chatbots can add memory as part of the application layer), aside from knowledge of the data it consumed during training. In addition, it doesn’t have the ability to take action or complete tasks. Modern chatbots have added action functionality through tool calling, agentic applications like deep research, and even application SDKs. Still, none of this is inherent to the LLM itself and can’t help with broad enterprise or consumer tasks. This is where the other layers of the stack come into play.
The agentic control tower: the orchestration layer
The orchestration layer is the key antidote to the problems that come with probabilistic foundation models when we ask them to take real-world actions. It injects the rule-based guardrails of deterministic software and critical grounding context into the agent’s task flow. Using our kitchen analogy, the orchestrator is the kitchen manager or restaurant owner; she makes sure our creative chef doesn’t violate health codes or price dishes at a loss.
As a reminder, deterministic software is like an Excel spreadsheet doing math: the answer to 5+5 is always 10. In contrast, LLMs are probabilistic, so without a reasoning chain, early chatbots may have given you a different answer to 5+5 each time you asked. We’re beyond this state now with reasoning models and advanced post-training (for simple math queries), but the fundamental difference between probabilistic and deterministic systems remains, and it is arguably amplified by “the jagged frontier” concept discussed later in this article.
An AI critic could argue this is just wrapping LLMs in old-school software, thereby admitting to fundamental weaknesses of AI. An AI super bull could argue that AGI or ASI won’t need deterministic guardrails; it will just figure it out, as any superhuman should. The reality is that with the current and medium-term capabilities of AI models, orchestration is a critical component that enables businesses and consumers to leverage the power of AI in the real world.
Indeed, if we look back at the neuro-symbolic discussion from earlier in this section, injecting deterministic guardrails into an agentic process is very similar to the argument that LLMs will require symbolic (rules-based) grounding to reach AGI.
A deterministic shell and a probabilistic core
The entire orchestration layer is not old-school, deterministic software. Indeed, in many instances, LLMs are used for task planning and even for the verification of results. The orchestrator injects determinism and context into the flow, acting as a control tower for agent activity. It can assign tasks to sub-agents, ensure that role-based access controls are followed, and monitor workflow patterns. It can also manage costs by stopping a rogue, probabilistic LLM from running a loop for hours, or by switching to a cheap, local model when the task allows.
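To make the "deterministic shell" idea concrete, here is a minimal Python sketch under stated assumptions. The names `Budget`, `choose_model`, and `run_agent_loop`, the model labels, and the thresholds are all hypothetical illustrations, and the LLM call is replaced by a stub function. It is not how any particular orchestration product works.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    max_steps: int    # hard cap on loop iterations
    max_tokens: int   # hard cap on total tokens spent


def choose_model(task_complexity: float, threshold: float = 0.5) -> str:
    """Deterministic routing rule: use a cheap local model when the task allows.

    The labels and the 0.5 threshold are assumptions for illustration.
    """
    return "local-small" if task_complexity < threshold else "frontier-large"


def run_agent_loop(step: Callable[[int], Tuple[bool, int]], budget: Budget) -> dict:
    """Run a probabilistic step function inside deterministic limits.

    `step(i)` stands in for one LLM call and returns (done, tokens_used).
    """
    tokens = 0
    for i in range(budget.max_steps):
        done, used = step(i)
        tokens += used
        if done:
            return {"status": "completed", "steps": i + 1, "tokens": tokens}
        if tokens >= budget.max_tokens:
            # Stop a runaway loop before it burns more compute.
            return {"status": "halted_token_budget", "steps": i + 1, "tokens": tokens}
    return {"status": "halted_step_budget", "steps": budget.max_steps, "tokens": tokens}
```

The probabilistic part (the `step` function) can be as creative as it likes. The limits around it are ordinary, auditable code, which is the point of the shell-and-core split.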
Figure 9: The Agentic Control Tower, Orchestration
The agent’s memory: the context layer
The most essential ingredient for successful agent actions is context. For instance, an agent cannot process an invoice if it doesn’t understand the relationships between “invoices”, “purchase orders,” and the “vendor.” The context layer adds critical data that counters the LLM’s probabilistic shortcomings. It serves as short- and long-term memory that the LLM needs to reason through an agentic task correctly. In other words, it adds a dose of symbolic stability as guardrails for the neural LLM. In our kitchen analogy, these are the exact ingredients the chef needs for the meal.
When designing agentic systems, much care is taken to provide just the right data at the right time. Context management is the agentic cousin of prompt engineering. Too much data can overload the system and run past the LLM’s context window. Too little data and the LLM will be confused. A few examples of context sources can help provide a real-world perspective:
Structured knowledge graphs. A business or a user’s fact-based data source to help the agent understand the ideal task plan in a business setting or personal preferences in a consumer setting.
Vector memory (RAG or agentic RAG). Provides semantic meaning to datasets so the agent can pull policy documents, past task failures or successes, and business or user goals.
The working memory scratch pad. Ephemeral context, notes on immediate action history, and feedback from sensors or tools.
Tools for dynamic context. Real-time data pulled via APIs or MCP connections. The agent can adjust its plan or see the outcome of a recent step via these tool connections.
Context can be injected into the LLM (by the orchestrator) at the beginning of a task or loop to establish an initial state, then it can be fed into the LLM as the task is underway via dynamic context loading.
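The "right data at the right time" trade-off can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the `ContextItem` type, the priority ranking, and the word-count token estimate (a real system would use the model's own tokenizer and a more careful selection policy).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    source: str    # e.g. "knowledge_graph", "vector_memory", "scratch_pad", "tool"
    text: str
    priority: int  # lower number = more important (assumed ranking)


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def assemble_context(items: List[ContextItem], token_budget: int) -> List[ContextItem]:
    """Greedily keep the highest-priority items that still fit in the budget."""
    selected: List[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda it: it.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= token_budget:
            selected.append(item)
            used += cost
    return selected
```

An orchestrator could call a function like this once at the start of a task and again on each loop iteration as tool results and scratch-pad notes arrive, which is one way to read "dynamic context loading."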
In the early days of the chatbot boom, there was a common refrain that data was the new oil, and it remains true for model training. In the agentic sense, refined data is the fuel for task completion, and it is the agent’s context. As agentic AI continues its ramp, capturing and leveraging high-quality context becomes a critical asset for businesses. And for consumers, the vendor that understands and protects personal context could enjoy powerful consumer switching costs.
Figure 10: Adding Determinism to Stay on Goal

The agent’s toolkit: the execution layer
The execution layer represents the ecosystem of tools and services available to the agent to complete the task. In the kitchen analogy, this allows the chef to utilize fancy knives and cooking utensils. These tools can help trigger a real-world action (e.g., “book an Uber”). Modern agents may leverage emerging standards like MCP (model context protocol) to call on and communicate with tools and services, or they may leverage proprietary layers like Apple’s App Intents framework to add a layer of security to tool calls and MCP usage.
If an agent is calling upon a service, this could be a traditional app, but since the agent is handling the interaction with the application, the legacy UI is less necessary. This is why it’s common to hear folks discussing the disruption of apps, but the apps don’t necessarily go away. They will more likely shed their UI for agents, becoming headless apps or services.
This layer is a critical area for safety and governance protocols. Read-only tool use, like checking stock prices or weather, is generally in the green zone. But when you use tools to impact the real world, like sending money or deleting files, strong governance is necessary. The orchestration layer usually manages the execution guardrails in this respect.
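A minimal Python sketch of the green-zone idea follows. The tool names, the two risk tiers, and the `authorize` function are hypothetical; real deployments would typically load such policy from configuration and tie it to agent identity.

```python
from enum import Enum


class Risk(Enum):
    READ_ONLY = "read_only"      # e.g. checking a stock price or the weather
    SIDE_EFFECT = "side_effect"  # e.g. sending money or deleting files


# Hypothetical tool registry used only for this illustration.
TOOL_RISK = {
    "get_weather": Risk.READ_ONLY,
    "get_stock_price": Risk.READ_ONLY,
    "send_payment": Risk.SIDE_EFFECT,
    "delete_file": Risk.SIDE_EFFECT,
}


def authorize(tool: str, human_approved: bool = False) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a proposed tool call."""
    risk = TOOL_RISK.get(tool)
    if risk is None:
        return "deny"  # unknown tools are rejected by default
    if risk is Risk.READ_ONLY:
        return "allow"
    return "allow" if human_approved else "needs_approval"
```

The check runs before the tool call is executed, so the LLM can propose any action it wants while the deterministic layer decides what actually reaches the real world.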
Agent governance and security
The last layer of defense from probabilistic chaos is the agentic governance and security policies. Using our kitchen example, these are the restaurant policies, security cameras, and locks that prevent the kitchen from burning down or serving rotten food. Agent governance and security require active and persistent monitoring, but it can’t be too restrictive, or it will weaken the advantages of the LLM’s capabilities. The following key points help frame these policies:
Identity. Agents need permissions to access the appropriate tools and services for a task, but the agent should have limits to what it can access, just like humans do. This is also a critical component of security, as third-party agents need permissions to “get inside” and first-party agents need guardrails on what they are allowed to do.
Sandboxing and kill switches. For particularly sensitive tasks or when testing a new agentic system, sandboxing will enable engineers to test and constrain the agent to prevent any harmful or non-compliant actions.
Observability. It’s often difficult to see why an agent did what it did or predict its next action, and innovation on observability systems will be critical as a result. This also includes evals and verification, which can be essential end-of-loop checks on the agent’s work before final task completion.
AI infrastructure: software and hardware
Perhaps the most important takeaway for business and finance executives monitoring AI is that both hardware and software infrastructure needs can increase dramatically. Moving from chatbot inference to agentic inference can increase token consumption by 50-100x, depending on the workload. And the additional software infrastructure needs of the agentic stack only add to infrastructure requirements.
The raw materials for tokens are compute, power, and data. The compute component includes AI-optimized, “accelerated computing” semiconductors (GPUs, custom ASICs), increasingly dense and high-bandwidth memory, and cutting-edge networking; all of this is wrapped in racks generally supplied by major server vendors. On the data side, increasingly performant flash-based storage will be needed as once cold data becomes critical and hot. Meanwhile, energy needs are also compounded with agentic, and this is quickly becoming one of the key bottlenecks for the aggressive datacenter plans of leading hyperscalers.
The build-out of the agentic stack, however, also introduces new technology components beyond LLM infrastructure. In particular, infrastructure software begins to take on a different but increasingly important role. Data fabrics, vector databases, data streaming systems, observability, security software, etc., will all be critical for driving high-functioning agentic workflows. Incumbents will provide some of this, while other areas leave room for AI-native startups.
The Jagged Frontier and the Hidden Shape of AI Progress
One of the challenges of analyzing the AI industry is that you hear markedly different personal anecdotes on how it is impacting individual lives today. You may listen to a mathematician note with astonishment that AI is drastically impacting their research process [35], while your neighbor may tell you about how LLMs can’t tell you how many times the letter R appears in “strawberry”.
This isn’t just about the level of user sophistication; it’s also an empirical fact, named the “Jagged Technological Frontier” in a Harvard Business School paper [36]. AI is good at some things and awful at others, and with each new model release, it moves beyond these achievements to the next level of jagged capabilities (Figure 11).
Figure 11: Jagged Frontier of Equally Difficult Tasks
This can lead to the curious observation that every frontier model release features powerful anecdotes from AI optimists and from AI pessimists. But because the frontier is generally moving up and to the right with model capability improvements, the pessimist anecdotes can become stale quite quickly. Nonetheless, for AI pundits unfamiliar with the Jagged Frontier, this can be a fatal poison for unbiased analysis. This is because, quite simply, the current jagged frontier is almost always far worse than tomorrow’s jagged frontier. Indeed, if we imagine future states like AGI or ASI, the frontier becomes markedly less jagged.
Figure 12: Tomas Pueyo’s Interpretation of the Jagged Frontier and AGI [37]

What’s the root cause of the jagged frontier? The probabilistic nature of LLMs and their tendency to “hallucinate” on certain tasks are likely the most important drivers. For agentic use cases, insufficient orchestration, context gaps, and weak task-length coherence (the ability of an agent to stay coherent over long-running tasks) may also contribute to jaggedness.
The authors of the original Jagged Frontier report also noted that workers using AI for tasks within the frontier usually showed a measurable productivity improvement. Meanwhile, workers who leverage AI for tasks outside of the frontier often experience a decline in productivity. The chief concern is that workers may not know which tasks fall inside the frontier and which fall outside it.
Understanding where AI can be a labor replacement, an augmentation tool, or where it shouldn’t be used at all is a critical exercise for businesses trying to cross the implementation gap for agentic AI. It’s also an essential component of product strategy for frontier model labs and agentic AI vendors. Most importantly, the jagged frontier is always changing (i.e., hopefully moving forward) so this is a perpetual exercise for AI industry participants.
Thinking through the agentic jagged frontier
Leveraging agents for tasks previously completed by humans clearly requires a deep understanding of where those tasks land on the current jagged frontier. Furthermore, leveraging agent components beyond the LLM’s capabilities to reduce its unpredictability is invaluable. Indeed, we believe the Jagged Frontier is a critical component of “the implementation gap,” and technical and operational solutions for identifying and correcting tasks outside of the frontier will be a significant factor in the adoption rate of AI in general. We believe the following factors can help think through this challenge:
Verifiability determines autonomy. Andrej Karpathy tends to view AI as “Software 2.0,” distinct from earlier computing eras defined by “Software 1.0”. He describes the Software 1.0 era as technology that easily automates what you can specify (a task that can be done via an easily specified algorithm), and Software 2.0 as technology that easily automates what you can verify [38]. If a task is verifiable, then it can be mastered through reinforcement learning (usually in the post-training stage), and this likely describes many of the tasks that fall within the jagged frontier of a particular time period. This is why we tend to see surprising breakthroughs in highly verifiable categories like science, math, and coding, while we can be disappointed with tasks that we define as “common sense.” From an agentic perspective, this also applies to tasks that can be verified by an LLM (different from the core agent’s LLM) or a human-in-the-loop. For instance, if a sell-side research analyst uses an agent to write an earnings recap for a company, the current state of LLMs would almost certainly require a human fact check before publication; still, editing is potentially easier than creating, so the analyst’s overall productivity should rise.
Orchestrators that know when humans are needed. Ideally, part of the orchestration process for AI agents would include deterministic rubrics for when a task requires human verification and when it should not be managed entirely by an LLM-driven process. This is not an easy task, and it’s a perfect example of why deep institutional knowledge and forward-deployed engineers will be increasingly important for real-world AI use cases in 2026 and beyond.
Proper context could bring tasks into the frontier. For a task outside the training or RL (reinforcement learning) components of the LLM, task-specific context (i.e., methodologies and solutions to similar tasks) can be injected during the task loop. Similarly, proper context can help an LLM act as a verifier for off-frontier tasks.
Tools to minimize off-frontier risk. The execution layer of the agentic stack can also be a helpful defense against jagged frontier risk. For instance, if the task is off the frontier because it requires proprietary information within the enterprise, the orchestrator can call on a tool like a RAG pipeline to retrieve relevant information from a vector database.
The Jagged Frontier cannot be fully resolved with current technology, but the deterministic elements of the agentic stack (vs. the probabilistic LLMs) can greatly mitigate the unpredictability and performance shortfalls caused by off-frontier surprises. I’d expect the deterministic aspects of agentic to advance significantly in the coming years and directly narrow the implementation gap. The timing of this, however, will vary across end markets and is likely to drive many bull-versus-bear debates on AI.
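One way to picture a "deterministic rubric" for human involvement is the small Python sketch below. The three task attributes, the route labels, and the rules themselves are illustrative assumptions on my part, not an established standard. In practice, the rubric would encode institutional knowledge about specific workflows.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    verifiable: bool   # can the output be checked automatically?
    high_stakes: bool  # would an error be costly or hard to reverse?
    has_context: bool  # is task-specific context available to inject?


def route(task: Task) -> str:
    """Decide how much autonomy a task gets (illustrative rules only)."""
    if task.verifiable and not task.high_stakes:
        return "autonomous"    # agent completes it, with an automated check at the end
    if task.verifiable or task.has_context:
        return "human_review"  # agent drafts, a human verifies before release
    return "human_only"        # treated as off-frontier and not delegated


# The earnings-recap example from the text: hard to verify automatically and
# high stakes, but context is available, so a human fact check is required.
recap = Task("earnings recap", verifiable=False, high_stakes=True, has_context=True)
print(route(recap))
```

Because the rubric is plain code rather than a model judgment, it can be reviewed, versioned, and tightened or loosened as the frontier moves.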
Figure 13: The Jagged Frontier of AI Can Be Jarring
Substitution, productivity enhancement and trust
If a task is entirely within the Jagged Frontier, it is likely this task can be completed autonomously by an agent, holding other factors constant. If a task is off the frontier, but tools and human verification can derisk it, then this is a task where the human can be augmented. Productivity typically increases whether an agent completes a task autonomously or as a human copilot. This adds some complexity to analyses where one is trying to determine which types of jobs can be done by agents, and which cannot. In a recently published McKinsey study [39], the authors attempted to address this by splitting tasks into “automatable” and “non-automatable”, and further by measuring hours, not jobs, that agents can complete. Unfortunately, this bifurcation between automatable and non-automatable is wobbly since the Jagged Frontier of capabilities is always shifting. For instance, if you surveyed experts in early 2023 about whether creative graphic designers or PowerPoint creators would face more AI disruption, there is a good chance many would pick the latter; Google’s Nano Banana tool begs to differ.
Rearchitecting workflows to maximize productivity. When the frontier is not yet sufficient for a reasonable degree of autonomy, workflows can be adjusted to maximize human and agent productivity. This will likely be a very popular role for consultants and forward-deployed engineers as agentic AI ramps. In addition, the humans that can best leverage agents as copilots may find themselves with much higher-value jobs; from an enterprise perspective, this may mean one AI connoisseur can do the job of 5 pre-agentic predecessors.
The measurement gap and Jevons Paradox. If an agent acts as an effective copilot in an off-frontier task, the benefit may not be measurable in “hours saved”. It may result in higher quality work or higher employee morale. This will likely not be easy to measure as agentic AI ramps up. In addition, if the marginal cost of an agentic task within the frontier declines fast enough, Jevons Paradox may take over and cause demand for that task to rise. So, a task that was too time-consuming or expensive to do in the past may become commonplace with AI, and this shadow TAM will be difficult to measure or predict in advance.
The jagged trust gap. The difficulty in predicting which tasks fall on or off the frontier for current model capabilities has a direct impact on the implementation gap. If business leaders can’t trust the outcome of agent implementation, then adoption could slow. This suggests that lower-risk, highly verifiable tasks will dominate early agentic adoption, and higher-risk, harder-to-verify tasks will be adopted later.
The implementation gap and the J-curve
The “implementation gap” is the time between early agent proofs-of-concept and broader multi-agent systems capable of acting as digital workers. This is perhaps the most crucial concept for any economics- or finance-centered executive to understand and measure, as it determines the timing of accelerated economic impact from AI. This concept is also pervasive in economic literature, and we believe it is closely tied to recent work on the J-curve productivity paradox for general-purpose technologies [40]. Most of us can remember this gap from 21st-century tech cycles: the gap from the dot-com boom to mobile and Web 2.0 monetization and proliferation, and the gap between early cloud systems for startups and enterprise cloud adoption. Indeed, far longer gaps occurred with steam and industrial automation, railroads, and electricity.
In their 2021 paper, The Productivity J-Curve: How Intangibles Complement General Purpose Technologies, Brynjolfsson, Rock and Syverson note, “realizing their [general purpose technologies’] potential also requires large intangible investments and a fundamental rethinking of the organization of production itself”.
This argument translates well to AI adoption in the enterprise, as companies are realizing that the ideal implementation of AI, and agents in particular, requires a “rethinking” of workflows, human roles, data capture, governance, etc. All of this takes time and intangible as well as tangible investments, and amidst the transition, productivity or ROI can be challenging to measure. In addition, the jagged frontier makes it difficult to estimate near-term AI impact and where gaps need to be filled.
With that said, I believe it’s likely that the implementation gap for agentic AI could be shorter than prior tech and industrial cycles for several reasons:
SaaS and Cloud Plumbing Set the Stage. Defined SaaS workflows, connections to systems of record, and API connections to external tools and services all act as a battle-tested substrate for building out agent orchestration, context capture and tool connectivity. This is also why leading SaaS, cloud, and systems-of-record companies may have an early advantage in delivering agentic architecture solutions to enterprise customers.
Geopolitical “races” can shorten the gap. While many general-purpose technologies faced a gap before ultimate productivity, those that solved a real government need or priority often faced a much shorter gap. For example, the aviation industry was largely kicked off by government demand for air mail and then WWI. Similarly, the microelectronics industry (the integrated circuit specifically) grew as early demand was almost entirely from the US government. This is yet another reason AI’s importance to nation-states should be accounted for in any analysis.
Early successful implementations may trigger ROI FOMO. The implementation gap for agentic AI will impact different companies and industries unevenly. Companies and sectors that have invested in deep adoption of cloud and SaaS technologies may find it easier to implement agentic workflows than a company still running on mainframes. In addition, companies or sectors with more “verifiable” tasks per worker may see faster adoption as well. The key is that companies decoupling revenue growth from headcount will likely not be quiet about it, and one can imagine this will have a stimulative effect on the actions of some of the laggards – particularly within public markets.
Does this mean the gap will begin to close for many sectors in 2026? It’s certainly possible. Though the history and economics purists would counter that AI has been in development since the 1950s, so a fast ROI takeoff in 2026 would just be a normal, humdrum implementation gap!
The agent pricing matrix
One of the most popular debates in 2025 was how vendors should price agent offerings. This is by no means settled, mainly because we are so early in the adoption of agents. Nevertheless, the Jagged Frontier and J-curve concepts help us dissect the argument to understand the risks and benefits of each approach, from both the customer and the vendor’s perspectives.
Table 3: The Agent Pricing Matrix
Perhaps the most difficult part of the agent pricing puzzle is the fact that the Jagged Frontier is always shifting with new model launches. Many margin-maximizing companies choose the consumption option because AI is still in the peak era of uncertainty; unfortunately, given the fact that agents can consume tokens at a fast and unpredictable clip, this potentially compresses customer usage even with declining token costs.
Tokens, Labor and the Agentic Flywheel
By analyzing the AI industry as a general-purpose technology with an ultimate utility of labor augmentation and substitution, I believe two key components shape the framework for understanding how this industry could unfold:
Token price declines are agentic fuel. When considering the impact of general-purpose technologies, the cost of inputs is a critical component of eventual application economics (Bresnahan & Trajtenberg). For end-user AI, the primary measure of input costs is the cost per token to use foundation models. Token costs have been declining precipitously. This is a very encouraging sign, as it broadens the applicability of AI applications, and particularly, sets the stage for the potentially rapid ramp of agentic AI.
The AI TAM is a function of the global labor market. Discussions around the TAM for AI have ranged from “internet search” in 2023 to “SaaS” and “all application software” in 2025. While these are all impressive TAMs, the mistake is viewing AI as a potential new digital tool for humans like prior technology cycles. It can be a new tool, but the ultimate destination is likely agentic AI, which, in its ideal form, doesn’t just augment work; it performs it. It’s about converting capital (compute and energy) into digital labor. A bull would see this as a steady ramp towards full autonomy and mass productivity enhancement, and a bear would argue that the limits of technology won’t allow us to surpass the “agent as copilot” era. In either case, both are arguing about how much of the same $50 trillion labor TAM AI can impact [41]. As we will discuss, labor as the TAM doesn’t necessitate mass job elimination, nor does it imply that AI will have no impact on existing technology TAMs. Also, this is not an argument that startups should start putting $50 trillion in the TAM page of their decks; we dig into TAM calculation dynamics later in this section.
By leveraging the labor lens, we can begin to explore the economic implications of AI’s path to agentic.
Token price declines fuel AI’s progress
Token prices have declined significantly since the launch of ChatGPT in 2022. The primary LLM revenue mechanism to date has been from paid SaaS subscriptions to chatbots and paid API access to tokens. The cost of generating these tokens largely defines the monetization economics for both. With this in mind, it’s notable that model makers seem to celebrate token price declines with nearly every launch.
“The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use. You can see this in the token cost from GPT-4 in early 2023 to GPT-4o in mid-2024, where the price per token dropped about 150x in that time period. Moore’s Law changed the world at 2x every 18 months; this is unbelievably stronger.”
—Sam Altman, “Three Observations,” February 9, 2025
Focusing on inference costs across frontier models, Epoch AI notes a median price decline of 50x per year, and some analysis has shown the pace beyond 2024 accelerated to 200x per year [42]. OpenAI’s recent GPT-5.2 launch scored 90.5% accuracy on the ARC-AGI-1 benchmark at $11.64 per task, and that compares to an 88% score a year prior at $4,500 per task – a 99.7% decline in one year [43]. Legacy and lower-performance models are declining at an even faster pace. Of course, the marginal cost of using open models on-device or on-prem can approach zero, holding other factors constant. The model vendors not only seem happy about declining token costs, but also seem to be competing on both capability and cost efficiency. These factors could point to a potentially commoditizing market, so why aren’t the model makers fighting it?
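The arithmetic behind these figures is easy to check. The short Python sketch below uses only the per-task costs and the "Nx per year" rates quoted above; the function names are my own and purely illustrative.

```python
from typing import Tuple


def decline_stats(old_cost: float, new_cost: float) -> Tuple[float, float]:
    """Return (fold_reduction, percent_decline) between two per-task costs."""
    fold = old_cost / new_cost
    pct = (1 - new_cost / old_cost) * 100
    return fold, pct


def fold_to_percent(fold_per_year: float) -> float:
    """Convert an 'Nx per year' decline into a percentage price drop."""
    return (1 - 1 / fold_per_year) * 100


# Per-task ARC-AGI-1 costs quoted in the text: $4,500 a year ago, $11.64 now.
fold, pct = decline_stats(old_cost=4500.0, new_cost=11.64)
print(f"{fold:.0f}x cheaper, a {pct:.1f}% decline")

# The annual rates cited above, expressed as percentage declines.
for rate in (10, 50, 200):
    print(f"{rate}x per year = {fold_to_percent(rate):.1f}% decline")
```

The two quoted benchmark costs work out to roughly a 387x reduction, which is the same thing as the 99.7% decline cited. Likewise, 10x, 50x, and 200x per year correspond to price drops of 90%, 98%, and 99.5%, so the difference between the headline rates is much larger in fold terms than the percentages suggest.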
We believe the model makers and any serious player in AI are cheering token price declines and the apparent commoditization of “intelligence” because they are correctly focused on monetizing the application space. Just as Microsoft leveraged its operating system platform to monetize several key productivity apps (e.g., Word, Excel, and PowerPoint), the frontier labs are attempting to capture some of the largest agentic application markets as well. The first targets were coding agents and deep research, an early agentic system thinly disguised as a chatbot feature. More recently, this has expanded to shopping agents, and down the road, it appears OpenAI also plans to monetize its models with consumer hardware (running an agentic operating environment). Agentic is the monetization engine, labor is the key construct for the TAM, and tokens encapsulate the key input costs that fuel this market (Figure 14).
Figure 14: Commoditize the Inputs, Tokens Fuel Agentic AI
The raw materials for tokens are energy, compute (i.e., the IT infra stack), and data. The more infrastructure and algorithmic efficiency that can be squeezed out of these raw materials, the better. Most importantly, infrastructure efficiency in raw materials costs doesn’t need to imply the commoditization of this layer’s components (just ask 1990s Intel or 19th-century John Rockefeller); like Moore’s Law, it’s about improving performance per dollar. While Moore’s Law began to slow long ago, some industry rhetoric claims we are experiencing “Huang’s Law” with AI-relevant GPU performance per dollar potentially doubling every six months44. With the entire industry incentivized to increase model capability and decrease token costs, the engine for AI’s progress has the potential to enjoy increasing returns economics from a virtuous cycle (Figure 15). The costs of agentic architecture overhead are equally critical, as we discuss below; nevertheless, token costs provide a powerful leading indicator of the economic rationale for AI adoption in both the consumer and enterprise markets.
Figure 15: The Capital to Agentic Labor Flywheel
As AI evolves from a tool to a worker, so changes the TAM function
Traditional IT innovations can be general-purpose technologies (e.g., the microprocessor, the PC, and the internet), but my contention is that most historical IT innovations were definitively utilized as tools for humans, with limited long-term broad labor substitution. From the start, modern generative AI has been designed as a steady march towards autonomous agents capable of performing human tasks. The target path runs from rudimentary tasks to human-level innovation, and finally to fully digitized organizations of agents (see the previously cited Bloomberg article, Metz, R., July 11, 2024). It’s perfectly reasonable for a bear to say this will never happen and for a bull to say it’s inevitable. Still, both are effectively arguing in the context of the same goal: the killer app is autonomous cognitive labor. Furthermore, the cousin of agentic is AI-driven robotics, which is fundamentally targeting the killer app of autonomous physical labor.
But does this mean that the global wage pool of $50 trillion is the TAM for AI? Of course not, but the cost of labor needs to be a core component of the TAM calculus for any agentic vertical. For example, let’s consider an AI services TAM essay by Lightspeed45. They note, “Genesys helps run contact centers ($40Bn TAM) but an AI support bot can resolve queries (>$300Bn TAM), Excel helps analysts create models, but an AI financial analyst could just do it, and so on.” The ultimate TAM for many agentic verticals is likely much larger than the related software TAM, and over time it can approach the labor TAM.
This is an excellent rule of thumb, but in the early days of agentic AI, it’s becoming increasingly apparent that some job tasks are more easily automated than others. In addition, as agent vendors seek to spur adoption and help customers get over the initial costs of the agent implementation gap, agent pricing will often be a fraction of the equivalent human labor cost per task.
In specific verticals, this could create a market where the agentic TAM ultimately settles only slightly above the software TAM and well below the labor TAM. In other markets, Jevons Paradox could take over as high demand elasticity for the task leads to an explosion in the agentic TAM that eventually approaches the labor TAM. And in some markets, entirely new TAMs could form due to the rapidly declining marginal cost of agents, and the agentic TAM could ultimately exceed any prior labor TAM. I dive into each of these scenarios, and the first principles that identify each, below.
Agent economics and the three-lens filter
The early academic work on agentic AI and its expected impact on global labor remains nascent, with wildly divergent viewpoints. Nevertheless, for a thematic understanding of the topic, we can narrow it down to some relatively simple math.
As previously discussed, the token costs represent a key input cost for agentic AI, but there is also a cost for the architectural framework to support agentic tasks. This is represented by the orchestration overhead, which includes all orchestration tasks and context management. Furthermore, agents typically need tools to complete tasks, so the cost of these tools (software and services) must also be considered, much as you would for humans using SaaS or services via APIs.
The most important variable to consider is the probability of success. If an agent can successfully complete a task only 20% of the time, the CPSO will likely exceed the cost of a human performing the same task. As such, the agent will fail to deliver the required ROI, and it will be abandoned. The key is that the probability of success can improve as foundation models and agentic architectures advance and gain new capabilities. As a result, an agent without a reasonable ROI today may be one frontier model release away from having astoundingly good ROI tomorrow. The market is highly dynamic, and it’s still in its early days.
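To make the “simple math” concrete, here is a minimal Python sketch of how these cost components and the probability of success could combine. It rests on assumptions that are mine rather than a definitive formula: I read CPSO as a cost per successful outcome, I treat retries as independent, and I ignore the cost of detecting or cleaning up failed attempts. The class name, field names, and dollar figures are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    token_cost: float           # model inference cost per attempt (dollars)
    orchestration_cost: float   # orchestration and context-management overhead per attempt
    tool_cost: float            # software and service fees per attempt
    success_probability: float  # chance that a single attempt succeeds (0 to 1]

    def cost_per_attempt(self) -> float:
        return self.token_cost + self.orchestration_cost + self.tool_cost

    def cpso(self) -> float:
        """Expected spend per successful outcome, assuming independent retries."""
        if not 0 < self.success_probability <= 1:
            raise ValueError("success_probability must be in (0, 1]")
        return self.cost_per_attempt() / self.success_probability


HUMAN_COST_PER_TASK = 4.00  # hypothetical human labor cost for the same task

# Same per-attempt cost, two different reliability levels.
weak_agent = AgentTask(0.50, 0.25, 0.25, success_probability=0.20)
strong_agent = AgentTask(0.50, 0.25, 0.25, success_probability=0.80)

for name, agent in (("20% success", weak_agent), ("80% success", strong_agent)):
    verdict = "beats" if agent.cpso() < HUMAN_COST_PER_TASK else "loses to"
    print(f"{name}: CPSO ${agent.cpso():.2f} {verdict} the human at ${HUMAN_COST_PER_TASK:.2f}")
```

With identical per-attempt costs, moving from a 20% to an 80% success rate takes the hypothetical CPSO from $5.00 to $1.25. On these numbers the agent goes from uneconomic to clearly cheaper than the human, which illustrates how a single model release could change the ROI picture.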
We believe this can be narrowed down into a three-lens filter for determining if agentic AI is viable for a specific vertical today and how much a human will need to remain in the loop for given tasks (i.e., the level of agentic autonomy):
Capability and reliability. Is the task in question within the capability set of current foundation models and agentic architectures? The jagged frontier of foundation model capabilities requires careful testing to determine whether a specific task is within current capabilities or will need to wait for future improvements in frontier models. As it stands today, the most successful candidate tasks for agentic AI should be verifiable tasks (e.g., coding, customer support, or semi-autonomous driving). Furthermore, is the typical task length within the task length limitations of current models or harnesses, or will the model start off strong and descend into hallucinations before a loop is complete? Some model shortfalls can be corrected with sound orchestration software and context management, so the initial scaffolding work for agents is a big part of the capability and reliability lens as well. Indeed, many of the agent implementations that stumbled in 2025 appear to have skipped this architectural step, and this is a lesson that will likely lead to very different implementation strategies throughout 2026.
Governance and compliance. Regulatory, moral or societal factors can deeply influence the threshold for “good enough” capability and reliability, and this threshold is often far higher for agents than humans. For example, self-driving cars like Waymo usually have better safety records than humans, but their rollout faced many non-technical hurdles. Or consider healthcare, where an agentic solution may be able to read a radiology report better than most humans. Still, regulatory and moral concerns may keep doctors in the diagnosis loop for quite some time.
Economic viability. Does the agent’s CPSO exceed the cost of human labor for a given task, and are certain tasks just too cheap to justify automation? In specific verticals, this will undoubtedly be the case. The opposite can also be true. Consider AlphaFold (Google DeepMind’s protein-folding AI system), which has capabilities far beyond what was possible before its invention. Indeed, DeepMind has now filled a database with over 200 million predicted structures, which essentially represent all proteins known by the scientific community46. This is used by 3.5 million researchers worldwide, and AlphaFold 3 was introduced in 2024 to extend this capability to DNA, RNA and drugs. There is no feasible economic argument that would have justified letting humans do the work of AlphaFold, as the CPSO is effectively infinitely superior to any human alternative.
In addition to these factors, there is often substantial work required to ensure an organization or task is structurally ready for an agentic AI ramp. This can be an economic sunk cost of bringing in the proper software for orchestration, data fabrics for context management, and telemetry for agents. And it can also include the time-consuming process of modifying workflows to best leverage agentic AI capabilities. This is the implementation gap, and each task and vertical faces this adoption hurdle before fully ramping up agentic AI.
Dynamic SAMs and ultimate TAMs
In most cases, the cost of an agent settles below the equivalent human labor cost, producing an efficiency gain. The size of this efficiency gain will likely be determined by the level of AI competition in that particular vertical or for that task; for example, in a vertical where an agent vendor owns key vertical-specific context or telemetry, the efficiency gain for the customer may be smaller, and the agent vendor’s margin will be higher.
The demand elasticity for a specific vertical also significantly affects the ultimate TAM. If the task is not one for which an enterprise can gain profits by having a much larger workforce for that task, then it may be an inelastic task. In this case, the serviceable addressable market (SAM) may remain close to the software TAM and persistently shy of the labor TAM (Figure 16).
Figure 16: Agentic task with low demand elasticity
In other tasks, Jevons Paradox may take hold (i.e., technological efficiencies make a resource more attractive to use, and these efficiencies drive more usage of the resource); for instance, if an equity research analyst at a bank becomes 10x more productive from utilizing deep research agents, then the analyst may be able to cover a larger number of stocks. This is a highly elastic task, so the lower cost of agents relative to labor significantly expands the market. Because I assume agent costs per task should generally be lower than human labor costs per task, this allows the agentic SAM to approach but not exceed the labor TAM (Figure 17).
Figure 17: Agentic task with high demand elasticity
Taken to its extreme, Jevons Paradox can unlock completely new TAMs as well: only the wealthiest humans can afford a personal assistant, but if a consumer technology company launches a sufficiently capable agentic AI assistant, then broad consumer demand may create a “personal assistant” TAM that far exceeds the size of the original, human assistant labor market (Figure 18). This type of “Jevons TAM Unlock” is an extremely attractive target for agentic AI vendors, and these hidden gems can produce entirely new markets and companies.
Figure 18: Jevons TAM Unlock
Notice that each of the above charts shows an agentic AI SAM starting well below even the legacy software & tool TAM. This is what we expect to find for most verticals in the early stages of agent adoption, and this coincides with the challenging implementation gap most enterprises will face as they prepare for agents. It also leads to much of the confusion triggered by papers in 2025 that bemoaned a lack of ROI for agents. The breakaway for verticals amid this adoption uncertainty will likely come when early adopters begin to show a significant decoupling between revenue growth and operating expenses, and the resulting economic FOMO may nudge more conservative customers into the implementation process. Of course, this all depends on the pace of improvement in AI model capabilities and agentic architecture solutions.
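The three scenarios can be illustrated with a toy constant-elasticity demand model. This is my own simplification, not the method behind the figures above: it assumes task volume scales as the price ratio raised to the negative elasticity, and every number is hypothetical. Under that assumption, spending on agents stays below the labor TAM when elasticity is below 1 and exceeds it when elasticity is above 1.

```python
def agentic_market_size(labor_tam: float, price_ratio: float, elasticity: float) -> float:
    """Spending on agents under a constant-elasticity demand curve.

    price_ratio: agent price per task divided by human labor cost per task.
    elasticity:  how strongly task volume responds to the lower price.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    volume_multiplier = price_ratio ** (-elasticity)  # tasks demanded relative to today
    return labor_tam * price_ratio * volume_multiplier


LABOR_TAM = 100.0   # hypothetical labor spend for the task, in $ billions
PRICE_RATIO = 0.25  # agents priced at a quarter of the human cost per task

scenarios = {
    "low elasticity": 0.2,      # volume barely responds to cheaper tasks
    "high elasticity": 0.9,     # volume grows almost as fast as price falls
    "Jevons TAM unlock": 1.5,   # volume grows faster than price falls
}

for name, elasticity in scenarios.items():
    size = agentic_market_size(LABOR_TAM, PRICE_RATIO, elasticity)
    print(f"{name:>18}: ${size:.0f}B vs labor TAM ${LABOR_TAM:.0f}B")
```

With these illustrative inputs, the agentic market comes out at roughly $33 billion, $87 billion and $200 billion against a $100 billion labor TAM. That matches the qualitative ordering of the three scenarios, though real markets will not follow a single clean elasticity.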
The aggregate agentic AI TAM is murky
The preceding framework is helpful in determining the TAM for a given vertical, but I’d also like to have a method for estimating the entire agentic AI TAM. Indeed, this would be an immensely valuable estimate for the debates over AI capex and future token demand for frontier labs. At this early stage, however, this is a difficult task and is certainly beyond the scope of this article. But I can add some thoughts on how such a calculation could be constructed.
“We find that currently demonstrated technologies could, in theory, automate activities accounting for about 57% of US work hours today.”
—McKinsey Global Institute, November 2025
Perhaps the best reference for such a framework comes from the recent McKinsey report that estimates that 57% of US work hours could be automated by today’s agentic technology (previously cited Yee, Madgavkar, et al., 2025). The report captures both physical and nonphysical work, with robots impacting the former and digital agents impacting the latter. Importantly, the study doesn’t claim that 57% of “jobs” will be eliminated by agents and robots, since many of these automatable tasks are subsets of a human job and automation would free humans to be more productive in the non-automated tasks. For this reason, their midpoint estimate for economic value from AI automation by 2030 is $2.9 trillion.
Although economic value doesn’t directly tie to TAM for various reasons, I believe it’s reasonable to assume the ultimate TAM could be a multiple of $2.9 trillion if foundation models and agentic architectures continue to improve at a rapid clip. In particular, we believe this figure could also be a fraction of the potential shadow TAMs from Jevons TAM unlocking in coming years.
Sources of Competitive Advantage in AI
We’re still in the nascent stages of the generative AI industry’s development, and it’s not surprising that the frameworks for understanding the strategic and competitive dynamics of the space are still evolving. In the early days of the generative AI boom, there was a belief that LLMs themselves could form significant moats and network effects; while things can change over time, current trends suggest the LLM layer has a less sturdy moat, and network effects are not yet obvious. Consider the following:
The human feedback flywheel has yet to materialize. An early assumption was that LLMs with the most users could get smarter than LLMs with fewer users. The idea was that users asking many questions and rating answers would provide critical RLHF (reinforcement learning from human feedback) data. But the reality is that much of the high-value RLHF data comes from hired experts supplied by companies such as Mercor, Handshake or Surge AI during post-training. The marginal value of a user giving a chatbot answer a thumbs up or thumbs down appears to be more limited (though it can still be helpful for style and preference matching).
Standards like MCP are muting platform effects. Another theory for LLM-centric moats recalls the application moats for operating systems: as LLMs gather users, they could attract more application connectivity than the competition. The industry, however, is quickly coalescing around standards for tool and service connectivity, with MCP (Model Context Protocol) from Anthropic emerging as a likely front-runner. OpenAI recently released an Apps SDK, which, like App Intents from Apple, is an intermediary for legacy apps evolving into headless agentic services and tools; whether this holds as a proprietary layer despite MCP standards is still an open debate.
As a result of the above factors, the application space (agents) seems to have a much clearer path to defensible moats and network effects. This is yet another reason to believe the frontier labs will seek to capture key sources of value in agentic AI as they seek to generate ROI for their capex-intensive models.
As is typical of major technology waves, incumbents tend to have inherent advantages, but execution and messy tradeoffs often open the door to fast-moving startups. It’s a tension between inertia and disruption that can breed creative destruction, tearing down old moats for the new.
The bundling and distribution head start for incumbents
When frontier labs launch new models, there is a natural focus on how third-party application and API customers will leverage the new model’s capabilities. But how do the model labs use the technology to improve their existing core businesses?
When Google DeepMind launched Gemini 3, it also quickly leveraged it for AI Mode in search, injecting the latest model into its primary business franchise. Google hasn’t disclosed whether its Gemini models are directly embedded in its ad-ranking models, but given the long history of ads being among the leading consumers of new ML (machine learning) infrastructure, it would be surprising if broader DeepMind innovations weren’t flowing into the ad stack over time.
Similarly, Meta often notes that its AI research is improving its own ad platform and social media franchises, Grok has been used to improve the X feed, and Microsoft has integrated OpenAI LLM technology throughout its Office suite as copilot agents. Monetizing a bleeding-edge technology by bundling it with established platforms is not new; it is just seemingly underappreciated by many AI pundits. If AI can stimulate ROI in these massive, legacy franchise businesses, this fuels the capex for higher-risk and less proven AI initiatives.
“Our ads business continues to perform very well, largely due to improvements in our AI ranking systems…And now the annual run rate going through our completely end-to-end AI-powered ad tools has passed $60 billion.”
—Mark Zuckerberg, Meta 3Q25 Earnings Call
Even after we pass the point of greatest uncertainty for a new technology, platform owners often choose a set of high-usage, high-value applications for themselves – leaving the rest (and the long tail) to third-party ecosystem participants. For instance, even when the iPhone app ecosystem was booming, Apple appeared to deliberately choose to capture specific app categories with first-party apps like maps, the browser, music, messaging, the calendar, etc. Similarly, Microsoft nurtured a broad and thriving third-party ecosystem with Windows, but the applications in its enormously successful Office franchise were reserved for the platform owner.
Figure 19: Bundling and Distribution Advantage of Incumbents
Does this mean private LLM labs without legacy tech businesses or AI-native startups attacking these bundled application spaces are at an extreme disadvantage? It depends on the execution capabilities of the incumbents and, of course, the willingness of the public companies to withstand scrutiny if margins and free cash flow are affected by an AI transition. Bundling didn’t stop the emergence of Google in the dotcom era, nor did it stop Spotify (vs Apple Music), WhatsApp (vs Android and iOS messaging), or Slack for business communications. Furthermore, it’s worth stating the obvious: the third-party application space of leading technology platforms is immense and a source of great value for startups that successfully compete and disrupt incumbents. Hence, the profit motive for intense competition and disruption appears sound.
Context capture breeds switching costs
Agents feed on context, which is refined data injected into an agentic flow at just the right time. For an enterprise, this can be the systems of record, metadata, and semantic understanding of a company’s data and workflows. In the consumer space, this can be a personal knowledge graph of a consumer’s preferences and current environment (e.g., is it raining and is the user in transit?).
This context, in both consumer and enterprise, is balanced between the ephemeral, short-term memory and longer-term memory; it also splits between retrieved context and learned context, with the latter contributing to greater switching costs as agentic solutions from one vendor build “on-the-job” knowledge.
Agents capture ephemeral data through sensors, telemetry, or streaming real-time data flows, while longer-term memory is accumulated and stored over time. Incumbents have a natural advantage with longer-term memory, particularly with established customers that have fed into that data over time. Meanwhile, startups tend to have an advantage with shorter-term memory, as their AI-native roots allow them to purpose-build their agentic solution for telemetry and context capture.
Context may be one of the most important core sources of switching costs as agentic AI is deployed, and consumers and enterprises will demand strong privacy and security guardrails. Indeed, it’s possible that enterprises and consumers will avoid having their context controlled by one company, which could open opportunities for portable context innovations – particularly in the consumer space.
Network effects of agent-to-agent cooperation
Absent the emergence of ASI, where a model can “do anything”, the application space is likely to develop into specialized agents, or teams of agents, for different tasks. While it’s natural to assume this will work just like app store models today, there are some key differences that point to a unique form of network effects for agentic AI.
Agents provide horizontal functionality scaling. The current application space for mobile computing has created strong moats separating the market into iOS and Android app ecosystems. In most cases, however, these apps are siloed in terms of functionality. When a user needs new functionality, they scale their app library vertically by adding a new app for a new task. Agents can scale vertically, but they also can scale horizontally, gaining new functionality by working with other agents. As a result, agents can exhibit network effects: as horizontal functionality rises, total functionality potentially can increase superlinearly.
Differentiation via orchestration. The quality of agent orchestration determines how efficiently an enterprise or consumer solution can manage groups of agents for varying tasks. As such, the orchestration layer is likely to be highly competitive as vendors seek to innovate and secure leadership by the number of first-party and third-party agents they support, and by the number of users contributing to on-the-job learning for the agents within the orchestration system.
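One way to see why horizontal scaling could be superlinear is to count potential pairings. The Python sketch below is an upper-bound illustration under a strong assumption of mine, namely that every pair of agents can combine into one additional useful function. In practice only a fraction of pairings would be valuable.

```python
from math import comb


def pairwise_collaborations(num_agents: int) -> int:
    """Number of distinct two-agent pairings among num_agents agents."""
    return comb(num_agents, 2)


for n in (5, 10, 20):
    siloed = n                                    # one function per siloed app
    networked = n + pairwise_collaborations(n)    # own functions plus every pairing
    print(f"{n:>2} siloed apps: {siloed:>3} functions | {n:>2} agents: up to {networked:>3} functions")
```

Doubling the agent count from 10 to 20 lifts the potential pairings from 45 to 190, so the ceiling on combined functionality grows roughly with the square of the number of agents rather than linearly.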
Incumbents that can link and leverage existing developer and partner ecosystems to maintain their lead as siloed apps become agentic could fend off disruption from AI-native upstarts. Nevertheless, the innovator’s dilemma can counter this inertia if incumbents are slow to capitalize on this shift as they protect their app-centric moats. Furthermore, startups don’t have the challenge of preserving backward compatibility with legacy platform functionality.
From App Stores to Intent Markets: AI’s Impact on Apps
One attribute of technological revolutions is that they often change the fundamental components of the software industry. We’ve seen UIs change from punch cards to touch, opening up richer feature sets and democratizing access. Meanwhile, software distribution has shifted from floppy disks to app store downloads, opening a long tail of applications from small companies and individuals. We’ve also seen software production evolve with rich tooling and information resources like Stack Overflow.
It is now apparent that AI is bringing about another, far more dramatic change, one that could impact all aspects of software. And as some say, software ate the world, and now AI is eating software. This is a big topic, potentially one that merits its own article. But I’ll discuss some key elements, each of which can be expanded upon in future work.
AI as the new UI for applications
Whether you’re a new consumer app developer or an incumbent, the shift to agentic AI slowly but surely unwinds the tight coupling between the UI and an underlying application’s functionality. This is a profound change that far exceeds the shift to GUI, touch, and even voice. The center of gravity for innovation (and commercial success) has less to do with how a human points and clicks in your application and now shifts to designing apps that agents will “choose” to use. In the consumer space, changing UIs can often disrupt legacy compute platforms, opening the door for upstarts. Alternatively, if privacy and latency win the preferences battle, it may be that incumbents can link and leverage their OS and app ecosystem dominance to build an agentic future.
“…when I hang out with them [software CEOs], they’re like, …how are we going to keep doing what we do when the agents take over?”
—Travis Kalanick, July 2025, All-In Podcast
Looking at the example in Figure 20, we imagine how a future consumer AI agent could change the nature of applications and how users interact with them. One utterance (or typed command) replaces multiple manual app workflows, and like a world-class EA, the agent draws on the user’s personal context. The user doesn’t necessarily care which apps or services are used, as long as the trip is booked and the user’s personal preferences are considered.
Figure 20: AI is the UI for Consumer Agentic Applications
The history of the computing industry is often about reducing friction. From GUI to touchscreens, the UI evolves with compute capabilities. AI is potentially adding rocket fuel to this trend. Furthermore, as discussed previously, agentic AI functionality scales vertically and horizontally (agents can combine the functionality of multiple tools and agents to produce new functionality) while traditional, siloed apps mostly scale vertically. Horizontal scaling is an underexplored capability advancement that may very well increase the surface area of innovation for new products and companies. All of this has several critical implications for the consumer technology space that I will explore in the next sections.
A new long-tail of generative and ephemeral apps
When Apple’s iOS App Store first began to ramp, the long-tail apps tended to be made by hobbyists, not companies, and this was a big part of the App Store’s early popularity. Indeed, one of the top ten paid apps in 2008 was iBeer, an app that used the accelerometer to simulate an emptying beer mug as you tilted your phone, as if you were drinking it47. With AI, this phenomenon may be very different. It’s not hard to envision the long-tail as personal, ephemeral apps that AI spins up for a user’s specific need at a particular moment – no human programmers involved.
For example, instead of downloading a “things to do in Connecticut” app, a user might ask an AI agent to spin up a weekend adventure in Connecticut for her family. The AI generates a micro-app with a UI that only exists for that single purpose. In this case, the “developer” is the user, and the “TAM” is a market of one. The long tail of applications moves from static products to a fluid, on-demand, agentic service.
If this evolves as described, the industry implications could be significant. The old long-tail model, which was enabled by the mobile software revolution that began 17 years ago, moves from niche audiences to individual users. The long-tail developer role dissolves into a highly functional agentic assistant. This potentially diminishes the app store moats and requires a careful incumbent shift to agentic marketplaces and orchestration to fend off disruption.
Shifting moats and the “Enterprise OS”
The advent of agents in the enterprise forces a level of refactoring to add the orchestration and guardrails needed to capitalize on agentic AI’s strengths and dampen its probabilistic weaknesses. There are many ways this can come about, ranging from open to walled garden approaches.
For instance, consider “MAESTRO”-like architectures that aim to eliminate walled gardens and vendor switching costs by retrofitting Kubernetes orchestration for agents and handling comms on an emerging open standard like Google’s A2A48.
The alternative framework recognizes the importance of data/context and is built around agentic orchestration from the same vendors that help an enterprise with its mission-critical data and workflows. This walled garden approach is interesting because some of the SaaS vendors most exposed to the AI-eats-Apps risk are the same vendors that could end up with an increasingly dominant industry position via a context-centered, walled garden approach.
This creates a critical inflection point for incumbent SaaS vendors, especially those with existing workflow, automation, or security assets. Many of them have deep expertise, customer-specific data, and knowledge around structured workflows. If they can leverage this position to inject agentic orchestration into today’s human-dependent, largely deterministic workflows, they can link and leverage their incumbent success to secure a potentially larger and more dominant position in agentic AI. In the end, they can gain more vertical coverage and stickier customers.
On the other hand, if incumbents fail to move quickly, they can be sidelined by more aggressive competitors or leapfrogged by tool- and platform-agnostic startups. It’s really a question of disruption by abstraction (startups win) or dominance by inertia (incumbents win). The age-old open-versus-closed debate once again bubbles to the surface.
From Power Laws to Phase Transitions: Rethinking AI’s Progress
When Gemini 3 was launched in November of 2025, news reports and social media users were enthusiastically posting a benchmark card that highlighted the model’s impressive capabilities (Figure 21). The benchmark card has 20 metrics in total, and Gemini 3 posted very impressive performance on many fronts. Other frontier labs have leapfrogged competitors in the same way in the past, and the frequency of leapfrogging seems to be accelerating. Indeed, Anthropic’s Claude Opus 4.5 and OpenAI’s GPT-5.2 generated another wave of surprising capabilities at the end of 2025 as well. But what do these benchmarks mean? How do they square with sobering reports like MIT’s July 2025 report, The GenAI Divide: State of AI in Business in 2025? Do any of the benchmarks tell us we are approaching a plateau? Some of them are approaching 100%, but then new, more challenging benchmarks appear.
Figure 21: Gemini 3 Pro Model Card49
The metric overload that accompanies every new model release does little to explain to a non-technical finance person why the capex and R&D needs of AI are so large and potentially growing larger. On the same day, you can find one article discussing diminishing returns or a plateau, and another where a researcher is discussing artificial superintelligence. One can imagine how this complexity enters a conversation between a lab CFO and an AI researcher:
We need to understand the real-world impact of AI improvements, including a timeline for human-level task completion. We also need to understand if LLM scaling laws will hit a wall and what drives that. And finally, it would be nice to know how realistic the “graduation to exponential” concept is and whether we should be pondering ethereal concepts like ASI (artificial superintelligence). While providing forecasts for these things is best done by AI researchers, I believe these are also topics that will be critical for finance and business leaders to monitor as well.
Scaling laws, power laws, and diminishing returns
It’s now common to hear of ever-larger GPU clusters and data for new model releases, but this predictable scaling wasn’t always obvious. In early 2020, researchers from OpenAI released a research paper introducing the world to language model scaling laws50.
“Model performance depends most strongly on scale, which consists of three factors: the number of model parameters N (excluding embeddings), the size of the dataset D, and the amount of compute C used for training.”
—Kaplan, et al., January 2020
The Kaplan paper, and its follow-ups, provided academic fuel for the capex and R&D race that began in earnest after the release of ChatGPT in November 2022. The paper put mathematical rigor behind the surprisingly prescient quote from Richard Sutton’s 2019 essay The Bitter Lesson: “…researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation”51.
I won’t go into the main equation (Eq. 1.5 in the Kaplan paper), but the basic idea is that the loss (cross-entropy loss, which represents the difference between predicted and correct answers) declines with more compute, data and parameters. More specifically, the loss falls as a power law, and this is very important.
It’s important because a power law isn’t the same as the “exponential” concept we often hear in finance circles. Under a power law, each additional multiple of compute buys a smaller absolute reduction in loss, so scaling could eventually face uneconomical costs and diminishing returns; as the loss approaches its minimum, pre-training gains would presumably slow. Fortunately, we don’t appear close to this limit yet, and Gemini 3 is the latest encouraging example in this respect.
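To make the power-law point concrete, here is a minimal Python sketch. It uses a simplified single-variable form, L(C) = L_inf + a * C^(-alpha), rather than the full equation in the Kaplan paper, and every constant in it (L_INF, A, ALPHA) is an illustrative assumption of mine, not a fitted value. The only thing it is meant to show is the shape: each successive 10x of compute buys a smaller absolute drop in loss.

```python
# Illustrative constants only (assumptions, not fitted to any real model).
L_INF = 1.7   # hypothetical irreducible loss floor
A = 10.0      # hypothetical scale coefficient
ALPHA = 0.05  # hypothetical power-law exponent


def power_law_loss(compute: float) -> float:
    """Loss under a simplified power law: L(C) = L_INF + A * C**(-ALPHA)."""
    return L_INF + A * compute ** (-ALPHA)


def gains_per_10x(start: float, steps: int) -> list[float]:
    """Absolute loss reduction bought by each successive 10x increase in compute."""
    gains = []
    compute = start
    for _ in range(steps):
        gains.append(power_law_loss(compute) - power_law_loss(compute * 10))
        compute *= 10
    return gains


if __name__ == "__main__":
    for step, gain in enumerate(gains_per_10x(start=1e3, steps=5), start=1):
        print(f"10x step {step}: loss falls by {gain:.4f}")
```

Under these assumptions, every 10x step shrinks the previous step’s gain by the same factor, 10^(-alpha). The cost of each step grows tenfold while the benefit shrinks, which is the diminishing-returns dynamic described above.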
Figure 22: Select AI Index technical performance benchmarks vs. human performance, early 202552
The limit may not be set in stone: graduating to exponential
In Figure 23, we can see that the latest models are already approaching the theoretical maximum for the ARC-AGI-1 benchmark, with little evidence of stagnation, as shown by the up-and-to-the-right curve. In fact, leading models such as GPT-5.2 Pro and Grok 4 have pushed scores into the 80-90 percent range, effectively saturating a benchmark once considered a grand challenge for AI.
Figure 23: ARC-AGI-1 leaderboard is essentially saturated53
The more challenging ARC-AGI-2 benchmark was established as a result, and ARC-AGI-3 will enter leaderboard discussions in 2026. As we can see in Figure 24, frontier models are already pushing into the 50-plus percent range, and we wouldn’t be surprised by saturation in 2026.
Figure 24: ARC-AGI-2 leaderboard may saturate next
Despite the continued progress of LLM-based models, any emergence of a plateau would be a problematic development for the industry. The concern around a limit is threefold. First, if we hit a limit before adequate reasoning, agentic architectures, and extended task length can fuel a large-scale deployment of economically viable agents in the enterprise or consumer markets, this could delay the crossing of the implementation gap; this is a compelling fear because it has happened with nearly all general purpose technologies in the past, even if they proved enormously useful in time (e.g., the steam engine, railroads, and the dotcom boom). Second, the power law improving model capabilities is still dependent on continued increases in compute and data; this may not be tenable for economic reasons and due to the natural scarcity of resources such as the power needed to drive that compute. Third, if this limit becomes apparent in new model releases, the capital markets’ appetite for funding the scaling could suffer.
Nevertheless, all of this assumes the mechanism of progress remains the same. Is there a mechanism that enables more persistent, exponential growth? What if the AI industry enters a phase transition from power-law rules to continuous exponential growth? Such a mechanism would likely have the following characteristics that shift the growth curve from external inputs (compute and data) to internal improvements driven by the system’s own capabilities:
Mathematical capabilities are as good as the best human. At its core, AI research requires advanced mathematical skills to guide the exploration of paths that will lead to far greater model capabilities. This is a difficult achievement, but substantial progress is being made. Indeed, a recent collaboration between the mathematician Terence Tao and DeepMind researchers demonstrated that by leveraging DeepMind’s AlphaEvolve tool, “large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best-known results”54.
Coding skills are as good as the best human. Once AI can code as well as the best human, gains in algorithmic efficiency could accelerate much more rapidly. Capital efficiency could improve as well, as AI creates novel kernels and software frameworks for infrastructure. The SWE-bench benchmark measures how many coding problems AI can solve; the score stood at 4.4% in 2023 and reached 77.2% in 2025 with Claude Sonnet 4.5, according to Anthropic55. Anthropic’s recent viral Claude Cowork release (a research preview) was reportedly written almost entirely by AI in under two weeks56.
AI researcher agents are improving models without humans. If AI agents could leverage world-class mathematical and coding skills, they could presumably begin to produce better models through coding, experimentation and deep analysis. If this becomes possible, labs could potentially spin up a large number of these researcher agents to generate and test hypotheses, run experiments, and design new architectures for capability improvements. The holy grail is RSI (recursive self-improvement), but there are likely many step-function improvements possible with AI researcher agents before that ideal achievement.
There are many challenges to creating autonomous AI researchers, and there are no guarantees that AI labs will solve them. The most essential research hurdle, however, is long-horizon execution. Simply stated, this is the ability of an LLM or agent to stay on task for a long period of time without losing track of its progress, crashing, or hallucinating into failure. Of course, as I have discussed earlier in this article, real-world agent capabilities will also be highly dependent on agentic architecture improvements that add governance and determinism to key workflows.
Figure 25: Task length made hefty leaps in 202557

A late 2025 research paper by researchers at the University of Cambridge, the Max Planck Institute for Intelligent Systems, and several other universities suggests that horizon length may improve exponentially with marginal gains in single-step accuracy58. The paper’s novel conclusion is that long-horizon tasks are plagued by what the authors call the Self-Conditioning Effect: models are limited not merely by their context window size, but also by their own past mistakes, which corrupt subsequent steps. The result is that per-step accuracy degrades over time as model errors compound. The upshot is that thinking models (chain-of-thought or sequential test-time compute models, first productized in late 2024) can reduce this problem and extend the task horizon. In other words, reasoning potentially allows the model to debug itself.
“…The improvement in horizon length for a fixed gain in step accuracy grows quadratically, highlighting the compounding benefits of scale.”
—The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs, September 28, 2025
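The arithmetic behind that quote can be sketched in a few lines of Python. The sketch rests on a deliberately simple assumption: every step succeeds independently with the same accuracy p and the model never corrects itself, so a task of H steps succeeds with probability p^H. Solving p^H = s for H gives H = ln(s) / ln(p). The function name horizon_length and the 50% default success rate are my own illustrative choices.

```python
import math


def horizon_length(step_accuracy: float, success_rate: float = 0.5) -> float:
    """Task length (in steps) completed with probability `success_rate`.

    Assumes independent steps with constant accuracy and no self-correction,
    so that success over H steps is step_accuracy ** H.
    """
    if not 0.0 < step_accuracy < 1.0:
        raise ValueError("step_accuracy must be strictly between 0 and 1")
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    return math.log(success_rate) / math.log(step_accuracy)


if __name__ == "__main__":
    for accuracy in (0.90, 0.99, 0.999):
        steps = horizon_length(accuracy)
        print(f"step accuracy {accuracy:.3f} -> about {steps:.0f} steps at 50% success")
```

Under this assumption, moving from 99% to 99.9% step accuracy lifts the horizon from roughly 69 steps to roughly 690. Near p = 1 the sensitivity of H to p scales like 1 / (1 - p)^2, which is one way to read the “grows quadratically” language in the quote. The Self-Conditioning Effect described above violates the constant-accuracy assumption, so real horizons would be shorter than this idealized figure.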
While this is undoubtedly an exciting development, many challenges can impact horizon length for agents operating in the real world with a diverse range of task options. Nevertheless, as AI research accelerates toward real-world, agentic applications, the prospect of exponential improvements in horizon length is an important consideration for anyone analyzing the AI sector.
LLMs Are Dominant, But Not Alone
The current AI wave began with the unprecedented success of ChatGPT, but it was the culmination of decades of research. Most of this research was not widely understood or discussed in finance and business circles before ChatGPT. That is changing, however, and it’s incredible to see once-obscure academic fields become part of everyday business dialogue. With that said, the academic debates that preceded the launch of ChatGPT have not faded away, and many of them centered on whether LLMs or transformer-based architectures are the proper core technology for AI advancement. These debates are fascinating and essential for any business or finance executive exploring AI.
LLM alternatives remain in the debate
ChatGPT was an application that ran on top of a transformer-based LLM, so headlines and popular excitement have justifiably been centered on the latest LLM innovations from the frontier labs. As we have discussed previously, this also led to industry analysis frameworks that often suffered from “chatbot myopia”, failing to recognize that the ultimate application space for AI is agentic and that chatbots were merely the starting point. This is often further confused by the natural tendency to conflate “AI” with “LLMs” when the latter is merely one tool (albeit a currently dominant tool) for achieving the agentic goals of AI and the broader research goal of AGI (and even ASI).
Many frontier labs, including Google DeepMind, are approaching AI research with multiple frameworks, in addition to LLM scaling. There is a long list of such architectures, and I summarize just a few below.
World Models. World models are AI architectures that build a simulation of an environment’s physics, dynamics and rules. This allows them to go beyond the pattern recognition of LLMs and simulate what happens next if an agent or robot takes an action. This is a top area of research for major labs, particularly Google DeepMind. Indeed, DeepMind’s David Silver and Richard Sutton wrote “The Era of Experience” to highlight how AI should learn from experience, and DeepMind’s Genie 3 is a foundation world model that seeks to offer a simulation for agents to gain this direct experience59.
JEPA (joint embedding predictive architectures). JEPA is a distinct recipe for building world models, so the concepts are arguably similar, but the recipe is distinctive enough that we keep it separate. Yann LeCun, former Chief AI Scientist at Meta, is arguably the leading voice in JEPA development. The main idea is that, instead of simulating every pixel as you would in a world model, you can train a model to predict ideas/concepts in a world simulation rather than producing exact details.
Neuro-symbolic systems. Just as the name implies, neuro-symbolic systems combine the probabilistic capabilities of neural networks (e.g., an LLM) with the explicit rules, logic, and knowledge graphs of symbolic systems. Symbolic systems enjoyed a mini economic boom in the 1980s (previously cited Liguori, 2025), but they faded as neural networks finally had the data and compute power to produce breakthroughs. The neuro-symbolic hybrid approach is gaining quite a bit of steam in labs and amongst researchers.
As has been true of AI research over the past 70 years, there is a persistent oscillation between probabilistic models and more deterministic, rule-based frameworks. A microcosm of this oscillation is well captured by the current architectures for agentic AI implementation.
Watch out for the conflation fallacy
Any casual exploration of social media will reveal researchers (or research enthusiasts) discussing a recent paper on an architecture that holds more promise for reaching AGI than LLMs. At the same time, you’ll see finance- or market-oriented professionals debating the merits of the current capex cycle, and this too is justified since LLM-based architectures are indeed far more capex-intensive than any technology wave in recent memory. The problem is when these two groups meld their arguments.
Figure 26: The Conflation Fallacy for LLM Alternatives
Bearish and skeptical views of any economically impactful innovation are appropriate, and vigorous debate is part of efficient capital allocation. And on the research side, concept wars over theories and architectures drive innovation. But still, there is a problem with the two hypothetical quotes in Figure 26: the conflation fallacy.
Many of the LLM alternatives have the following characteristics: 1) in many cases, companies would use these alternatives alongside LLMs (i.e., adding symbolic rules or determinism to the creativity of LLMs), 2) the capex requirements for many LLM alternatives are potentially on par with or higher than pure-LLM capex needs, and 3) LLM capex can be highly fungible with many LLM alternatives. So, in most cases, researcher-based LLM skepticism is a poor input for current debates over AI economics or capex, and vice versa.
Let’s look at some recently released world models as an example (e.g., Google DeepMind’s Genie 3). While training a world model on specific environments could be capital efficient relative to LLMs, the inference costs could eclipse those of LLMs. Unlike an LLM that predicts the next word, a world model often runs multiple parallel simulations of the future to determine the appropriate action. This “thinking time” can require massive compute at the moment of action. In the end, it’s conceivable the industry will need a lot more GPUs and capex if world models turn out to be the future path.
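A toy example shows why simulation-heavy inference can be expensive. The Python sketch below is a generic “random shooting” planner, not a description of how Genie 3 or any production system works. The one-line toy_world_model, the function names, and all parameters are hypothetical. What matters is the counter: the number of world-model calls per decision is the number of rollouts multiplied by the planning horizon.

```python
import random


def toy_world_model(position: float, action: float) -> float:
    """Hypothetical stand-in for a learned world model: predicts the next position."""
    return position + action


def plan_action(
    position: float,
    goal: float,
    num_rollouts: int,
    horizon: int,
    rng: random.Random,
) -> tuple[float, int]:
    """Sample candidate action sequences, simulate each one, and keep the best.

    Returns the first action of the best sequence and the number of
    world-model calls spent on this single decision.
    """
    best_first_action = 0.0
    best_error = float("inf")
    model_calls = 0
    for _ in range(num_rollouts):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        simulated = position
        for action in actions:
            simulated = toy_world_model(simulated, action)
            model_calls += 1
        error = abs(goal - simulated)
        if error < best_error:
            best_error = error
            best_first_action = actions[0]
    return best_first_action, model_calls


if __name__ == "__main__":
    action, calls = plan_action(0.0, 3.0, num_rollouts=200, horizon=5, rng=random.Random(0))
    print(f"chosen first action: {action:.3f}, world-model calls: {calls}")
```

With 200 rollouts and a five-step horizon, one decision costs 1,000 model calls in this toy setup. If each call were a large neural network forward pass, that multiplication is where the inference bill would come from.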
With all that said, certain architectural approaches favor models with far less capex intensity. The DeepSeek MoE shock in early 2025 was one such example, though the actual capex and the level of distillation from capex-intensive models were somewhat murky. And extreme versions of the neuro-symbolic approach (extreme on the symbolic side) offer paths where capital intensity can be moderated60. Finally, Yann LeCun, former Chief AI Scientist at Meta, has aggressively argued for his JEPA-style world models as a cheaper alternative to (and replacement for) LLMs.
In the end, AI development and deployment are a capital-intensive enterprise, and any algorithmic or infrastructure efficiency breakthroughs would be welcome news for the AI application space. Ultimately, it is reasonable to conclude that LLMs will remain the core of AI company strategies for the foreseeable future.
A Conceptual Test-Drive of Agentic: Autonomous Vehicles
While there are many real-world examples of the concepts I have discussed in this article, the bleeding edge of agentic AI, world models, and edge AI may be found in the Tesla waiting next to you at the traffic light or in the Waymo taking you to Fisherman’s Wharf. Semi-autonomous vehicles across the globe are transitioning from rule-based code to neural-network-based autonomy.
Indeed, in many ways, Tesla, Waymo and near-autonomous vehicle vendors in China are commercializing agentic AI concepts faster than enterprise and consumer software companies. They are facing the same challenges, such as how to inject predictable determinism into inherently probabilistic systems. They see the same opportunities for truly autonomous task completion via training on world models. And they are facing their own flavor of the agentic implementation gap.
Perhaps most importantly, the leaders in autonomous vehicle development are part of or closely related to leading LLM labs (e.g., xAI and Google DeepMind). As such, it’s reasonable to conclude that learnings in the digital agent world will spill over into the embodied AI world of autonomous vehicles, and vice versa.
As a result, in this closing section, I will reinforce and review the concepts discussed thus far, while introducing some concepts I have spent less time on, such as edge versus cloud AI. As should be apparent to anyone who has tried Tesla FSD or ridden in a Waymo, we’re seeing value and innovation with autonomous vehicles well before full autonomy has been achieved – and a similar phenomenon will potentially be apparent with consumer and enterprise agentic AI in 2026.
From expert systems to neural networks
The autonomous vehicle (AV) industry experienced a recent version of the transition from symbolic, rule-based AI to neural networks in early 2024, with Tesla’s release of Full Self-Driving beta v12.1.2. Elon Musk and team had previously announced that they had thrown out 300,000 lines of C++ code in favor of an end-to-end neural network trained on millions of driving videos. The bet was that going all in on AI would allow Tesla’s fleet to learn while driving and accelerate the path to full autonomy61.
“And we’ve got the final piece of the puzzle, which is to have the control part of the car transition from about 300,000 lines of C++ code to a neural network, so the whole system will be a neural network, photons into controls out.”
—Elon Musk, All-In Podcast’s 2023 Summit
Similarly, a December 2025 blog post by Waymo explained how its Waymo Foundation Model is a “versatile, state-of-the-art world model powering our AI ecosystem.” Interestingly, the company also noted that a core component of this model is the “Driving VLM”, which is a vision language model trained on Gemini, Google DeepMind’s LLM62. And both Tesla and Waymo have highlighted how they can use real-world driving data from their fleets to train these models and further iron out unsafe edge cases.
Musk has noted that there is “no line of code instructing the vehicle to slow down for speed bumps,” and his vision for Tesla neatly captures the vision for agentic AI more broadly. The probabilistic nature of AI models is a source of creativity and of the ability to handle ambiguity, which is key to autonomy and something pure rules-based systems may never capture. But just as you wouldn’t want your mission-critical enterprise application to run on probabilities alone, you wouldn’t want your electric vehicle to hallucinate while crossing an intersection. As I discussed with agentic AI architectures, autonomous driving appears to require careful orchestration and sufficient context to ultimately achieve full autonomy.
Deterministic orchestration for safe autonomous driving
Just as the orchestration layer for enterprise and consumer agentic applications provides critical verifiers and deterministic guardrails, autonomous vehicle systems leverage similar components to ensure safety. In Waymo’s case, the orchestration layer includes a “Critic” model that evaluates the digital “driver’s” performance, a separate onboard validation layer that verifies trajectories produced by the machine learning model, and rigorous “Safety Envelopes” to provide mathematical guarantees of safety.
Tesla has a more implicit orchestration layer, as it leans into model reinforcement learning as the longer-term trajectory. Nevertheless, it includes deterministic code that acts as a safety wrapper for the neural network, and the system calls on a navigation planner tool to feed explicit destination information to the model.
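The general pattern of a deterministic wrapper around a probabilistic model can be sketched briefly in Python. This is not Tesla’s or Waymo’s implementation. The Command type, the numeric limits, and the safety_wrapper function are all hypothetical, chosen only to show the pattern: the learned policy proposes, and hard-coded rules bound or override the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    speed_mps: float     # requested speed, meters per second
    steering_deg: float  # requested steering angle, degrees


# Hypothetical hard limits; real values would come from safety engineering.
MAX_SPEED_MPS = 30.0
MAX_STEERING_DEG = 35.0
MIN_GAP_M = 5.0


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def safety_wrapper(proposed: Command, gap_to_obstacle_m: float) -> Command:
    """Deterministically bound whatever the learned policy proposes."""
    steering = clamp(proposed.steering_deg, -MAX_STEERING_DEG, MAX_STEERING_DEG)
    if gap_to_obstacle_m < MIN_GAP_M:
        # Hard rule overrides the model: stop when an obstacle is too close.
        return Command(speed_mps=0.0, steering_deg=steering)
    speed = clamp(proposed.speed_mps, 0.0, MAX_SPEED_MPS)
    return Command(speed_mps=speed, steering_deg=steering)


if __name__ == "__main__":
    model_output = Command(speed_mps=50.0, steering_deg=90.0)  # an implausible proposal
    print(safety_wrapper(model_output, gap_to_obstacle_m=100.0))
    print(safety_wrapper(model_output, gap_to_obstacle_m=2.0))
```

The design choice worth noticing is that the wrapper never needs to understand why the model proposed something. It only guarantees that what reaches the actuators stays inside a known envelope, and the same idea applies to verifier layers in enterprise agent workflows.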
Context and world models
Waymo leverages world models for training and as a schematic for attaching meaning to the real-world data it collects. In practice, this means Waymo vehicles are bundling Lidar, Radar, and visual data of the environment with explicit, high-fidelity maps to determine split-second driving decisions throughout operation. This high-value data not only ensures a safe trip for humans, but can also be used to identify edge cases to refine and train the foundation model.
Figure 27: Tesla vs Waymo Context Sensors
Tesla has typically shunned Lidar and Radar sensors and is instead relying solely on the visuals from camera feeds. The logic here is two-fold: 1) if humans can drive with just two eyes, then an AI-infused vehicle should eventually be able to do the same; and 2) a camera-based system is cheaper to produce and visually sleeker (i.e., there is no large sensor module on top of a Tesla). Tesla also relies on an Occupancy Network to create a real-time, detailed understanding of the world, and it relies less on pre-stored maps and data for grounding context.
A hybrid edge and cloud implementation
Low latency is a key advantage of edge AI over cloud implementations, and it’s hard to imagine an application where latency is more dangerous than in autonomous driving. With latency thresholds often less than 50 milliseconds, the edge requirements for autonomous vehicles are extreme. That said, training the foundation models with real-world video and sensor data is far more appropriate for a large, cloud-based model. As we’d expect with many digital AI applications, the autonomous vehicle industry uses both edge and cloud AI for its vehicles.
Thinking fast and slow
Waymo’s model provides a solid illustration of the hybrid approach. Leveraging Daniel Kahneman’s “Thinking, Fast and Slow” construct, Waymo uses edge intelligence for rapid, near-reflexive responses to real-time sensor input. Meanwhile, the cloud is reserved for training and for “thinking slow” about rare and complicated scenarios, like a disabled car that is on fire (i.e., the cloud model can leverage its deep knowledge to explain what is happening and offer solutions). The cloud can also be leveraged for fleetwide management, and perhaps eventually, for swarm intelligence across all vehicles. And for both Tesla and Waymo, the cloud can leverage vast amounts of fleet data for training large models.
Distillation and optimization for local models
For both Tesla and Waymo, the size constraints of edge AI models are overcome with distillation and optimization from larger models to allow for stronger capabilities. In Waymo’s case, this is done with “Teacher” foundation models (e.g., Gemini-based VLM) distilling capabilities into “Student” models. The Student models then allow for in-vehicle, local perception and planning capabilities.
Similar processes are common for edge models used in consumer electronics, IoT, drones, and robotics. The overall goal is to compress complex, cloud-based models into versions that can meet the privacy and latency needs of edge applications.
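For readers who want to see what “distilling” means mechanically, here is a minimal Python sketch of a generic knowledge-distillation objective. It is not Waymo’s or Tesla’s pipeline, and the logits and temperature value are made up for illustration. The idea is that the student is trained to match the teacher’s softened probability distribution, measured here with a KL divergence.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: list[float],
    student_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) computed on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]        # hypothetical teacher outputs for one input
    good_student = [1.9, 1.1, 0.2]   # close to the teacher
    poor_student = [0.1, 1.0, 2.0]   # ranks the classes in reverse
    print(f"good student loss: {distillation_loss(teacher, good_student):.4f}")
    print(f"poor student loss: {distillation_loss(teacher, poor_student):.4f}")
```

In practice this term is usually combined with a standard loss on ground-truth labels and minimized by gradient descent over many examples. The sketch only computes the quantity being minimized for a single example.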
On-the-road learning
Much like DeepMind’s research focus on agents that can learn from real-world activities (i.e., “The Era of Experience”), the goal of end-to-end AI for autonomous vehicles is a learning flywheel. Tesla, for example, runs experimental models in the background. If the model predicts an action and the human chooses a different action, this is recorded as high-value data for training. Furthermore, every time the human disengages FSD (without the car mandating it), this is a powerful negative reward signal for model training.
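The background-model idea can be illustrated with a short Python sketch. It is a simplified reading of the process described above, not Tesla’s actual data pipeline. The Frame fields, the label strings, and the reward values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    model_action: str               # what the background model would have done
    human_action: str               # what the human driver actually did
    human_disengaged: bool = False  # driver voluntarily switched the system off


def collect_training_signals(frames: list[Frame]) -> list[tuple[int, str, float]]:
    """Return (frame index, label, reward) for the frames worth keeping.

    Disengagements are tagged with a negative reward; plain disagreements
    are kept as high-value examples with a neutral reward.
    """
    signals = []
    for index, frame in enumerate(frames):
        if frame.human_disengaged:
            signals.append((index, "disengagement", -1.0))
        elif frame.model_action != frame.human_action:
            signals.append((index, "disagreement", 0.0))
    return signals


if __name__ == "__main__":
    drive_log = [
        Frame("brake", "brake"),
        Frame("accelerate", "brake"),
        Frame("turn_left", "turn_left", human_disengaged=True),
    ]
    for signal in collect_training_signals(drive_log):
        print(signal)
```

Most frames, where model and human agree, are discarded. The filter keeps only the moments of disagreement, which is what makes fleet data so much more valuable than raw mileage alone.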
Meanwhile, Waymo leverages a Critic model that reviews autonomous driving logs to identify unsafe or inefficient behaviors. The company also leverages large models to take a single real-world incident and create thousands of variations of that incident for training. Finally, the company leverages closed-loop reinforcement learning to simulate millions of virtual miles with extreme edge cases to train the “Waymo driver”.
For both Tesla and Waymo, it’s apparent that they have a massive and constantly replenishing real-world data set for their models. This produces a virtuous cycle where better models generate better performance, which attracts more drivers/passengers, which in turn brings in more data for training (Figure 28). This is precisely where enterprise and consumer agentic AI researchers are aiming to take the broader world of compute. Autonomous driving is showing us the path and elucidating the power of agents and AI more broadly.
Figure 28: The Autonomous Flywheel
Conclusion
The “learning flywheel” of autonomous vehicles offers a fascinating preview of the agentic future potentially impacting the broader economy. Just as these physical agents are learning to navigate the complexities of crowded roads with cutting-edge orchestration and rich data context, AI leaders are attempting to traverse the real-world challenges of enterprise and consumer workflows. This shift from software to digital labor, with all the challenges along the way, is not simply an exercise in rebranding; it’s a fundamental re-architecture of economic productivity, driven by a convergence of falling token costs, the stabilizing force of agentic architectures, and the accelerating influence of geopolitical competition.
While the “Implementation Gap” will likely be the source of intense debate between optimists and skeptics, the trajectory of AI capability improvements seems unlikely to abate anytime soon. As organizations attempt to bridge this gap, moving from passive chatbots to active, verifiable agents, the potential economic impact of AI could rival industrial and digital revolutions of the past. For investors and business leaders, the strategic imperative is no longer to merely marvel at the raw capabilities of the latest chatbot, but rather to identify and build the orchestration layers and proprietary context engines that will define the enduring moats in this new era of digital labor.
Disclaimer: The views and opinions expressed in this article are solely my own and do not represent the views of my employer or its affiliates. This article is provided for general discussion purposes only and does not constitute investment advice, a recommendation, or an offer or solicitation to buy or sell any securities or financial instruments.
Metz, R. (2024, July 11). OpenAI Scale Ranks Progress Toward “Human-Level” Problem Solving. Bloomberg.
In academic work, an agent is often defined as something that perceives an environment via sensors and acts upon that environment through actuators; notably, this is the standard definition in the textbook Artificial Intelligence: A Modern Approach. Russell, S. J., & Norvig, P.
This three-level classification is a simplified version of the typical five-level framework and excludes simple reflex and utility-based agents given our focus on economic implications.
Moore, S. (2019, April 18). Earliest steam engines used to pump water. Farm and Dairy.
Hazlett, T. W. (2017, July). We Could Have Had Cell Phones Four Decades Earlier. Reason Magazine.
Krugman, P. (1998). Economics, Why most economists’ predictions are wrong. Red Herring Magazine.
Andreessen, M. (2011, August 20). Why Software Is Eating the World. Andreessen Horowitz.
Rampell, A. (2025, October 3). Software is Eating Labor. YouTube.
Bresnahan, T.F., & Trajtenberg, M. (1995). General Purpose Technologies ‘Engines of Growth’? Journal of Econometrics, 83-108.
Chin, J., & Huang, R. (2025, November 10). The AI Cold War That Will Redefine Everything. The Wall Street Journal.
Lambert, N. (2025, September 9). On China’s open source AI trajectory. Interconnects.
Yoon, J. (2025, November 11). What if the AI race isn’t about chips at all? Financial Times.
Clemente, J. (2025, June 16). China Vs. U.S.: AI Supremacy Requires Reliable Electricity. Forbes.
Huang, R., & Spegele, B. (2025, December 10). China’s AI Power Play: Cheap Electricity From World’s Biggest Grid. The Wall Street Journal.
Vaughn, S. (2025, September 10). From Vouchers to Visas: China’s Innovative Plan for AI Dominance. Foreign Policy Research Institute.
Chan, K., Smith, G., et al. (2025, June 26). Full Stack: China’s Evolving Industrial Policy for AI. RAND.
He, A. (2024, August 12). In Developing AI, China Takes the Industrial Route. Centre for International Governance Innovation.
Chang, W. (2025, October 2). China’s “AI+” drive aims for integration across sectors: a wake-up call for Europe. MERICS.
The Economist. (2025, August 21). China is quietly upstaging America with its open models.
Allen, G. C. (2024, December 11). Understanding the Biden Administration’s Updated Export Controls. The Center for Strategic and International Studies (CSIS).
Reuters. (2025, December 22). Exclusive: Nvidia aims to begin H200 chip shipments to China by mid-February, sources say. Reuters.
The White House. (2025, July 23). White House Unveils America’s AI Action Plan. President Donald J. Trump, The White House.
Trainor, T. (2019, January 3). How Ford’s Willow Run Assembly Plant Helped Win World War II. Assembly Mag.
Leibson, S. (2024, June 10). The First ICs on the Moon—The Apollo Guidance Computer, Part I. Electronic Engineering Journal.
Tunguz, T. (2025, November 6). Are We Being Railroaded by AI? tomtunguz.com, Theory Ventures.
Haeck, P. (2025, November 19). The EU promised to lead on regulating artificial intelligence. Now it’s hitting pause. Politico.
Liguori, G. (2025, August 2). Charting the Evolution of AI: From Symbolic Systems to Autonomous Agents and Beyond. LinkedIn.
Mucci, T. The History of Artificial Intelligence. IBM.
Vaswani, A., Shazeer, N., et al. (2017, June 12). Attention is All You Need. ArXiv.
Jones, N., & Nature. (2025, November 29). Could Symbolic AI Unlock Human-Like Intelligence. Scientific American.
Bảo Châu, N. (2024, January 17). AlphaGeometry: An Olympiad-level AI system for geometry. Google DeepMind.
Aboud Ali, M., & Dornaika, F. (2025, October 29). Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions. arXiv.
Chang, E.Y. (2025, December 5). The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics. arXiv.
Tao, T. (2025, November 5). Mathematical exploration and discovery at scale. Terry Tao, What’s New (Wordpress).
Dell’Acqua, F. M. (2023, September 22). Navigating the Jagged Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School.
Pueyo, T. (2025, November 28). My take on the jagged frontier. Substack: Uncharted Territories.
Karpathy, A. (2025, November 17). Verifiability. Karpathy: karpathy.bearblog.dev/verifiability/
Yee, L., Madgavkar, A., et al. (2025, November 25). Agents, robots and us: Skill partnerships in the age of AI. McKinsey Global Institute.
Brynjolfsson, E., Rock, D., & Syverson, C. (2021). The Productivity J-Curve: How Intangibles Complement General Purpose Technologies. American Economic Journal: Macroeconomics, 333-372.
Our World in Data. (2021). Labor share of gross domestic product (GDP), 2004-2020. Via International Labour Organization; Our World in Data.
Cottier, Snodin, et al. (2025, March 12). LLM inference prices have fallen rapidly, but unequally across tasks. Epoch AI.
ARC Prize. (2025, December 11). X post, x.com/arcprize.
Huss, R. (2025, May 15). Huang’s Law is Eating Moore’s Law (And Reshaping AI’s Growth Curve). Hackernoon.
Lightspeed. (2024, November 18). AI Will Eat Services. LSVP.com.
Iannaccone, S. (2025, December 24). Alphafold Changed Science. After 5 years, It’s Still Evolving. Wired.
Kumparak, G. (2008, December 2). Apple announces top 10 iPhone App downloads of 2008. TechCrunch.
Jyoti, R. (2025, December 8). Meet the MAESTRO: AI agents are ending multi-cloud vendor lock-in. CIO (cio.com).
Google DeepMind. (2025, November 18). Gemini 3 Pro, Model Card. Googleapis.com.
Kaplan, J., McCandlish, S., et al. (2020, January). Scaling Laws for Neural Language Models. arXiv.
Sutton, R. (2019, March 13). The Bitter Lesson. Incomplete Ideas.
Maslej, N., Fattorini, L., et al. (2025, April). Artificial Intelligence Index Report 2025. HAI Stanford University.
ARC Prize. (2025, December). ARC-AGI Leaderboard. Arcprize.org.
Georgiev, B., Gomez-Serrano, J., Tao, T., et al. (2025, November 3). Mathematical exploration and discovery at scale. arXiv.
Anthropic. (2025, September 29). Introducing Claude Sonnet 4.5. Anthropic.com.
Bastian, M. (2026, January 13). Anthropic’s Claude Cowork was built in under two weeks using Claude Code to write the code. The Decoder.
METR. (2025, December 19). METR_Evals. X post, x.com/METR_evals/.
Sinha, A., Arun, A., et al. (2025, September 28). The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs. arXiv.
Silver, D., & Sutton, R. S. (2025, April 26). Welcome to the Era of Experience. GoogleAPIs: storage.googleapis.com.
Velasquez, A., Bhatt, N., et al. (2025). Neurosymbolic AI as an antithesis to scaling laws. PNAS Nexus, 117.
Ramey, J. (2024, January 26). Tesla Bets On AI In Latest FSD Update. Autoweek.
Waymo AI Team. (2025, December 9). Demonstrably Safe AI For Autonomous Driving. Waymo Blog.