
The Economic Suicide Schedule: Why AI Chips Die Twice

The utility lifecycle and the economic lifecycle of AI hardware have diverged so dramatically that the gap between them is now a $176 billion accounting question, and possibly the defining structural weakness of the entire AI boom.


Two Deaths, One Chip

Every AI chip dies twice.

The first death is physical: thermal exhaustion, silent bit-flips, memory errors that accumulate until the hardware becomes unreliable. This death is slow, measurable, and in the grand scheme of things, manageable. A well-maintained data center GPU can physically function for five to seven years.

The second death is economic: the moment the next generation ships and the cost-per-token of the chip you just bought drops below the cost of the electricity needed to run it competitively. This death is sudden, violent, and increasingly unavoidable. It arrives in eighteen to thirty-six months.

These are not the same event. They are not even close to the same event. And the growing chasm between them, between how long a chip can work and how long it should work, economically, is the fault line running beneath the entire AI infrastructure buildout.

To understand why this matters, we need to define our terms, trace how we got here, and then look very carefully at the numbers the industry would prefer you not examine too closely.


Defining the Terms: Three Lives of a Chip

The current debate about AI hardware depreciation has been muddied by the conflation of three distinct concepts. Let's separate them.

Technological Life (Frontier Utility) is the period during which a chip can perform the most demanding computational tasks available, specifically training frontier AI models. For current-generation GPUs, this window is approximately eighteen to twenty-four months, tightly coupled to NVIDIA's product cadence. Once the next architecture ships, the previous generation can no longer compete on training cost-per-FLOP. Technological life is short, binary, and accelerating.

Economic Life (Total Utility) is the full period during which a chip can generate revenue in excess of its operating costs, including not just frontier training but also inference, fine-tuning, batch processing, analytics, and any other workload where it remains cost-competitive against alternatives. Economic life is longer than technological life because chips "cascade" down through progressively less demanding tiers of work. Current estimates place the economic life of a high-end data center GPU at three to five years, depending on power costs, cooling infrastructure, and the speed at which the market for secondary compute saturates.

Accounting Life (Depreciation Schedule) is the period over which the asset's cost is spread on the balance sheet. This is an administrative decision, governed by Generally Accepted Accounting Principles but subject to significant managerial discretion. And this is where the story gets interesting, and, depending on who you ask, alarming.

In the first half of the 2020s, the world's largest cloud providers (Amazon, Alphabet, Microsoft) all extended the official "useful life" of their servers from a three-or-four-year standard to six years. Meta moved from three years in 2020 to five and a half years by 2025. CoreWeave adopted a six-year schedule from launch despite running an exclusively AI-focused infrastructure. The industry that once assumed three-to-four-year refresh cycles now depreciates hardware as though it will remain economically productive for the better part of a decade.

The gap between economic life and accounting life is not a rounding error. It is measured in hundreds of billions of dollars.


A Brief History of How Chips Learned to Die Faster

To understand the current crisis, you need to understand that the depreciation problem is not new. What is new is the speed.

The CPU Era: Predictable Decay

For decades, data center hardware followed a relatively stable lifecycle. A server running standard CPUs had a physical life of five to seven years and an economic life that roughly matched. Intel's "tick-tock" cadence delivered new process nodes and architectures on a roughly two-year cycle, but the performance gains between generations were incremental, ten to twenty percent improvements, occasionally thirty percent. A three-year-old server was slower than a new one, but not catastrophically so. The cost of replacing it had to be weighed against the cost of continued operation, and the math often favoured keeping it running.

The standard depreciation schedule for servers was three years. This was conservative but defensible. When hyperscalers extended schedules to four, then five, then six years in the early 2020s, they could point to genuine improvements in hardware durability, software optimization, and workload management. Amazon's 2020 extension from three to four years was grounded in observable improvements in server longevity. The move to six years was more aggressive, but still within the realm of plausibility for general-purpose compute.

The key characteristic of the CPU era: the utility lifecycle and the economic lifecycle moved in approximate lockstep. A physically functional server was usually an economically functional server.

The GPU Pivot: The Lockstep Breaks

The introduction of GPUs into data center workloads changed the equation fundamentally, but the full implications took time to become visible.

NVIDIA's data center GPU architecture evolution proceeded through a series of increasingly rapid generational leaps:

  • Kepler (2012): Introduced SMX units, improved power efficiency. The foundation of GPU general-purpose compute.
  • Maxwell (2014): Unified memory architecture, 2x energy efficiency over Kepler. Still primarily a graphics architecture.
  • Pascal (2016): The first architecture purpose-built for both HPC and emerging AI. Introduced HBM2 memory and NVLink. The Tesla P100 was the first GPU that AI researchers could train deep learning models on at meaningful scale.
  • Volta (2017): The true inflection point. Introduced Tensor Cores, specialized matrix multiplication units that delivered 10x faster deep learning training than Pascal. Volta was the moment NVIDIA pivoted from selling GPUs that happened to be useful for AI to selling AI accelerators that happened to be GPUs.
  • Turing (2018): Added ray tracing for consumer graphics, but more importantly for data centers, introduced INT8 and INT4 precision support, the first steps toward the inference optimization that would later become critical.
  • Ampere (2020): The A100. Introduced Multi-Instance GPU (MIG), third-generation Tensor Cores, TF32 and BF16 support, and structured sparsity. The A100 became the workhorse of the first wave of large language model training.
  • Hopper (2022): The H100. Fourth-generation Tensor Cores, the Transformer Engine with FP8 precision, 80GB HBM3. Delivered roughly 6x faster inference and 3x faster training than the A100 for transformer workloads. This was the chip that made GPT-4-scale training economically viable.
  • Blackwell (2024): The B100/B200. Fifth-generation Tensor Cores, FP4 precision, 192GB HBM3e, a two-die chiplet design with 208 billion transistors. Up to 25x better energy efficiency than Hopper for specific inference workloads.
  • Blackwell Ultra (2026): The B300/GB300. 288GB HBM3e, 15 PFLOPS FP4 dense. 35% faster training than B200 at comparable configurations.

Two things happened as this architecture cadence accelerated. First, the performance gap between generations widened dramatically. The jump from Pascal to Volta was not a ten-percent improvement; it was an order of magnitude for AI workloads. Each subsequent generation compounded this. Second, NVIDIA explicitly shifted from a two-year to an annual product cadence for data center GPUs, announced at GTC 2024.

The combined effect: by the time the H100 shipped in 2022, the useful life of the A100 for frontier training was functionally over. Not because the A100 stopped working. It still worked fine. But running it for frontier training cost more per token than running the H100, and the gap was not close. The A100 was two years old and already economically dead for its highest-value purpose.
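To see how abruptly those economics flip, here is a back-of-the-envelope sketch in Python. The power draws are published TDPs and the 3x training speedup is the figure cited above, but the electricity rate and PUE are assumptions; none of this is a benchmark.

```python
# A rough sketch, not a benchmark: energy cost per normalized unit of
# training work. TDPs are published figures; the 3x speedup is the ratio
# cited above; the electricity rate and PUE are assumptions.

ELECTRICITY_USD_PER_KWH = 0.08   # assumed industrial power rate
PUE = 1.3                        # assumed data center power overhead

def energy_cost_per_unit_work(watts: float, relative_throughput: float) -> float:
    """Electricity cost to complete one normalized unit of training work."""
    kwh_per_unit = (watts * PUE) / (relative_throughput * 1000)
    return kwh_per_unit * ELECTRICITY_USD_PER_KWH

a100 = energy_cost_per_unit_work(watts=400, relative_throughput=1.0)  # A100 SXM TDP
h100 = energy_cost_per_unit_work(watts=700, relative_throughput=3.0)  # H100 SXM TDP, ~3x training

print(f"A100: ${a100:.4f} per unit of training work")
print(f"H100: ${h100:.4f} per unit of training work")
print(f"H100 advantage on energy alone: {a100 / h100:.1f}x")
```

On these assumptions the H100 does the same unit of training work for roughly 40 percent less electricity, before amortized capex per unit of work even enters the picture.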

This is the moment the lockstep between utility and economic life shattered, permanently.


The Current Crisis: The $176 Billion Gap

In late 2025, Michael Burry, the investor who called the 2008 subprime mortgage collapse and was portrayed by Christian Bale in The Big Short, turned his attention to AI hardware depreciation. His argument was straightforward and devastating.

Hyperscalers were depreciating GPU infrastructure over five to six years while NVIDIA's product cadence rendered the hardware economically non-competitive for its primary purpose in two to three years. If you adjusted the depreciation schedules to reflect the actual economic life of the hardware, Burry estimated the cumulative impact on reported earnings would exceed $176 billion between 2026 and 2028, leaving reported operating income at companies like Oracle and Meta more than twenty percent above economic reality.

Burry's analysis adjusted hyperscaler capex depreciation schedules from 5-6 years to the observed 2-3 year economic life for frontier workloads, using public SEC filings and secondary-market pricing. He backed this view with put options on NVIDIA and Palantir. NVIDIA responded with a detailed memo to Wall Street analysts disputing the claim. The industry split into camps.
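The mechanics of that adjustment are simple enough to sketch. Assume a hypothetical $100 billion of GPU capex at a single hyperscaler (a placeholder, not a figure from Burry's model): moving from a six-year to a three-year straight-line schedule doubles the annual depreciation charge, and every dollar of that difference is a dollar of overstated operating income.

```python
# Hedged sketch of the schedule-adjustment logic, not Burry's actual model.
# The capex figure is a hypothetical placeholder.

def annual_depreciation(capex: float, years: int) -> float:
    """Straight-line depreciation expense per year."""
    return capex / years

CAPEX = 100e9  # hypothetical GPU capex at one hyperscaler, USD

reported = annual_depreciation(CAPEX, years=6)  # schedule on the books
economic = annual_depreciation(CAPEX, years=3)  # observed frontier economic life

print(f"Reported annual depreciation:  ${reported / 1e9:.1f}B")
print(f"Economic annual depreciation:  ${economic / 1e9:.1f}B")
print(f"Operating income overstated by ${(economic - reported) / 1e9:.1f}B per year")
```

Scale a gap of that order across four or five hyperscalers and three fiscal years, and the $176 billion figure stops looking implausible.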

The optimists argue for the "value cascade," the idea that a GPU's economic life extends far beyond its frontier training life because the hardware cascades through progressively less demanding tiers (a rough earn-back sketch follows the list):

  • Years 1-2: Frontier training. The highest-value use. This is where the chip earns back most of its cost.
  • Years 3-4: High-value inference. The chip can no longer compete for training, but it can still serve inference requests cost-effectively, especially as inference is projected to consume eighty percent of AI compute cycles by 2030.
  • Years 5-6: Batch inference, analytics, internal workloads. The long tail of revenue generation.
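Here is a minimal earn-back sketch of that cascade, in Python. The rental rates loosely track the ranges quoted below; the utilization, hourly opex, and purchase price are assumptions.

```python
# Earn-back sketch of the value cascade. Rental rates loosely track the
# ranges quoted in the text; utilization, opex, and purchase price are
# assumptions.

HOURS_PER_YEAR = 8760
UTILIZATION = 0.6        # assumed average fleet utilization
OPEX_PER_HOUR = 1.00     # assumed power + cooling + ops, USD per GPU-hour

tier_rates = [6.0, 4.0, 2.5, 2.0, 1.5, 1.2]  # assumed USD/GPU-hour, years 1..6
purchase_price = 30_000.0                    # assumed H100 acquisition cost

cumulative = 0.0
for year, rate in enumerate(tier_rates, start=1):
    net = (rate - OPEX_PER_HOUR) * HOURS_PER_YEAR * UTILIZATION
    cumulative += net
    print(f"Year {year}: net ${net:,.0f}, cumulative ${cumulative:,.0f} "
          f"({cumulative / purchase_price:.0%} of purchase price)")
```

Note what the sketch shows: under these assumptions the chip clears its purchase price in year two, exactly as the optimists say, but the tail years each contribute single-digit percentages. The cascade's value is front-loaded too.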

This cascade is real. Legacy GPUs do retain economic value for years. The secondary market data confirms this: used H100s have stabilized at roughly $18,000 to $22,000 in early 2026, a significant decline from their $25,000-$40,000 purchase price but far from zero. A100s are trading at $12,000 to $18,000. Cloud rental prices for H100s have fallen from $7-$10 per GPU-hour at peak to $2-$4 per hour, but they are still generating revenue.

The pessimists argue that the cascade is real but insufficient to justify the accounting schedules being applied. Unusually for me, I count myself cautiously among them. Here is why.


Five Problems with the Value Cascade

Problem One: The cascade saturates. As hyperscalers upgrade to each new generation, they push the previous generation into the inference tier. But the inference tier is not infinite. When hundreds of thousands of H100s cascade into inference simultaneously, they compete with each other and with purpose-built inference chips. NVIDIA's own L4 and L40S, Google's TPUs, Amazon's Trainium and Inferentia, and a growing ecosystem of custom silicon already crowd this space. The assumption that cascaded hardware can generate revenue indefinitely requires the market for secondary compute to grow as fast as the supply. There is no evidence this is the case.

Problem Two: The energy trap. Even when a cascaded chip can technically perform a workload, the energy cost of running it may exceed the energy cost of performing the same work on newer hardware. Blackwell offers up to 25x better energy efficiency than Hopper for specific inference workloads. In data centers where power represents the dominant operational cost, this total cost of ownership differential can render older hardware non-competitive far faster than the depreciation schedule assumes. You are not just paying for the chip. You are paying for the electricity to keep a less efficient chip running when a more efficient one exists.
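The trap is easy to quantify. A hedged sketch, assuming the newer part needs a quarter of the energy per token (the 25x figure is workload-specific, so 4x stands in as a conservative placeholder) and the same illustrative power prices as before:

```python
# Energy-trap sketch: the power bill alone can retire a working chip.
# The 4x efficiency ratio is an assumed, conservative stand-in for the
# workload-specific 25x figure; rates match the earlier sketch.

ELECTRICITY_USD_PER_KWH = 0.08  # assumed
PUE = 1.3                       # assumed
HOURS_PER_YEAR = 8760

def annual_power_cost(watts: float) -> float:
    """Yearly electricity cost for a constant load, including PUE overhead."""
    return watts * PUE / 1000 * ELECTRICITY_USD_PER_KWH * HOURS_PER_YEAR

old_watts = 700        # e.g. one H100 serving a fixed inference load
new_watts = 700 / 4    # same tokens/second on newer, 4x-efficient silicon

penalty = annual_power_cost(old_watts) - annual_power_cost(new_watts)
print(f"Old chip: ${annual_power_cost(old_watts):,.0f}/year in power")
print(f"New chip: ${annual_power_cost(new_watts):,.0f}/year in power")
print(f"Annual penalty for keeping the old chip: ${penalty:,.0f} per GPU")
```

Per chip, the penalty looks small. Multiply it across hundreds of thousands of cascaded GPUs, and across the scarce megawatts they occupy that could host newer silicon instead, and it becomes the number that retires fleets.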

Problem Three: The failure rate accelerates. Meta's Llama 3 training study documented 148 GPU failures out of 419 total disruptions across 16,384 H100 GPUs over fifty-four days, an annualized failure rate of approximately nine percent, nearly double that of standard CPUs. A Google architect has stated on the record that a data center GPU service life of one to three years is expected under high utilization. You can extend the life by reducing utilization, but reducing utilization means the chip depreciates slower and returns capital slower, which defeats the purpose of running it.
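The annualization behind that figure is worth making explicit. The 148 faulty-GPU interruptions alone annualize to about six percent; folding in the 72 HBM3 memory failures Meta reported in the same study brings it to roughly nine:

```python
# Annualizing the Llama 3 failure numbers cited above.
gpus, days = 16_384, 54
gpu_failures = 148   # faulty-GPU interruptions in the study
hbm_failures = 72    # HBM3 memory failures reported in the same study

def annualized_rate(failures: int) -> float:
    """Failures per GPU per year, scaled up from the 54-day window."""
    return failures / gpus * (365 / days)

print(f"GPU faults only:   {annualized_rate(gpu_failures):.1%} per GPU-year")
print(f"GPU + HBM3 faults: {annualized_rate(gpu_failures + hbm_failures):.1%} per GPU-year")
```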

Problem Four: The depreciation is front-loaded in reality but straight-lined on paper. Hyperscalers use straight-line depreciation, spreading the cost evenly over five or six years. But the actual decline in economic value is heavily front-loaded. An H100 at peak scarcity in 2023 sold for $80,000 to $120,000 on the secondary market. By early 2026, it trades for $18,000 to $22,000. The chip lost the vast majority of its market value in the first two to three years, then the decline flattened. Straight-line depreciation over six years does not represent this reality. Accelerated depreciation methods would be more honest, but they would also be more painful for reported earnings.
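The divergence is easy to see side by side. This sketch fits a simple geometric decline to the market prices quoted above (midpoints of the ranges, so treat the fit as illustrative) and compares it with a six-year straight-line book value:

```python
# Straight-line book value vs a geometric decline fitted to the quoted
# market prices (range midpoints; the fit is illustrative only).

PEAK_MARKET = 100_000.0  # ~2023 secondary-market peak, midpoint of $80k-$120k
MARKET_3Y = 20_000.0     # ~early-2026 resale, midpoint of $18k-$22k
PURCHASE = 32_500.0      # midpoint of the $25k-$40k purchase-price range
LIFE_YEARS = 6

decay = (MARKET_3Y / PEAK_MARKET) ** (1 / 3)  # ~41.5% of value lost per year

print(f"{'Year':>4} {'Book (SL, 6y)':>14} {'Market (fit)':>14}")
for year in range(4):
    book = PURCHASE * (1 - year / LIFE_YEARS)
    market = PEAK_MARKET * decay ** year
    print(f"{year:>4} {book:>14,.0f} {market:>14,.0f}")
```

Under this fit the realizable market value sheds roughly forty percent a year, while the books shed a flat one-sixth of a much smaller starting number.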

Problem Five: The circular financing problem. The Princeton CITP has documented the circular financing structures underlying much of the AI hardware buildout. NVIDIA invests in CoreWeave. CoreWeave buys GPUs from NVIDIA. CoreWeave signs $22.4 billion in contracts with OpenAI. NVIDIA participates in OpenAI's funding round. The money flows in a loop, with each transaction appearing as legitimate revenue or investment depending on which books you examine. This pattern mirrors the vendor financing structures that preceded the 2000-2003 telecom collapse, where Lucent, Nortel, and Cisco lent billions to customers who then used the money to buy equipment from the same companies. When the music stopped, forty-seven competitive telecom carriers went bankrupt and Lucent wrote off $3.5 billion.


The Structural Argument: This Is the Inverted Bubble in Hardware Form

I have argued previously in these pages that the AI infrastructure buildout constitutes a structurally novel bubble, what I called the Inverted AI Bubble, in which the normal mechanisms for correcting overinvestment are disabled by a set of interlocking competitive traps.

The chip lifecycle data provides the physical substrate of that argument. This is the core structural problem:

The hardware that AI infrastructure is built on depreciates faster than the financing used to acquire it matures.

A chip with an eighteen-month frontier life and a three-to-five-year economic life is being financed over five-to-six-year depreciation schedules, backed by debt instruments with comparable or longer terms. GPU-backed lending provides 70-80% loan-to-value with the GPUs themselves serving as collateral. But that collateral is losing 50-70% of its value over three years. The asset-backed financing structures that fund AI infrastructure are secured against a rapidly depreciating asset class while reporting the depreciation on a schedule that makes the math appear to work.
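A stylized version of that lending math: the 70-80 percent loan-to-value comes from above, while the value-decline rate (taken near the upper end of the cited 50-70 percent over three years) and the straight five-year amortization are assumptions.

```python
# Stylized GPU-backed loan: collateral value vs amortizing balance.
# LTV is from the text; the decline rate and term are assumptions.

LTV = 0.75                # midpoint of the 70-80% range cited above
COLLATERAL_0 = 100.0      # initial fleet value, normalized
ANNUAL_VALUE_LOSS = 0.35  # assumed; near the upper end of 50-70% over 3 years
TERM_YEARS = 5            # assumed straight amortization

loan = LTV * COLLATERAL_0
for year in range(1, TERM_YEARS):
    collateral = COLLATERAL_0 * (1 - ANNUAL_VALUE_LOSS) ** year
    balance = loan * (1 - year / TERM_YEARS)
    print(f"Year {year}: collateral {collateral:5.1f}, balance {balance:5.1f}, "
          f"coverage {collateral / balance:.2f}x")
```

Under these assumptions the loan is undercollateralized through the middle years of its term, precisely when a downturn in GPU prices would force lenders to mark the gap.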

This is not fraud. It is worse than fraud. It is a structural misalignment between the physical reality of the hardware and the financial instruments built on top of it, and unlike fraud, there is no single actor whose arrest will fix it. The misalignment is embedded in GAAP-compliant accounting, in lending structures that every rational actor has incentives to maintain, and in competitive dynamics that punish any individual company that breaks rank and adopts more conservative schedules.


The Value Cascade Becomes the Liability Cascade

When the cascaded hardware eventually reaches the end of its useful life, it does not disappear. It enters what might be called the Liability Cascade:

Tier 1: Hyperscaler Decommission. Azure, AWS, and Google run the chips until the power-to-performance ratio turns negative. The hardware is removed from production.

Tier 2: Bargain Cloud Resale. Companies like RunPod, Lambda Labs, and regional providers buy decommissioned racks. They sell "last-gen" compute to students, startups, and researchers who need cheap VRAM, not peak reasoning speed. H100 cloud rental prices of $2-$4 per hour represent this tier in action.

Tier 3: Grey Market and Salvage. In hubs like Shenzhen and Hong Kong, technicians strip dead boards, de-solder HBM and GPU dies, and create what the industry calls "Franken-chips": functional GPU dies soldered onto custom PCBs, sometimes with configurations NVIDIA never authorized. This is a multibillion-dollar silicon salvage industry operating largely outside official channels.

Tier 4: ITAD and E-Waste. When a chip is truly beyond repair, IT Asset Disposition companies shred the boards for data destruction. The remains go to specialized smelters that recover copper, aluminum, and trace amounts of gold and silver. Because these chips are dense with exotic materials and high-performance cooling fluids, they are significantly harder and more expensive to recycle than standard e-waste.

Each tier in this cascade extracts diminishing economic value while accumulating environmental liability. The hyperscalers who purchased the hardware bear the depreciation cost. The secondary market absorbs some of that cost through resale. The salvage market absorbs a fraction more. And the remainder, the gap between what was paid and what was recovered, is the true cost of the AI infrastructure buildout.


What This Means for Sovereign Infrastructure

If you are a government considering building sovereign AI infrastructure (and, as I have documented extensively, Canada both should be and is attempting to), the chip lifecycle question is not academic. It is the single most important variable in your total cost of ownership calculation.

A sovereign AI stack built on six-year depreciation assumptions will appear affordable on paper. The same stack priced against three-year economic reality will cost nearly twice as much over a decade, because you will need to replace the hardware at least once during the accounting life of the first purchase.
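The arithmetic is blunt. A minimal sketch, assuming a hypothetical $10 billion initial outlay and ignoring resale value and generational price shifts:

```python
# Ten-year hardware bill for a sovereign stack under two refresh cadences.
# The outlay is a hypothetical placeholder; resale value and generational
# price changes are ignored for simplicity.
import math

STACK_COST = 10e9  # hypothetical initial GPU outlay, USD

def ten_year_cost(refresh_years: int) -> float:
    """Total spend to stay current for a decade, repurchasing each cycle."""
    purchases = math.ceil(10 / refresh_years)
    return STACK_COST * purchases

on_paper = ten_year_cost(6)  # six-year accounting life: 2 purchases
in_fact = ten_year_cost(3)   # three-year economic life: 4 purchases

print(f"Six-year assumption: ${on_paper / 1e9:.0f}B over a decade")
print(f"Three-year reality:  ${in_fact / 1e9:.0f}B over a decade")
print(f"Ratio: {in_fact / on_paper:.1f}x")
```

Resale recovery and falling per-FLOP prices pull the real ratio somewhat below 2x, which is why "nearly twice as much" is the honest phrasing.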

This does not mean sovereign AI infrastructure is a bad investment. It means that any sovereign AI strategy that does not explicitly account for the accelerating depreciation cycle of AI hardware is built on financial sand.

The true math requires acknowledging that you are not buying infrastructure in the traditional sense: not a bridge that will last fifty years, nor a road that will last twenty. You are buying a consumable. An extremely expensive consumable with an eighteen-month frontier life, a three-to-five-year economic life, and an environmental afterlife that someone will have to pay for.

The hyperscalers are building the sovereign infrastructure of nations on hardware with the economic half-life of a gallon of milk.

The question is not whether we should build it. The question is whether we can afford to keep pretending the milk will last forever.


Jen Evans is the founder of Pattern Pulse AI and B2BNN, the originator of Evans' Law of model coherence degradation and the Sovereign AI Maturity Model, and the author of the "Whose AI Runs the Government?" series on Canadian AI sovereignty. She is currently based in Siem Reap, Cambodia.
