Self-hosted GPU infrastructure, the option that looks most expensive upfront, produces the best EBITDA profile. A single 8-GPU DGX H200 server generates over $5M in five-year savings versus equivalent cloud compute, and none of that CapEx hits EBITDA. PE-owned companies are already exploiting this arbitrage.
The economics are counterintuitive, though. An enterprise buying a $400K DGX H200 server depreciates it over 3-6 years. That depreciation is excluded from EBITDA. The same enterprise renting equivalent compute from AWS at $39.80/hr for eight H200 GPUs books $348K annually as operating expense, reducing EBITDA dollar-for-dollar. By Lenovo's 2026 TCO analysis, the self-hosted path breaks even in under four months at high utilization and delivers an 8x cost advantage per million tokens over cloud IaaS. The cloud-first narrative persists more from convenience than from sound economic analysis.
"The era of cloud-first for all AI workloads is over. The Total Cost of Ownership analysis decisively favors on-premises infrastructure for sustained inference and fine-tuning workloads."Lenovo, On-Premise vs Cloud: Generative AI Total Cost of Ownership, 2026 Edition
Enterprises building AI competencies face three infrastructure strategies, each with fundamentally different financial profiles. Path 1: Self-hosted compute means purchasing NVIDIA DGX systems, Cerebras CS-3 wafer-scale clusters, or SambaNova SambaRacks, then building or leasing data center space to run them. The upfront cost is steep: $515K for a DGX B200, ~$2M+ for a Cerebras CS-3 system. The payoff is control, performance density, and EBITDA-favorable accounting. Purpose-built inference silicon from Cerebras and SambaNova is redefining this path: the CS-3 delivers 2,100 tokens/sec on Llama 3.1 70B, and SambaNova's SN50 claims 895 tok/s per user on the same model versus 184 tok/s on NVIDIA's B200. Self-hosted no longer means NVIDIA-only.
Path 2: NeoCloud and hyperscaler services covers the managed compute layer. CoreWeave, Lambda Labs, AWS SageMaker, Google Vertex AI, and Azure AI offer GPU-hour rentals from $1.49 to $6.25/hr per GPU depending on provider and generation. This is pure OpEx: predictable monthly spend, zero hardware risk, and elastic scaling. Every dollar of it compresses EBITDA.
Path 3: API-only via frontier labs means calling Anthropic, OpenAI, Together.AI, or Fireworks for inference. Token prices are in freefall. OpenAI's trajectory is the clearest illustration: GPT-4 launched at $36/MTok in March 2023; GPT-4o dropped to $2.50/$10 per MTok (input/output); GPT-5 nano now runs at $0.05/$0.40. That is a 99% price decline in three years. Anthropic's Claude Opus 4.5 costs $5/$25 per MTok, 67% cheaper than its predecessor. OpenAI alone serves 1M+ paying companies and 92% of the Fortune 500, generating $20B in ARR for 2025 with a $29.4B target for 2026. The market is enormous, but no individual customer gains a moat from it.
| Metric | Self-Hosted | NeoCloud / Hyperscaler | API-Only | Conf. |
|---|---|---|---|---|
| Unit cost (H100 equiv.) | $25K-$40K per GPU (purchase) | $1.49-$4.50/GPU-hr (rental) | $0.05-$25/MTok (inference) | High |
| 8-GPU server cost | $216K (DGX H100), $515K (DGX B200) | $39.80/hr (AWS p5e.48xl, 8xH200) | N/A | High |
| Break-even vs cloud | Under 4 months at high utilization | Baseline | Never (variable cost) | Med |
| 5-year TCO savings/server | Over $5M vs cloud IaaS | Baseline | Depends on volume | Med |
| Cost per MTok advantage | 8x vs cloud IaaS; 18x vs API | 2-3x vs API | Baseline | Med |
| EBITDA treatment | CapEx: excluded from EBITDA | OpEx: reduces EBITDA 1:1 | OpEx: reduces EBITDA 1:1 | High |
| Facility lease treatment | Operating lease: reduces EBITDA; finance lease or owned: below EBITDA | Bundled in hourly rate (OpEx) | N/A | High |
| GPU depreciation rate | 30-40% economic loss in year 1 | Provider absorbs depreciation | N/A | High |
| Accounting useful life | 3-5 yr (enterprise); 6 yr (hyperscaler) | N/A (rental) | N/A | High |
| Power per GPU (H100 SXM) | 700W under load | Bundled in hourly rate | N/A | High |
| Token price YoY decline | N/A (hardware cost) | Cloud GPU rates fell ~60% (2023-2025) | GPT-4 $36/MTok (2023) to GPT-5 nano $0.05 (2026): 99% decline; Opus 4.5: 67% cheaper than predecessor | High |
For PE-owned companies, EBITDA is the valuation metric. Enterprise value is typically 8-15x EBITDA in technology acquisitions. Every dollar of cloud compute spend reduces EBITDA by one dollar and enterprise value by $8-$15, whereas GPU CapEx is invisible to EBITDA entirely.
Consider a PE-backed SaaS company running 64 H200 GPUs for production inference. The cloud path: 8 AWS p5e.48xlarge instances at $39.80/hr costs $2.79M annually. That is $2.79M subtracted from EBITDA, and at a 12x multiple, $33.5M subtracted from enterprise value. The self-hosted path: 8 DGX H200 servers at $400K each costs $3.2M in year one. EBITDA impact: zero. The hardware depreciates at $640K/year over five years, but depreciation is excluded from EBITDA by definition.
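The same arithmetic as a runnable sketch. The figures come from the scenario above; the 12x multiple is the mid-range of the 8-15x band, and the function names are ours:

```python
# Sketch of the EBITDA and enterprise-value arithmetic for the 64-GPU
# scenario above. All figures come from the text; function names are ours.

EV_MULTIPLE = 12  # mid-range of the 8-15x technology multiples cited

def cloud_path(instances=8, rate_per_hr=39.80, hours_per_year=8760):
    annual_opex = instances * rate_per_hr * hours_per_year  # ~$2.79M
    ebitda_hit = annual_opex                 # OpEx reduces EBITDA 1:1
    ev_hit = ebitda_hit * EV_MULTIPLE        # ~$33.5M off enterprise value
    return annual_opex, ebitda_hit, ev_hit

def self_hosted_path(servers=8, price_per_server=400_000, life_years=5):
    capex = servers * price_per_server       # $3.2M in year one
    annual_depreciation = capex / life_years # $640K/yr, below the EBITDA line
    ebitda_hit = 0.0                         # CapEx never touches EBITDA
    return capex, annual_depreciation, ebitda_hit

print(cloud_path())        # (2789184.0, 2789184.0, 33470208.0)
print(self_hosted_path())  # (3200000, 640000.0, 0.0)
```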
By year two, the self-hosted path has paid back its cost and begins generating pure savings. By year five, cumulative savings exceed $5M per server. The PE firm that buys GPUs, not cloud hours, can claim a higher EBITDA at exit. This is standard financial engineering in PE-backed technology companies, not a theoretical exercise.
The EBITDA case for self-hosted compute assumes the hardware is the dominant cost, but the facility that houses it introduces a complication that the standard analysis tends to elide. Under US GAAP (ASC 842), a conventional colocation or data center lease is classified as an operating lease, and the resulting lease expense sits above EBITDA in the income statement. A company that buys $3.2M in DGX servers and leases a colocation hall at $80K/month has shielded the GPU spend from EBITDA while adding $960K in annual facility OpEx that hits it directly. The arbitrage still favors ownership, but the facility lease narrows the margin.
The accounting distinction between operating and finance leases matters more than most infrastructure analyses acknowledge. Under ASC 842, an operating lease produces a single, straight-line lease cost classified as an operating expense; a finance lease splits the cost into amortization of the right-of-use asset and interest on the lease liability, both of which fall below EBITDA in the same way that GPU depreciation does. EY's ASC 842 implementation guidance confirms this asymmetry: operating leases reduce EBITDA dollar-for-dollar, while finance leases preserve it. The practical implication is that structuring a facility arrangement to qualify as a finance lease — through transfer-of-ownership provisions, bargain purchase options, or lease terms exceeding 75% of the asset's economic life — can recover much of the EBITDA benefit that a standard colocation contract surrenders.
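A simplified illustration of the classification asymmetry, assuming the $80K/month colocation figure from earlier and straight-line amounts throughout (real finance-lease accounting runs interest on an effective-rate schedule against the lease liability):

```python
# Illustrative view of how the same $80K/month facility commitment lands
# on EBITDA under the two ASC 842 classifications discussed above.
# Simplified: straight-line amounts only; actual finance-lease accounting
# runs interest on an effective-rate schedule.

ANNUAL_LEASE_COST = 80_000 * 12     # $960K/yr facility commitment
EV_MULTIPLE = 12                    # mid-range technology multiple

# Operating lease: one straight-line lease expense, booked as OpEx.
operating_ebitda_hit = ANNUAL_LEASE_COST        # $960K off EBITDA each year

# Finance lease: ROU-asset amortization plus lease-liability interest,
# both of which fall below EBITDA, like GPU depreciation.
finance_ebitda_hit = 0.0
finance_below_ebitda_cost = ANNUAL_LEASE_COST   # same cash, different line

ev_swing = (operating_ebitda_hit - finance_ebitda_hit) * EV_MULTIPLE
print(ev_swing)  # 11520000 -- ~$11.5M of EV hinging on classification
```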
Several structuring levers are available to enterprises willing to look beyond turnkey managed services. First, maximizing owned in-cage infrastructure — power distribution units, structured cabling, cooling equipment, rack enclosures — shifts spend from recurring service fees into capitalizable assets. Second, avoiding heavily bundled contracts where power, cooling, and operations are wrapped into a single opaque monthly charge prevents the capitalization opportunity from being obscured inside an undifferentiated OpEx line. Third, leasehold improvements that the tenant funds directly (electrical buildout, cage construction, fire suppression upgrades) can be capitalized and amortized over the shorter of their useful life or the lease term under both US GAAP and IFRS, preserving a portion of the EBITDA shield. The strongest EBITDA outcome is not merely self-hosting compute but maximizing the share of total AI infrastructure spend that is owned, capitalized, and depreciated below the EBITDA line.
Under IFRS 16, nearly all leases are treated as finance leases, with depreciation of the right-of-use asset and interest expense both excluded from EBITDA. A multinational reporting under IFRS will show a more favorable EBITDA profile for the same facility arrangement than a US GAAP reporter, not because the economics differ but because the accounting presentation does. For PE-backed companies evaluating cross-border infrastructure, the choice of reporting framework can shift the EBITDA calculus by hundreds of thousands of dollars annually on a single facility lease.
The risk is real, though. GPU hardware depreciates fast. Prior-generation flagships (V100, A100, H100) lost 40-60% of their value within 18-24 months of their successor's launch. The NVIDIA B200, now shipping at $30K-$40K per GPU, is accelerating that markdown for H100 clusters. A company that bought H100s in 2024 at $35K/GPU now holds hardware worth $15K-$20K while B200s deliver 2-3x the inference throughput. Self-hosting works financially only if the enterprise can sustain high utilization (above 60%) and tolerate generational obsolescence.
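A rough sensitivity sketch of that utilization threshold, assuming a $35K purchase price and a 50% residual after 24 months, and excluding power, support, and facility costs:

```python
# Sensitivity sketch for the utilization threshold above: the economic
# cost per productive GPU-hour of an owned H100, given fast depreciation.
# Purchase price, residual value, and horizon are assumptions taken from
# the ranges in the text; power, support, and facility costs are excluded.

HOURS_PER_MONTH = 730

def owned_cost_per_gpu_hour(price=35_000, residual=17_500,
                            months=24, utilization=0.6):
    """Value lost to obsolescence, spread over productive hours only."""
    economic_loss = price - residual
    productive_hours = months * HOURS_PER_MONTH * utilization
    return economic_loss / productive_hours

for u in (0.3, 0.6, 0.9):
    print(f"{u:.0%} utilization: ${owned_cost_per_gpu_hour(utilization=u):.2f}/hr")
# 30%: $3.33/hr -- worse than renting H100s from Lambda at $2.49/hr
# 60%: $1.66/hr -- comfortably below NeoCloud rates
# 90%: $1.11/hr
```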
CoreWeave represents the NeoCloud thesis at maximum leverage. The company IPO'd in March 2025 at $40/share, peaked at $183.58 in June, and trades at $89 as of February 2026. Revenue hit $3.52B by mid-2025, with full-year 2025 projected at $8B. The backlog is extraordinary: a $17.4-$19.4B Microsoft deal over five years and a $3B Meta contract for Llama training.
The debt is equally extraordinary. CoreWeave has raised over $25B in capital since 2023, mostly debt, producing a 4.8x debt-to-equity ratio that is unprecedented in technology. Interest expense tripled year-over-year to $311M in Q3 2025. The company faces $4.2B in maturities in 2026 that must be refinanced, likely at rates above the 11% average on its $7.6B DDTL 2.0 facility.
For enterprise customers, NeoCloud providers offer a middle path: GPU access without hardware ownership, at rates 30-50% below AWS and GCP on-demand pricing. Lambda Labs charges $2.49/hr for H100 PCIe instances versus AWS at $3-$4/hr. But AWS raised GPU prices 15% in January 2026, and managed ML services (SageMaker, Vertex AI) add 10-30% on top of base compute. The hyperscaler premium is widening just as NeoCloud alternatives proliferate.
The risk for enterprises relying on NeoClouds is concentration. CoreWeave's GPU fleet is collateral for its debt facilities. A demand slowdown or GPU depreciation cycle could trigger covenant violations, capacity reductions, or service disruptions. Lambda Labs is pre-IPO with an H1 2026 target and a $500M revenue run rate, but its largest customer is NVIDIA itself (a $1.5B GPU leaseback deal). NeoCloud infrastructure is structurally fragile in ways that hyperscalers are not.
A parallel disruption is reshaping the self-hosted calculus. Cerebras and SambaNova ship purpose-built inference processors that outperform NVIDIA GPUs on throughput per watt, latency, and tokens per second by wide margins. Far from research curiosities, these are production systems with major enterprise and government customers.
Cerebras raised $1B at a $23B valuation in early 2026, with a Q2 2026 IPO target. Revenue grew from under $6M in Q2 2024 to $70M in Q2 2025. The $10B OpenAI inference deal (750MW through 2028) validates the architecture at hyperscale. The CS-3 wafer-scale engine delivers 2,100 tokens/sec on Llama 3.1 70B and 2,522 tok/s on Llama 4 Maverick. Enterprise customers deploying on-prem include GSK, Mayo Clinic, the U.S. Department of Energy, and the U.S. Department of Defense. Critically, OpenAI's gpt-oss-120B runs on Cerebras under an Apache 2.0 license, meaning enterprises can buy CS-3 systems and run frontier-class open-weight models on their own iron, fully air-gapped if needed.
SambaNova just unveiled the SN50 RDU (fifth generation), shipping H2 2026. It claims 5x the max speed and 3x the throughput of NVIDIA's B200 for agentic inference workloads, at 20kW per rack (air-cooled, no liquid cooling required). The SN50's three-tier memory architecture supports models exceeding 10 trillion parameters with context lengths above 10 million tokens. Revenue run rate hit $150M in 2025, with $250M targeted for 2026. A $350M+ Series E from Vista Equity Partners and Intel closed in February 2026, after Intel's $1.6B acquisition bid stalled. SambaNova sells SambaRack hardware for on-prem deployment and offers SambaManaged, a turnkey AI cloud deployable in customer data centers in 90 days.
For the EBITDA analysis, inference silicon matters because it amplifies the self-hosted advantage. A Cerebras CS-3 or SambaNova SN50 rack delivering 5-10x the inference throughput of equivalent NVIDIA hardware means the same CapEx buys proportionally more compute. The break-even period shortens and the cost-per-token advantage widens accordingly. And the hardware, purchased as CapEx, remains invisible to EBITDA. The NVIDIA monoculture is ending; the self-hosted path now has competitive silicon options that reduce both cost and vendor concentration risk.
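A directional sketch of that amplification, using the $515K DGX B200 and ~$2M CS-3 prices quoted earlier and an assumed 5x throughput multiple from the 5-10x range:

```python
# Directional sketch of how a throughput multiple from purpose-built
# silicon shifts CapEx efficiency. Prices come from the text; the 5x
# multiple is an assumption within the 5-10x range quoted above.

def capex_per_throughput_unit(capex, relative_throughput):
    """Dollars of CapEx per normalized unit of inference throughput,
    with an NVIDIA 8-GPU server as the 1.0 baseline."""
    return capex / relative_throughput

nvidia_baseline = capex_per_throughput_unit(515_000, 1.0)  # DGX B200
wafer_scale = capex_per_throughput_unit(2_000_000, 5.0)    # CS-3, assumed 5x

# Despite a ~4x sticker price, the inference-optimized system delivers
# each unit of throughput for less CapEx -- so break-even arrives sooner
# and the cost-per-token advantage widens, as argued above.
print(nvidia_baseline, wafer_scale)  # 515000.0 400000.0
```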
| Dimension | Self-Hosted GPU | API-Only (Frontier Labs) |
|---|---|---|
| Upfront cost | $216K-$515K per 8-GPU server, plus $30K-$100K in supporting infrastructure per node | Zero |
| Ongoing cost | Power ($50K-$80K/yr per server), support ($20K-$50K/yr), staff | $0.05-$25 per million tokens; scales linearly with usage |
| EBITDA impact | CapEx excluded from EBITDA; only maintenance OpEx hits | 100% OpEx; every dollar reduces EBITDA |
| Facility cost | Colo lease is OpEx unless structured as finance lease; owned in-cage infra and leasehold improvements can be capitalized | Bundled in hourly rate; no separate facility line |
| Scaling speed | Weeks to months (procurement, rack, configure) | Seconds (API call) |
| Vendor lock-in | NVIDIA hardware ecosystem; CUDA dependency | API abstraction layers emerging; model-switching possible |
| Stranded asset risk | High: GPUs lose 40-60% value within 18-24 months of successor | None: no owned assets |
| Data sovereignty | Full: data never leaves premises | Limited: data transits to third-party inference endpoints |
| Model flexibility | Run any model, any size, any configuration | Limited to provider's model catalog and parameters |
| Competitive moat | Fine-tuned models, proprietary data pipelines, custom inference | Minimal: competitors access identical models at identical prices |
| Trade-off | Maximum financial and strategic control at maximum operational complexity and hardware risk | Maximum simplicity and speed at maximum EBITDA compression and zero differentiation |
Does each path solve the problem of building durable AI competencies with an optimal EBITDA profile?
Path 1 (self-hosted): Best EBITDA profile by a wide margin, provided the facility strategy is structured deliberately. GPU CapEx exclusion from EBITDA creates $2-3M in annual enterprise value uplift per 8-GPU server at typical PE multiples, and Lenovo's 2026 TCO study confirms an 8x cost advantage per million tokens versus cloud IaaS. Purpose-built inference silicon (Cerebras CS-3, SambaNova SN50) amplifies the advantage with 5-10x throughput gains over equivalent NVIDIA hardware. The caveats: the benefit requires 60%+ utilization, in-house ML ops talent, and tolerance for 3-4 year hardware refresh cycles. Facility leases classified as operating leases under ASC 842 introduce OpEx that partially offsets the GPU CapEx shield; enterprises can mitigate this by maximizing owned in-cage infrastructure, capitalizing leasehold improvements, and structuring arrangements that qualify as finance leases where commercially feasible.
Path 2 (NeoCloud/hyperscaler): Preserves flexibility and eliminates hardware risk, but every dollar hits EBITDA. NeoCloud providers (CoreWeave at $2.49-$4.25/hr, Lambda at $2.49/hr) offer 30-50% savings versus hyperscaler on-demand pricing, but still book as OpEx. AWS's January 2026 price hike (15%) and managed service premiums (10-30%) are widening the gap. NeoCloud counterparty risk (CoreWeave's $4.2B in 2026 debt maturities) is a non-trivial concern for enterprises signing multi-year commitments.
Path 3 (API-only): Token price deflation is dramatic: OpenAI's pricing has fallen 99% in three years (GPT-4 at $36/MTok to GPT-5 nano at $0.05), and Anthropic's Opus 4.5 is 67% cheaper than its predecessor. But deflation does not solve either problem. Every API call is pure OpEx compressing EBITDA. Worse, API-only enterprises build zero proprietary infrastructure. OpenAI serves 92% of the Fortune 500 and 1M+ paying companies; your competitors access identical models at identical prices. The path works for experimentation and low-volume production, but fails as a durable competitive strategy. Organizations spending $500K+/year on API calls should evaluate self-hosting.
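A final crossover sketch of that threshold, assuming a hypothetical 100K MTok/yr workload at a $5/MTok blended price, with the 18x self-hosted cost advantage taken from the comparison table:

```python
# Crossover sketch for the $500K/yr threshold above. Token prices and the
# 18x self-hosted cost advantage come from the text; the annual volume and
# blended price are illustrative assumptions.

def annual_api_spend(mtok_per_year, blended_price_per_mtok):
    return mtok_per_year * blended_price_per_mtok

def self_hosted_equivalent(api_spend, advantage=18.0):
    """Implied annual cost of serving the same tokens on owned hardware,
    using the 18x cost-per-MTok advantage from the comparison table."""
    return api_spend / advantage

spend = annual_api_spend(100_000, 5.0)  # 100K MTok/yr at a $5/MTok blend
print(spend, self_hosted_equivalent(spend))  # 500000.0 vs ~27778 self-hosted
```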