Self-hosted GPU infrastructure, the option that looks most expensive upfront, produces the best EBITDA profile. A single 8-GPU DGX H200 server generates over $5M in five-year savings versus equivalent cloud compute, and none of that CapEx hits EBITDA. PE-owned companies are already exploiting this arbitrage.
The economics are counterintuitive, though. An enterprise buying a $400K DGX H200 server depreciates it over 3-6 years. That depreciation is excluded from EBITDA. The same enterprise renting equivalent compute from AWS at $39.80/hr for eight H200 GPUs books $348K annually as operating expense, reducing EBITDA dollar-for-dollar. By Lenovo's 2026 TCO analysis, the self-hosted path breaks even in under four months at high utilization and delivers an 8x cost advantage per million tokens over cloud IaaS. The cloud-first narrative persists more from convenience than from sound economic analysis.
"The era of cloud-first for all AI workloads is over. The Total Cost of Ownership analysis decisively favors on-premises infrastructure for sustained inference and fine-tuning workloads."Lenovo, On-Premise vs Cloud: Generative AI Total Cost of Ownership, 2026 Edition
Enterprises building AI competencies face three infrastructure strategies, each with fundamentally different financial profiles. Path 1: Self-hosted compute means purchasing NVIDIA DGX systems, Cerebras CS-3 wafer-scale clusters, or SambaNova SambaRacks, then building or leasing data center space to run them. The upfront cost is steep: $515K for a DGX B200, ~$2M+ for a Cerebras CS-3 system. The payoff is control, performance density, and EBITDA-favorable accounting. Purpose-built inference silicon from Cerebras and SambaNova is redefining this path: the CS-3 delivers 2,100 tokens/sec on Llama 3.1 70B, and SambaNova's SN50 claims 895 tok/s per user on the same model versus 184 tok/s on NVIDIA's B200. Self-hosted no longer means NVIDIA-only.
Path 2: NeoCloud and hyperscaler services covers the managed compute layer. CoreWeave, Lambda Labs, AWS SageMaker, Google Vertex AI, and Azure AI offer GPU-hour rentals from $1.49 to $6.25/hr per GPU depending on provider and generation. This is pure OpEx: predictable monthly spend, zero hardware risk, and elastic scaling. Every dollar of it compresses EBITDA.
Path 3: API-only via frontier labs means calling Anthropic, OpenAI, Together.AI, or Fireworks for inference. Token prices are in freefall. OpenAI's trajectory is the clearest illustration: GPT-4 launched at $36/MTok in March 2023; GPT-4o dropped to $2.50/$10 per MTok (input/output); GPT-5 nano now runs at $0.05/$0.40. That is a 99% price decline in three years. Anthropic's Claude Opus 4.5 costs $5/$25 per MTok, 67% cheaper than its predecessor. OpenAI alone serves 1M+ paying companies and 92% of the Fortune 500, generating $20B in ARR for 2025 with a $29.4B target for 2026. The market is enormous, but no individual customer gains a moat from it.
| Metric | Self-Hosted | NeoCloud / Hyperscaler | API-Only | Conf. |
|---|---|---|---|---|
| Unit cost (H100 equiv.) | $25K-$40K per GPU (purchase) | $1.49-$4.50/GPU-hr (rental) | $0.05-$25/MTok (inference) | High |
| 8-GPU server cost | $216K (DGX H100), $515K (DGX B200) | $39.80/hr (AWS p5e.48xl, 8xH200) | N/A | High |
| Break-even vs cloud | Under 4 months at high utilization | Baseline | Never (variable cost) | Med |
| 5-year TCO savings/server | Over $5M vs cloud IaaS | Baseline | Depends on volume | Med |
| Cost per MTok advantage | 8x vs cloud IaaS; 18x vs API | 2-3x vs API | Baseline | Med |
| EBITDA treatment | CapEx: excluded from EBITDA | OpEx: reduces EBITDA 1:1 | OpEx: reduces EBITDA 1:1 | High |
| Facility lease treatment | Operating lease: reduces EBITDA; finance lease or owned: below EBITDA | Bundled in hourly rate (OpEx) | N/A | High |
| GPU depreciation rate | 30-40% economic loss in year 1 | Provider absorbs depreciation | N/A | High |
| Accounting useful life | 3-5 yr (enterprise); 6 yr (hyperscaler) | N/A (rental) | N/A | High |
| Power per GPU (H100 SXM) | 700W under load | Bundled in hourly rate | N/A | High |
| Token price YoY decline | N/A (hardware cost) | Cloud GPU rates fell ~60% (2023-2025) | GPT-4 $36/MTok (2023) to GPT-5 nano $0.05 (2026): 99% decline; Opus 4.5: 67% cheaper than predecessor | High |
For PE-owned companies, EBITDA is the valuation metric. Enterprise value is typically 8-15x EBITDA in technology acquisitions. Every dollar of cloud compute spend reduces EBITDA by one dollar and enterprise value by $8-$15, whereas GPU CapEx is invisible to EBITDA entirely.
Consider a PE-backed SaaS company running 64 H200 GPUs for production inference. The cloud path: 8 AWS p5e.48xlarge instances at $39.80/hr costs $2.79M annually. That is $2.79M subtracted from EBITDA, and at a 12x multiple, $33.5M subtracted from enterprise value. The self-hosted path: 8 DGX H200 servers at $400K each costs $3.2M in year one. EBITDA impact: zero. The hardware depreciates at $640K/year over five years, but depreciation is excluded from EBITDA by definition.
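The same arithmetic as a runnable sketch. The figures come from the scenario above; the 12x multiple is the mid-range of the 8-15x band, and the function names are ours:

```python
# Sketch of the EBITDA and enterprise-value arithmetic for the 64-GPU
# scenario above. All figures come from the text; function names are ours.

EV_MULTIPLE = 12  # mid-range of the 8-15x technology multiples cited

def cloud_path(instances=8, rate_per_hr=39.80, hours_per_year=8760):
    annual_opex = instances * rate_per_hr * hours_per_year  # ~$2.79M
    ebitda_hit = annual_opex                 # OpEx reduces EBITDA 1:1
    ev_hit = ebitda_hit * EV_MULTIPLE        # ~$33.5M off enterprise value
    return annual_opex, ebitda_hit, ev_hit

def self_hosted_path(servers=8, price_per_server=400_000, life_years=5):
    capex = servers * price_per_server       # $3.2M in year one
    annual_depreciation = capex / life_years # $640K/yr, below the EBITDA line
    ebitda_hit = 0.0                         # CapEx never touches EBITDA
    return capex, annual_depreciation, ebitda_hit

print(cloud_path())        # (2789184.0, 2789184.0, 33470208.0)
print(self_hosted_path())  # (3200000, 640000.0, 0.0)
```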
By year two, the self-hosted path has paid back its cost and begins generating pure savings. By year five, cumulative savings exceed $5M per server. The PE firm that buys GPUs, not cloud hours, can claim a higher EBITDA at exit. This is standard financial engineering in PE-backed technology companies, not a theoretical exercise.
The EBITDA case for self-hosted compute assumes the hardware is the dominant cost, but the facility that houses it introduces a complication that the standard analysis tends to elide. Under US GAAP (ASC 842), a conventional colocation or data center lease is classified as an operating lease, and the resulting lease expense sits above EBITDA in the income statement. A company that buys $3.2M in DGX servers and leases a colocation hall at $80K/month has shielded the GPU spend from EBITDA while adding $960K in annual facility OpEx that hits it directly. The arbitrage still favors ownership, but the facility lease narrows the margin.
The accounting distinction between operating and finance leases matters more than most infrastructure analyses acknowledge. Under ASC 842, an operating lease produces a single, straight-line lease cost classified as an operating expense; a finance lease splits the cost into amortization of the right-of-use asset and interest on the lease liability, both of which fall below EBITDA in the same way that GPU depreciation does. EY's ASC 842 implementation guidance confirms this asymmetry: operating leases reduce EBITDA dollar-for-dollar, while finance leases preserve it. The practical implication is that structuring a facility arrangement to qualify as a finance lease — through transfer-of-ownership provisions, bargain purchase options, or lease terms exceeding 75% of the asset's economic life — can recover much of the EBITDA benefit that a standard colocation contract surrenders.
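A simplified illustration of the classification asymmetry, assuming the $80K/month colocation figure from earlier and straight-line amounts throughout (real finance-lease accounting runs interest on an effective-rate schedule against the lease liability):

```python
# Illustrative view of how the same $80K/month facility commitment lands
# on EBITDA under the two ASC 842 classifications discussed above.
# Simplified: straight-line amounts only; actual finance-lease accounting
# runs interest on an effective-rate schedule.

ANNUAL_LEASE_COST = 80_000 * 12     # $960K/yr facility commitment
EV_MULTIPLE = 12                    # mid-range technology multiple

# Operating lease: one straight-line lease expense, booked as OpEx.
operating_ebitda_hit = ANNUAL_LEASE_COST        # $960K off EBITDA each year

# Finance lease: ROU-asset amortization plus lease-liability interest,
# both of which fall below EBITDA, like GPU depreciation.
finance_ebitda_hit = 0.0
finance_below_ebitda_cost = ANNUAL_LEASE_COST   # same cash, different line

ev_swing = (operating_ebitda_hit - finance_ebitda_hit) * EV_MULTIPLE
print(ev_swing)  # 11520000 -- ~$11.5M of EV hinging on classification
```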
Several structuring levers are available to enterprises willing to look beyond turnkey managed services. First, maximizing owned in-cage infrastructure — power distribution units, structured cabling, cooling equipment, rack enclosures — shifts spend from recurring service fees into capitalizable assets. Second, avoiding heavily bundled contracts where power, cooling, and operations are wrapped into a single opaque monthly charge prevents the capitalization opportunity from being obscured inside an undifferentiated OpEx line. Third, leasehold improvements that the tenant funds directly (electrical buildout, cage construction, fire suppression upgrades) can be capitalized and amortized over the shorter of their useful life or the lease term under both US GAAP and IFRS, preserving a portion of the EBITDA shield. The strongest EBITDA outcome is not merely self-hosting compute but maximizing the share of total AI infrastructure spend that is owned, capitalized, and depreciated below the EBITDA line.
Under IFRS 16, nearly all leases are treated as finance leases, with depreciation of the right-of-use asset and interest expense both excluded from EBITDA. A multinational reporting under IFRS will show a more favorable EBITDA profile for the same facility arrangement than a US GAAP reporter, not because the economics differ but because the accounting presentation does. For PE-backed companies evaluating cross-border infrastructure, the choice of reporting framework can shift the EBITDA calculus by hundreds of thousands of dollars annually on a single facility lease.
The risk is real, though. GPU hardware depreciates fast. Prior-generation flagships (V100, A100, H100) lost 40-60% of their value within 18-24 months of their successor's launch. The NVIDIA B200, now shipping at $30K-$40K per GPU, is accelerating that markdown for H100 clusters. A company that bought H100s in 2024 at $35K/GPU now holds hardware worth $15K-$20K while B200s deliver 2-3x the inference throughput. Self-hosting works financially only if the enterprise can sustain high utilization (above 60%) and tolerate generational obsolescence.
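A rough sensitivity sketch of that utilization threshold, assuming a $35K purchase price and a 50% residual after 24 months, and excluding power, support, and facility costs:

```python
# Sensitivity sketch for the utilization threshold above: the economic
# cost per productive GPU-hour of an owned H100, given fast depreciation.
# Purchase price, residual value, and horizon are assumptions taken from
# the ranges in the text; power, support, and facility costs are excluded.

HOURS_PER_MONTH = 730

def owned_cost_per_gpu_hour(price=35_000, residual=17_500,
                            months=24, utilization=0.6):
    """Value lost to obsolescence, spread over productive hours only."""
    economic_loss = price - residual
    productive_hours = months * HOURS_PER_MONTH * utilization
    return economic_loss / productive_hours

for u in (0.3, 0.6, 0.9):
    print(f"{u:.0%} utilization: ${owned_cost_per_gpu_hour(utilization=u):.2f}/hr")
# 30%: $3.33/hr -- worse than renting H100s from Lambda at $2.49/hr
# 60%: $1.66/hr -- comfortably below NeoCloud rates
# 90%: $1.11/hr
```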
CoreWeave represents the NeoCloud thesis at maximum leverage. The company IPO'd in March 2025 at $40/share, peaked at $183.58 in June, and trades at $89 as of February 2026. Revenue hit $3.52B by mid-2025, with full-year 2025 projected at $8B. The backlog is extraordinary: a $17.4-$19.4B Microsoft deal over five years and a $3B Meta contract for Llama training.
The debt is equally extraordinary. CoreWeave has raised over $25B in capital since 2023, mostly debt, producing a 4.8x debt-to-equity ratio that is unprecedented in technology. Interest expense tripled year-over-year to $311M in Q3 2025. The company faces $4.2B in maturities in 2026 that must be refinanced, likely at rates above the 11% average on its $7.6B DDTL 2.0 facility.
For enterprise customers, NeoCloud providers offer a middle path: GPU access without hardware ownership, at rates 30-50% below AWS and GCP on-demand pricing. Lambda Labs charges $2.49/hr for H100 PCIe instances versus AWS at $3-$4/hr. But AWS raised GPU prices 15% in January 2026, and managed ML services (SageMaker, Vertex AI) add 10-30% on top of base compute. The hyperscaler premium is widening just as NeoCloud alternatives proliferate.
The risk for enterprises relying on NeoClouds is concentration. CoreWeave's GPU fleet is collateral for its debt facilities. A demand slowdown or GPU depreciation cycle could trigger covenant violations, capacity reductions, or service disruptions. Lambda Labs is pre-IPO with an H1 2026 target and a $500M revenue run rate, but its largest customer is NVIDIA itself (a $1.5B GPU leaseback deal). NeoCloud infrastructure is structurally fragile in ways that hyperscalers are not.
A parallel disruption is reshaping the self-hosted calculus. Cerebras and SambaNova ship purpose-built inference processors that outperform NVIDIA GPUs on throughput per watt, latency, and tokens per second by wide margins. Far from research curiosities, these are production systems with major enterprise and government customers.
Cerebras raised $1B at a $23B valuation in early 2026, with a Q2 2026 IPO target. Revenue grew from under $6M in Q2 2024 to $70M in Q2 2025. The $10B OpenAI inference deal (750MW through 2028) validates the architecture at hyperscale. The CS-3 wafer-scale engine delivers 2,100 tokens/sec on Llama 3.1 70B and 2,522 tok/s on Llama 4 Maverick. Enterprise customers deploying on-prem include GSK, Mayo Clinic, the U.S. Department of Energy, and the U.S. Department of Defense. Critically, OpenAI's gpt-oss-120B runs on Cerebras under an Apache 2.0 license, meaning enterprises can buy CS-3 systems and run frontier-class open-weight models on their own iron, fully air-gapped if needed.
SambaNova just unveiled the SN50 RDU (fifth generation), shipping H2 2026. It claims 5x the max speed and 3x the throughput of NVIDIA's B200 for agentic inference workloads, at 20kW per rack (air-cooled, no liquid cooling required). The SN50's three-tier memory architecture supports models exceeding 10 trillion parameters with context lengths above 10 million tokens. Revenue run rate hit $150M in 2025, with $250M targeted for 2026. A $350M+ Series E from Vista Equity Partners and Intel closed in February 2026, after Intel's $1.6B acquisition bid stalled. SambaNova sells SambaRack hardware for on-prem deployment and offers SambaManaged, a turnkey AI cloud deployable in customer data centers in 90 days.
For the EBITDA analysis, inference silicon matters because it amplifies the self-hosted advantage. A Cerebras CS-3 or SambaNova SN50 rack delivering 5-10x the inference throughput of equivalent NVIDIA hardware means the same CapEx buys proportionally more compute. The break-even period shortens and the cost-per-token advantage widens accordingly. And the hardware, purchased as CapEx, remains invisible to EBITDA. The NVIDIA monoculture is ending; the self-hosted path now has competitive silicon options that reduce both cost and vendor concentration risk.
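A directional sketch of that amplification, using the $515K DGX B200 and ~$2M CS-3 prices quoted earlier and an assumed 5x throughput multiple from the 5-10x range:

```python
# Directional sketch of how a throughput multiple from purpose-built
# silicon shifts CapEx efficiency. Prices come from the text; the 5x
# multiple is an assumption within the 5-10x range quoted above.

def capex_per_throughput_unit(capex, relative_throughput):
    """Dollars of CapEx per normalized unit of inference throughput,
    with an NVIDIA 8-GPU server as the 1.0 baseline."""
    return capex / relative_throughput

nvidia_baseline = capex_per_throughput_unit(515_000, 1.0)  # DGX B200
wafer_scale = capex_per_throughput_unit(2_000_000, 5.0)    # CS-3, assumed 5x

# Despite a ~4x sticker price, the inference-optimized system delivers
# each unit of throughput for less CapEx -- so break-even arrives sooner
# and the cost-per-token advantage widens, as argued above.
print(nvidia_baseline, wafer_scale)  # 515000.0 400000.0
```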
| Dimension | Self-Hosted GPU | API-Only (Frontier Labs) |
|---|---|---|
| Upfront cost | $216K-$515K per 8-GPU server, plus $30K-$100K in supporting infrastructure per node | Zero |
| Ongoing cost | Power ($50K-$80K/yr per server), support ($20K-$50K/yr), staff | $0.05-$25 per million tokens; scales linearly with usage |
| EBITDA impact | CapEx excluded from EBITDA; only maintenance OpEx hits | 100% OpEx; every dollar reduces EBITDA |
| Facility cost | Colo lease is OpEx unless structured as finance lease; owned in-cage infra and leasehold improvements can be capitalized | Bundled in hourly rate; no separate facility line |
| Scaling speed | Weeks to months (procurement, rack, configure) | Seconds (API call) |
| Vendor lock-in | NVIDIA hardware ecosystem; CUDA dependency | API abstraction layers emerging; model-switching possible |
| Stranded asset risk | High: GPUs lose 40-60% value within 18-24 months of successor | None: no owned assets |
| Data sovereignty | Full: data never leaves premises | Limited: data transits to third-party inference endpoints |
| Model flexibility | Run any model, any size, any configuration | Limited to provider's model catalog and parameters |
| Competitive moat | Fine-tuned models, proprietary data pipelines, custom inference | Minimal: competitors access identical models at identical prices |
| Trade-off | Maximum financial and strategic control at maximum operational complexity and hardware risk | Maximum simplicity and speed at maximum EBITDA compression and zero differentiation |
Does each path solve the problem of building durable AI competencies with an optimal EBITDA profile?
Path 1 (self-hosted): Best EBITDA profile by a wide margin, provided the facility strategy is structured deliberately. GPU CapEx exclusion from EBITDA creates $2-3M in annual enterprise value uplift per 8-GPU server at typical PE multiples, and Lenovo's 2026 TCO study confirms an 8x cost advantage per million tokens versus cloud IaaS. Purpose-built inference silicon (Cerebras CS-3, SambaNova SN50) amplifies the advantage with 5-10x throughput gains over equivalent NVIDIA hardware. The caveats: the benefit requires 60%+ utilization, in-house ML ops talent, and tolerance for 3-4 year hardware refresh cycles. Facility leases classified as operating leases under ASC 842 introduce OpEx that partially offsets the GPU CapEx shield; enterprises can mitigate this by maximizing owned in-cage infrastructure, capitalizing leasehold improvements, and structuring arrangements that qualify as finance leases where commercially feasible.
Path 2 (NeoCloud/hyperscaler): Preserves flexibility and eliminates hardware risk, but every dollar hits EBITDA. NeoCloud providers (CoreWeave at $2.49-$4.25/hr, Lambda at $2.49/hr) offer 30-50% savings versus hyperscaler on-demand pricing, but still book as OpEx. AWS's January 2026 price hike (15%) and managed service premiums (10-30%) are widening the gap. NeoCloud counterparty risk (CoreWeave's $4.2B in 2026 debt maturities) is a non-trivial concern for enterprises signing multi-year commitments.
Path 3 (API-only): Token price deflation is dramatic: OpenAI's pricing has fallen 99% in three years (GPT-4 at $36/MTok to GPT-5 nano at $0.05), and Anthropic's Opus 4.5 is 67% cheaper than its predecessor. But deflation does not solve either problem. Every API call is pure OpEx compressing EBITDA. Worse, API-only enterprises build zero proprietary infrastructure. OpenAI serves 92% of the Fortune 500 and 1M+ paying companies; your competitors access identical models at identical prices. The path works for experimentation and low-volume production, but fails as a durable competitive strategy. Organizations spending $500K+/year on API calls should evaluate self-hosting.
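A final crossover sketch of that threshold, assuming a hypothetical 100K MTok/yr workload at a $5/MTok blended price, with the 18x self-hosted cost advantage taken from the comparison table:

```python
# Crossover sketch for the $500K/yr threshold above. Token prices and the
# 18x self-hosted cost advantage come from the text; the annual volume and
# blended price are illustrative assumptions.

def annual_api_spend(mtok_per_year, blended_price_per_mtok):
    return mtok_per_year * blended_price_per_mtok

def self_hosted_equivalent(api_spend, advantage=18.0):
    """Implied annual cost of serving the same tokens on owned hardware,
    using the 18x cost-per-MTok advantage from the comparison table."""
    return api_spend / advantage

spend = annual_api_spend(100_000, 5.0)  # 100K MTok/yr at a $5/MTok blend
print(spend, self_hosted_equivalent(spend))  # 500000.0 vs ~27778 self-hosted
```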