Key Takeaways
- Nvidia's GB200 chips offer significant performance gains but present new reliability challenges for cloud providers.
- China is actively developing domestic chip capabilities and HBM technology, facing US bans and manufacturing bottlenecks.
- Nvidia projects massive AI infrastructure spending, with hyperscalers like Oracle committing to billions in compute resources.
- CEO Jensen Huang's risk-taking and strategic vision are key drivers behind Nvidia's dominant market position.
- Nvidia's manufacturing agility allows for rapid chip development, often shipping A0 revisions first.
- AI inference workloads are disaggregated into pre-fill and decode to optimize GPU utilization and user experience.
Deep Dive
- The GB200's Total Cost of Ownership (TCO) is estimated at 1.6 times that of the H100, requiring significant performance increases to justify adoption.
- Performance gains for the GB200 over the H100 range from 2x for general pre-training to over 6x for certain DeepSeek inference tasks that exploit the NVL72's NVLink domain.
- The 72-GPU scale-up domain of the GB200 NVL72 raises reliability concerns compared to the more mature 8-GPU B200 configuration, complicating the uptime Service Level Agreements (SLAs) cloud providers can offer.
- Cloud providers are adjusting GB200 SLAs, offering lower uptime guarantees for the full 72-GPU configuration, affecting customer cost-performance benefits.
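The break-even arithmetic implied by the TCO bullets above can be sketched in a few lines. The 1.6x TCO multiple and the 2x-6x performance range come from the source; the helper function itself is purely illustrative.

```python
# Break-even check: a GB200 deployment only justifies its premium if the
# speedup over the H100 exceeds its TCO multiple over the H100.

def perf_per_dollar_gain(speedup: float, tco_multiple: float = 1.6) -> float:
    """Performance-per-TCO-dollar ratio relative to the baseline chip."""
    return speedup / tco_multiple

# Figures from the discussion: ~2x for general pre-training,
# ~6x for DeepSeek-style inference on an NVL72 domain.
for name, speedup in [("pre-training", 2.0), ("DeepSeek inference", 6.0)]:
    print(f"{name}: {perf_per_dollar_gain(speedup):.2f}x perf per TCO dollar")
```

At a 1.6x TCO, the 2x training gain nets out to only 1.25x per dollar, while the 6x inference gain nets 3.75x, which is why adoption hinges on workload mix (and on actually achieving the SLA'd uptime at 72-GPU scale).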
- Removing High Bandwidth Memory (HBM) from compute-optimized chips can halve the GPU cost, facilitating broader adoption of long-context models.
- Nvidia also pre-announced Rubin CPX GPUs, raising questions about potential market cannibalization and strategic timing.
- Huawei released 7-nanometer Ascend AI chips in 2020 before full foreign supply chain access was banned, demonstrating early domestic capability.
- A 2020 Trump administration ban cut Huawei off from TSMC, pushing it to domestic manufacturing at SMIC and to acquiring roughly 3 million chips through shell companies before those channels were shut down.
- The 2025 ban on Nvidia's H20 chip led Nvidia to write off significant revenue and cut its supply chain to China, although some inventory was resold.
- China is exploring domestic alternatives to Nvidia, with companies like Cambricon and Huawei possessing design capacity, though they still rely on foreign components such as wafers and memory.
- Huawei's recent announcement of custom High Bandwidth Memory (HBM) for upcoming chips shows China tracking the same memory-centric direction Nvidia and AMD are pursuing.
- Strategic announcements regarding China's domestic supply chain, such as Huawei's advancements, may serve as a negotiation tactic to gain access to more advanced AI chips from companies like Nvidia.
- Bank consensus estimates for next-year CapEx for the six hyperscalers (Microsoft, CoreWeave, Amazon, Google, Oracle, Meta) are $360 billion, though a more conservative estimate is $45-50 billion primarily from Nvidia.
- AI labs like OpenAI and Anthropic are projected to have billions in annual burn rates, potentially remaining unprofitable until 2029.
- Nvidia is poised to capture a large share of the projected trillions in AI infrastructure investment, an outlay justified by the productivity gains AI delivers to knowledge workers.
- Predicting Nvidia's long-term market capitalization beyond five years is difficult due to rapid technological shifts including BCIs and humanoid robots.
- Nvidia founder Jensen Huang built the company's moat by betting heavily on unproven technologies, such as significant volume orders for Xbox chips and during crypto booms.
- Nvidia managed perceptions during crypto bubbles by framing demand as durable gaming and data center needs to encourage supply chain partners to increase production.
- Huang's aggressive non-cancelable capacity ordering strategy, driven by gut instinct, often exceeds stated customer needs, prioritizing next-generation advancements over predictable quarterly earnings.
- Despite past missteps, like in mobile, Huang's philosophy of 'winning the current game to enable playing the next one' is credited for Nvidia's success as a high-value semiconductor company.
- Huang's leadership has evolved over 30 years, with increased charisma and a 'rock star' persona, but his bold decision-making is still informed by past risks.
- Huang's consistent accuracy regarding AI's future, despite initial audience confusion at 2014/2015 CES presentations, has placed him among elite CEOs.
- Nvidia's early financial struggles included the near-failure of its first successful chip: mask set costs were so high that revisions after the initial manufacturing run were unaffordable.
- Nvidia consistently achieves 'A0 revision' chip designs, a rare feat compared to competitors like AMD and Broadcom, who often require multiple iterations.
- Their manufacturing strategy involves shipping A0 silicon and ramping production quickly, confining any later fixes to cheaper metal-layer-only revisions and avoiding the full re-spin delays competitors face.
- Intel's chip development, in contrast, has required numerous revisions (e.g., reaching an E2 or later stepping), causing significant delays.
- Nvidia's rapid production cycle, supported by advanced simulation and verification, enabled last-minute additions like Tensor Cores to the Volta chip.
- Nvidia's success is attributed to its speed and execution in capitalizing on opportunities across gaming, VR, crypto mining, and AI.
- A key challenge for Nvidia's future involves deploying its massive cash flow, given regulatory constraints on large acquisitions, with potential avenues including building AI infrastructure or investing in robotics.
- Nvidia has made smaller, strategic investments in its supply chain and recognizes the difficulty startups face in accessing large GPU clusters for training.
- The company is exploring ways to provide burst capacity and reduce the time and cost associated with model development for startups.
- Nvidia's investments in companies like CoreWeave, OpenAI, and xAI are relatively small, and may come with favorable terms such as Nvidia renting compute clusters back from these companies.
- This investment strategy avoids significant capital outlay and potential antitrust concerns, allowing Nvidia to reshape its market without substantial capital.
- A past challenge for Intel was its customer base being heavily concentrated with large hyperscalers, who were also developing their own chips, leading to downward price pressure.
- Amazon's infrastructure, optimized for previous computing eras, was not suited for scale-up AI, leading to decelerating AWS revenue growth in 2023.
- Dylan Patel's prediction of Amazon's cloud issues proved correct, with Amazon underperforming other hyperscalers, though a re-acceleration of AWS revenue growth to over 20% is now expected due to massive data center deployments.
- Amazon has historically focused on high-density data centers, even in humid conditions, to optimize costs, and possesses substantial capacity for AI with necessary infrastructure modifications.
- While Amazon's internal AI models and custom hardware (e.g., Trainium) may not match Nvidia's, its ability to build and fill data centers represents a straightforward revenue-generating strategy.
- Oracle is highlighted for its strong balance sheet, flexible hardware and networking approach (supporting both Ethernet and InfiniBand), and software capabilities reflected in its ClusterMax rating.
- Oracle is strategically positioned to capitalize on OpenAI's significant compute demand, with Microsoft reportedly hesitant to fully meet it.
- Analysts track data center capacity and power availability using supply chains, permits, and satellite imagery, predicting Oracle's significant expansion plans potentially spanning to 2027 and beyond.
- Detailed tracking allows for predictions of when sites like Stargate will come online and the associated rental costs for companies such as OpenAI.
- Predictions for Oracle's revenue from 2025-2027 closely match announced figures, with expectations of further announcements regarding partnerships with OpenAI and ByteDance (TikTok).
- Oracle may leverage debt markets to finance future GPU purchases for long-term contracts, a strategy previously employed by other cloud providers.
- AI chips serve both training and inference workloads, with inference increasingly dominant due to reinforcement learning; inference itself splits into pre-fill, which computes the KV cache over the prompt, and decode, which generates tokens auto-regressively.
- Initially, pre-fill and decode shared GPUs in a single batch, which maximized utilization but starved decode workers, resulting in slower tokens per second for users.
- Major AI companies like OpenAI, Anthropic, and Google now disaggregate pre-fill and decode workloads onto separate GPU sets, allowing auto-scaling based on traffic mix and prioritizing faster time-to-first-token.
- Decode must reload all parameters and KV caches for every generated token and batches poorly, making it memory-bandwidth-bound; pre-fill performs heavy computation over long-context prompts, making raw compute (FLOPs) the critical resource.
- The complexity of these distinct workloads makes it challenging for many to grasp products like Nvidia's Rubin CPX, which targets the compute-heavy pre-fill phase.
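The disaggregation idea in the bullets above can be sketched as two independently scaled worker pools: pre-fill cost grows with prompt length, decode cost with generated tokens, and each pool autoscales against its own load. All class and function names here are hypothetical, and the cost model is deliberately simplified.

```python
# Sketch of disaggregated serving: pre-fill workers compute the KV cache
# over the prompt; decode workers then generate tokens auto-regressively
# against that cache. The pools scale independently with the traffic mix.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int   # drives pre-fill cost (compute/FLOPs-bound)
    output_tokens: int   # drives decode cost (memory-bandwidth-bound)

@dataclass
class Pool:
    name: str
    workers: int
    queued: list = field(default_factory=list)

    def utilization(self, cost_per_worker: int) -> float:
        # Fraction of the pool's aggregate capacity the queue would consume.
        return sum(self.queued) / (self.workers * cost_per_worker)

def route(req: Request, prefill: Pool, decode: Pool) -> None:
    # Each request contributes to both phases, but on separate GPU sets,
    # so a long-context prompt no longer stalls someone else's decode.
    prefill.queued.append(req.prompt_tokens)
    decode.queued.append(req.output_tokens)

def autoscale(pool: Pool, cost_per_worker: int, target: float = 0.8) -> int:
    # Grow the pool until projected utilization drops to the target.
    while pool.utilization(cost_per_worker) > target:
        pool.workers += 1
    return pool.workers
```

A long-prompt, short-answer request (e.g., `Request(4000, 200)`) inflates the pre-fill pool far more than the decode pool, mirroring why providers auto-scale the two tiers separately and can prioritize time-to-first-token without degrading tokens per second.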